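Elasticsearch is a search server built on Lucene. It provides a distributed, multi-tenant full-text search engine with a RESTful web interface. This article covers installing and using Elasticsearch.
Installation and Deployment
Omitted.
Introduction to Lucene
Lucene is a jar that packages the code, including the algorithms, for building inverted indexes and searching them. When developing in Java, we add the Lucene jar as a dependency and build against the Lucene API. With Lucene we can index existing data, and Lucene organizes the index data structures on the local disk for us. We can also use the features and APIs that Lucene provides against what is stored on disk...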
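To make the idea of an inverted index concrete, here is a minimal sketch in Python. It is not Lucene's implementation: the whitespace tokenizer, the function names, and the sample documents are all assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive tokenizer (assumption): lowercase and split on whitespace.
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search(index, term):
    """Return the sorted ids of the documents that contain the term."""
    return sorted(index.get(term.lower(), set()))


# Hypothetical documents, keyed by document id.
docs = {1: "xiao xie", 2: "xiao ming", 3: "da ming"}
index = build_inverted_index(docs)
print(search(index, "xiao"))
print(search(index, "ming"))
```

A lookup goes from a term straight to the documents that contain it, which is why searching does not need to scan every document.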
Elasticsearch wraps Lucene under the hood:
- Near real-time: latency on the order of seconds
- Offline batch processing: batch-processing
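Usage examples:
- Find documents whose age field equals 18:
age:18
- Wrap multi-word phrases in double quotes:
name:"xiao xie"
- Greater than 18 and less than 22:
age:{18 TO 22}
- Greater than or equal to 18 and less than or equal to 22:
age:[18 TO 22]
- Greater than:
age:>10
(less than works the same way)
- Greater than or equal to:
age:>=10
(less than or equal to works the same way)
- Not equal to:
NOT age:10
or -age:10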
- Match a single character (?):
name:xiao?
- Match zero or more characters (*):
name:xiao*
- Logical operators: AND, OR
- Multiple conditions (write logical operators in uppercase):
age:[18 TO 499] AND (extension:php OR extension:html)
- The following special characters must be escaped with a backslash when they appear in a query: + - && || ! ( ) { } [ ] ^ " ~ * ? : \
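When a query value comes from user input, the escaping rule in the last item can be applied programmatically. The sketch below is one possible approach, not an official client API: the function names are hypothetical, and escaping every single & and | (rather than only the && and || pairs) is a simplifying assumption that may over-escape.

```python
# Characters taken from the list above; && and || are covered by
# escaping each & and | individually (a simplifying assumption).
SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\')


def escape_query_value(value):
    """Prefix every special character in value with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in value)


def field_query(field, value):
    """Build a field:value clause with the value escaped."""
    return f"{field}:{escape_query_value(value)}"


print(field_query("name", "xiao?"))
print(field_query("expr", "1+1"))
```

Note that escaping turns wildcards such as ? and * into literal characters, so it should only be applied to values that are meant to be matched literally.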
Common Commands
version
GET /
{
  "name": "host-1",
  "cluster_name": "elasticsearch",
  "cluster_uuid": "sdfsfsdfsdfsd",
  "version": {
    "number": "5.1.2",
    "build_hash": "c8c4c16",
    "build_date": "2017-01-11T20:18:39.146Z",
    "build_snapshot": false,
    "lucene_version": "6.3.0"
  },
  "tagline": "You Know, for Search"
}
Status queries
GET _cat/health?v
GET _cat/nodes?v
GET _cluster/health
GET _cluster/health?level=indices
GET _cluster/health?level=shards
GET /_cat/allocation?v
search
GET _search
{
  "query": {
    "match_all": {}
  }
}
Query for elasticsearch-data-transform/ETL/clean/searchTemplate/ceilometer/cpu_util:
GET ceilometer_original-2018.01/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "counter_name.keyword": {
                  "value": "cpu_util"
                }
              }
            },
            {
              "range": {
                "timestamp": {
                  "gte": "now-3m",
                  "lte": "now"
                }
              }
            }
          ]
        }
      }
    }
  },
  "size": 10000,
  "_source": ["counter_name","counter_volume","resource_metadata.host","project_id","resource_metadata.instance_host","resource_metadata.vcpus","resource_metadata.memory_mb","resource_metadata.display_name","resource_metadata.name","resource_metadata.instance_id","timestamp","@timestamp"]
}
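The same request body can be built in code when the counter name or time window varies. This is a sketch only: the function name and its parameters are hypothetical, and it simply reproduces the structure of the query above as a Python dict.

```python
import json


def recent_counter_query(counter_name, window="3m", size=10000, source_fields=None):
    """Build a constant_score filter: exact counter name plus a recent time range."""
    body = {
        "query": {"constant_score": {"filter": {"bool": {"must": [
            {"term": {"counter_name.keyword": {"value": counter_name}}},
            {"range": {"timestamp": {"gte": f"now-{window}", "lte": "now"}}},
        ]}}}},
        "size": size,
    }
    # Only restrict the returned fields when the caller asks for it.
    if source_fields:
        body["_source"] = list(source_fields)
    return body


body = recent_counter_query(
    "cpu_util",
    source_fields=["counter_name", "counter_volume", "timestamp"],
)
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after a `GET <index>/_search` line in the same way as the hand-written query.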
Run the query
GET rollover_index_test/_search
indices
List:
GET _cat/indices?v
Delete:
DELETE /test_xxb.2018.01
POST /_all/_forcemerge?only_expunge_deletes=true
For safety, you can disable the _all and * wildcards for destructive operations in the configuration file (elasticsearch.yml):
action.destructive_requires_name: true
https://www.elastic.co/guide/cn/elasticsearch/guide/current/_deleting_an_index.html#_deleting_an_index
Set the number of replicas:
PUT xxb_test.2018.01/_settings
{
  "index": {
    "number_of_replicas": 1
  }
}
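Outside a console, the replica setting can be applied over plain HTTP. The sketch below only constructs the request with the Python standard library and does not send it; the helper name is hypothetical, and the base URL is an assumption borrowed from the local address used elsewhere in these notes.

```python
import json
import urllib.request


def replica_settings_request(base_url, index, replicas):
    """Build (but do not send) a PUT <index>/_settings request."""
    payload = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = replica_settings_request("http://127.0.0.1:9200", "xxb_test.2018.01", 1)
print(req.get_method(), req.full_url)
# To actually send it against a running cluster: urllib.request.urlopen(req)
```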
aliases
GET _cat/aliases?v
GET _cat/aliases/*unit_mb_memory_alias*
https://www.elastic.co/guide/en/elasticsearch/reference/5.2/indices-aliases.html
templates
GET _cat/templates
http://cwiki.apachecn.org/pages/viewpage.action?pageId=9406922
GET _template/sensu_storage_index_disk_iostats-metrics_template
shard
GET _cat/shards/applog-prod-2016.12.18*
curl -XGET 'localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' | grep UNASSIGNED
GET /_cluster/allocation/explain?pretty
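If the shard listing needs to be processed further rather than just grepped, the text output can be parsed. The following is a sketch under assumptions: it expects exactly the column order index, shard, prirep, state, unassigned.reason, and the sample text is hypothetical, not output from a real cluster.

```python
def unassigned_shards(cat_output):
    """Parse `_cat/shards?h=index,shard,prirep,state,unassigned.reason` text."""
    rows = []
    for line in cat_output.splitlines():
        parts = line.split()
        # Column 4 is the shard state; keep only UNASSIGNED rows.
        if len(parts) >= 4 and parts[3] == "UNASSIGNED":
            rows.append({
                "index": parts[0],
                "shard": int(parts[1]),
                "prirep": parts[2],
                "reason": parts[4] if len(parts) > 4 else None,
            })
    return rows


# Hypothetical sample shaped like the requested columns.
sample = """\
applog-prod-2016.12.18 0 p STARTED
applog-prod-2016.12.18 0 r UNASSIGNED NODE_LEFT
applog-prod-2016.12.18 1 p STARTED
"""
for row in unassigned_shards(sample):
    print(row)
```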
command=/usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/test.conf --path.data /data/01/logstash/sensu2es --http.port 9606 -b 1000 -w 4
GET _nodes/stats/process?filter_path=**.max_file_descriptors
https://www.elastic.co/guide/cn/elasticsearch/guide/current/_cluster_health.html
https://www.elastic.co/guide/cn/elasticsearch/guide/current/index-templates.html#index-templates
settings
GET /_cluster/settings
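Disk watermarks protect the data on each node. Elasticsearch periodically checks disk usage of every node's data directory (cluster.info.update.interval, default 30 seconds). Once usage reaches cluster.routing.allocation.disk.watermark.low (default 85%), new shards are no longer allocated to that node. Once it reaches cluster.routing.allocation.disk.watermark.high (default 90%), Elasticsearch starts rebalancing the shards already on that node and moves data to other nodes. Both values can be given as a percentage or as an absolute byte size (a percentage refers to used disk space, a byte size to free space remaining). Adjust the settings as appropriate: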
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "90%",
    "cluster.routing.allocation.disk.watermark.high": "50gb"
  }
}
https://www.elastic.co/guide/en/elasticsearch/reference/current/disk-allocator.html
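The two thresholds described above can be summarized as a small decision function. This is a deliberately simplified sketch, not the actual allocator logic: it handles percentage watermarks only, the function name and return strings are hypothetical, and the defaults are the 85% and 90% values quoted in the text.

```python
def disk_allocation_state(used_percent, low=85.0, high=90.0):
    """Classify a node by disk usage against the low and high watermarks."""
    if used_percent >= high:
        # At or above the high watermark: existing shards get moved elsewhere.
        return "relocate shards away"
    if used_percent >= low:
        # At or above the low watermark: no new shards are allocated here.
        return "no new shards"
    return "ok"


for usage in (80, 87, 95):
    print(usage, disk_allocation_state(usage))
```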
Data Migration
Download esm from https://github.com/medcl/esm
chmod +x esm
# Export data
./esm -s http://10.3.10.61:9200 -x "xxb_test_2018.1" -c 5000 -b 5 --refresh -o=xxb_test_2018.1.bin
# Import data
./esm -d http://10.3.10.61:9200 -y "xxb_test_2018.1" -c 5000 -b 5 --refresh -i=xxb_test_2018.1.bin
API
Common Problems
Unassigned Shards
Resolving Unassigned Shards that cannot be rerouted in an Elasticsearch cluster:
https://www.jianshu.com/p/542ed5a5bdfc
https://www.jianshu.com/p/329b9f92ac4c
Solution:
curl -XGET http://127.0.0.1:9200/_cat/shards | fgrep UNASSIGNED
Delete the affected index:
DELETE /index-00000*
POST /_all/_forcemerge?only_expunge_deletes=true
Data Recovery
POST /_reindex
{
  "source": {
    "index": "xxb_test.2018.01"
  },
  "dest": {
    "index": "xxb_test.2018.01.bak",
    "version_type": "external"
  }
}
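Check that the indices and shards are healthy:
GET _cat/indices/xxb_test.2018.01.bak
GET _cat/shards/xxb_test.2018.01.bak
Delete the faulty index:
DELETE /xxb_test.2018.01
Elasticsearch only marks deleted documents as deleted; run the following command to expunge them and reclaim disk space:
POST /_all/_forcemerge?only_expunge_deletes=true
Set an alias: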
PUT xxb_test.2018.01.bak/_alias/xxb_test.2018.01
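The recovery steps above follow a fixed order, so they can be written down as data and reviewed before anything is sent. This sketch only generates the list of requests; the function name and the ".bak" suffix parameter are hypothetical, and nothing here contacts a cluster.

```python
def recovery_plan(index, suffix=".bak"):
    """Return the ordered (method, path, body) requests for the recovery steps."""
    backup = index + suffix
    return [
        # 1. Copy the data into a backup index.
        ("POST", "/_reindex", {
            "source": {"index": index},
            "dest": {"index": backup, "version_type": "external"},
        }),
        # 2. Check the new index and its shards.
        ("GET", f"/_cat/indices/{backup}", None),
        ("GET", f"/_cat/shards/{backup}", None),
        # 3. Delete the faulty index and reclaim space.
        ("DELETE", f"/{index}", None),
        ("POST", "/_all/_forcemerge?only_expunge_deletes=true", None),
        # 4. Point the old name at the backup index.
        ("PUT", f"/{backup}/_alias/{index}", None),
    ]


for method, path, body in recovery_plan("xxb_test.2018.01"):
    print(method, path, body if body is not None else "")
```

Keeping the delete step after the two checks makes it harder to remove the original index before the copy has been verified.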
Memory Errors
java.lang.OutOfMemoryError: Java heap space
Adjust the following parameters in jvm.options:
-Xms16g
-Xmx16g
For Elasticsearch, a Java heap to OS memory ratio between 35% and 45% works well, but the Java heap should not exceed 32 GB: above 32 GB the JVM can no longer use compressed object pointers.
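The sizing rule above can be turned into a small calculation. The sketch below is an illustration under assumptions: the 40% default sits in the middle of the 35%-45% range from the text, the 31 GB cap is an assumed safety margin below the 32 GB limit, and the function name is hypothetical.

```python
def jvm_heap_options(total_memory_gb, heap_percent=40, max_heap_gb=31):
    """Suggest matching -Xms/-Xmx lines for jvm.options from total OS memory in GB."""
    # Integer arithmetic keeps the result a whole number of gigabytes.
    heap_gb = total_memory_gb * heap_percent // 100
    # Stay under the compressed-pointer limit and never go below 1 GB.
    heap_gb = max(1, min(heap_gb, max_heap_gb))
    return [f"-Xms{heap_gb}g", f"-Xmx{heap_gb}g"]


print(jvm_heap_options(40))
print(jvm_heap_options(128))
```

Setting -Xms and -Xmx to the same value, as in the example settings above, avoids heap resizing at runtime.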
Official documentation on OS and JVM configuration:
Cluster error: state not recovered
org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized]
http://127.0.0.1:9200/_cluster/health?wait_for_status=yellow&timeout=50s
http://127.0.0.1:9200/_cat/recovery?v&h=i,s,t,ty,st,rep,snap,f,fp,b,bp
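When scripting a check for this state, the health URL with its wait parameters can be assembled rather than typed by hand. This is a sketch using only the standard library; the function name is hypothetical and the defaults mirror the first URL above.

```python
from urllib.parse import urlencode


def health_wait_url(base_url, status="yellow", timeout="50s"):
    """Build a _cluster/health URL that waits for the given status."""
    params = urlencode({"wait_for_status": status, "timeout": timeout})
    return f"{base_url}/_cluster/health?{params}"


print(health_wait_url("http://127.0.0.1:9200"))
```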
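hprof files
A large number of java_pid*.hprof files appear under the elasticsearch/bin directory. This is usually caused by an out-of-memory error; the state not recovered cluster error described above can also lead to it. Restarting the es process usually resolves the problem.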
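Before cleaning up, it helps to see how many heap dumps exist and how much space they take. The sketch below lists them largest first; the function name is hypothetical, and the relative path elasticsearch/bin is an assumption that should be replaced with the real installation path.

```python
from pathlib import Path


def find_heap_dumps(directory):
    """Return (path, size_in_bytes) for each java_pid*.hprof file, largest first."""
    root = Path(directory)
    if not root.is_dir():
        return []
    dumps = [(p, p.stat().st_size) for p in root.glob("java_pid*.hprof") if p.is_file()]
    return sorted(dumps, key=lambda item: item[1], reverse=True)


# Assumed location; adjust to the actual Elasticsearch installation.
for path, size in find_heap_dumps("elasticsearch/bin"):
    print(f"{size / 1024 ** 2:.1f} MiB  {path}")
```

The function only reports files and deletes nothing, so it is safe to run before deciding which dumps to keep for analysis.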