index
We discuss creating, viewing, and deleting an index, in that order.
Create
How to create
To create an index, use PUT /{indexName}.
Example code:
PUT /kaka
Result:
#! Deprecation: the default number of shards will change from [5] to [1] in 7.0.0; if you wish to continue using the default of [5] shards, you must manage this on the create index request or with an index template
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "kaka"
}
Explanation:
"acknowledged" : true means the index was created successfully.
"shards_acknowledged" : true means the shards were created successfully.
"index" : "kaka" is the name of the index.
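Specifying parameters
Now let's look at the first line of the result, the one that starts with #!.
Before version 7, every new index got 5 primary shards by default. From version 7 onward the default is 1. If you need 5 shards, you have to say so explicitly when creating the index.
(We are using version 6, so this does not affect us.)
An example with explicit settings:
PUT /kk
{
  "settings" : {
    "number_of_replicas" : 1,
    "number_of_shards" : 5
  }
}
Kibana offers hints and auto-completion. However, if the first { sits on the same line as /kk instead of on a new line, completion stops working.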
All letters must be lowercase
As described in the previous chapter, "1. Tools, Concepts, and Cluster", every letter in an index name must be lowercase.
Example code:
PUT /KKK
Result:
{
  "error": {
    "root_cause": [
      {
        "type": "invalid_index_name_exception",
        "reason": "Invalid index name [KKK], must be lowercase",
        "index_uuid": "_na_",
        "index": "KKK"
      }
    ],
    "type": "invalid_index_name_exception",
    "reason": "Invalid index name [KKK], must be lowercase",
    "index_uuid": "_na_",
    "index": "KKK"
  },
  "status": 400
}
Explanation: the request fails because the name contains uppercase letters. The relevant lines are:
"type": "invalid_index_name_exception",
"reason": "Invalid index name [KKK], must be lowercase",
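A client can catch this mistake before sending the request. The following is a minimal Python sketch that checks only the lowercase rule discussed here. The function name is_lowercase_index_name is hypothetical, and Elasticsearch enforces further naming rules that this sketch does not model.

```python
def is_lowercase_index_name(name: str) -> bool:
    """Return True when the name contains no uppercase letters.

    Only the lowercase rule from this section is checked; other
    Elasticsearch naming rules are deliberately left out.
    """
    return name == name.lower()


for candidate in ("kaka", "kk", "KKK"):
    verdict = "ok" if is_lowercase_index_name(candidate) else "must be lowercase"
    print(f"{candidate}: {verdict}")
```

Of the three names used in this chapter, only KKK is rejected by this check.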
View
How to view
To list the indices, use GET /_cat/indices.
Example code:
GET /_cat/indices
Result:
green  open .kibana_1            azRP1WYcSlSKg_Iu8kim_Q 1 0 4 1 16.8kb 16.8kb
yellow open kk                   a3akwYNuTG2Bq0maLLE0-A 5 1 0 0  1.1kb  1.1kb
green  open .kibana_task_manager HsnZtouxT2i40qSFeeO9ug 1 0 2 0 12.6kb 12.6kb
yellow open kaka                 bOal4r2bTzCKfoR57mvWqg 5 1 0 0  1.1kb  1.1kb
Explanation:
In version 6.8.0 and later, Kibana creates two indices of its own: .kibana_1 and .kibana_task_manager.
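Showing the header
So what do green and yellow in the result above mean?
Append ?v to the command to show the header row.
Example code:
GET /_cat/indices?v
Result:
health status index                uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .kibana_1            azRP1WYcSlSKg_Iu8kim_Q   1   0          4            1     16.8kb         16.8kb
yellow open   kk                   a3akwYNuTG2Bq0maLLE0-A   5   1          0            0      1.2kb          1.2kb
green  open   .kibana_task_manager HsnZtouxT2i40qSFeeO9ug   1   0          2            0     12.6kb         12.6kb
yellow open   kaka                 bOal4r2bTzCKfoR57mvWqg   5   1          0            0      1.2kb          1.2kb
Explanation:
health: health status of the index
  yellow: usable but not fully redundant. All primary shards are allocated, but the replicas are not: each index here asks for 1 replica, and with a single node there is no other node to hold a replica of a shard.
  green: healthy (usable). All primary and replica shards are allocated.
  red: not usable. At least one primary shard is not allocated.
status: state of the index (open here)
index: index name
uuid: unique identifier
pri: number of primary shards
rep: number of replicas
docs.count: number of documents
docs.deleted: number of deleted documents
store.size: total storage size
pri.store.size: storage size of the primary shards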
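The column layout above is easy to process in a script. The following Python sketch parses text copied from GET /_cat/indices?v into dictionaries keyed by the header names and lists the indices that are not green. The function name parse_cat_indices and the shortened SAMPLE text are illustrative assumptions; nothing here talks to a real cluster.

```python
def parse_cat_indices(text: str) -> list:
    """Turn the whitespace-separated table into a list of row dictionaries."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


# Two rows taken from the result shown above.
SAMPLE = """\
health status index     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .kibana_1 azRP1WYcSlSKg_Iu8kim_Q   1   0          4            1     16.8kb         16.8kb
yellow open   kk        a3akwYNuTG2Bq0maLLE0-A   5   1          0            0      1.2kb          1.2kb
"""

rows = parse_cat_indices(SAMPLE)
not_green = [row["index"] for row in rows if row["health"] != "green"]
print(not_green)
```

All values come back as strings, so numeric columns such as pri and rep need an explicit int() before any arithmetic.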
Delete
How to delete
To delete an index, use DELETE /{indexName}.
Result of such a request:
{
  "acknowledged" : true
}
Important: deleting Kibana's own indices makes Kibana unusable. If you have already deleted them, restart Kibana; it recreates its indices and recovers.
Deleting all indices
To delete all indices, use DELETE /_all.
Example code:
DELETE /_all
Result:
{
  "acknowledged" : true
}
Some references say that DELETE /* also deletes all indices, but in version 6.8.23 this no longer works.
Example code:
DELETE /*
Result:
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Wildcard expressions or all indices are not allowed"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "Wildcard expressions or all indices are not allowed"
  },
  "status": 400
}
mapping
If an index is compared to a table, the mapping is the table structure.
Create
Example code:
PUT /est
{
  "mappings" : {
    "_doc" : {
      "properties" : {
        "id" : { "type" : "keyword" },
        "name" : { "type" : "text" },
        "age" : { "type" : "integer" },
        "bir" : { "type" : "date" }
      }
    }
  }
}
Result:
#! Deprecation: the default number of shards will change from [5] to [1] in 7.0.0; if you wish to continue using the default of [5] shards, you must manage this on the create index request or with an index template
#! Deprecation: [types removal] The parameter include_type_name should be explicitly specified in create index requests to prepare for 7.0. In 7.0 include_type_name will default to 'false', and requests are expected to omit the type name in mapping definitions.
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "est"
}
Explanation:
est: the index name.
mappings: a keyword. Before version 6 an index could have several types, hence the plural. From version 6 on an index has a single type, but the plural form was kept.
_doc: the type name. Following the official Elasticsearch recommendation, it is named _doc.
properties: a keyword; what follows are the fields.
id, name, age, bir: the fields.
type: a keyword; what follows is the data type.
keyword, text, integer, date: data types.
Data types
Eight commonly used Elasticsearch data types are:
text
keyword
date
integer
long
double
boolean
ip
Of these, text is analyzed (split into terms) and keyword is not.
View
How to view
To view an index together with its mapping, use GET /{indexName}.
Example code:
GET /est
Result:
#! Deprecation: [types removal] The parameter include_type_name should be explicitly specified in get indices requests to prepare for 7.0. In 7.0 include_type_name will default to 'false', which means responses will omit the type name in mapping definitions.
{
  "est" : {
    "aliases" : { },
    "mappings" : {
      "_doc" : {
        "properties" : {
          "age" : { "type" : "integer" },
          "bir" : { "type" : "date" },
          "id" : { "type" : "keyword" },
          "name" : { "type" : "text" }
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1643078198880",
        "number_of_shards" : "5",
        "number_of_replicas" : "1",
        "uuid" : "eZxussN2RUaHxkqHOrObPg",
        "version" : { "created" : "6082399" },
        "provided_name" : "est"
      }
    }
  }
}
Viewing only the mapping
The request above returns everything. To see only the mapping, use GET /{indexName}/_mapping.
Example code:
GET /est/_mapping
Result:
#! Deprecation: [types removal] The parameter include_type_name should be explicitly specified in get mapping requests to prepare for 7.0. In 7.0 include_type_name will default to 'false', which means responses will omit the type name in mapping definitions.
{
  "est" : {
    "mappings" : {
      "_doc" : {
        "properties" : {
          "age" : { "type" : "integer" },
          "bir" : { "type" : "date" },
          "id" : { "type" : "keyword" },
          "name" : { "type" : "text" }
        }
      }
    }
  }
}
document
A document can be thought of as one row of a table.
Add
How to add
To add a document, use POST /{indexName}/{typeName} followed by a JSON body.
Example code:
POST /est/_doc
{
  "id" : "一号",
  "name" : "赵小六",
  "age" : 23,
  "bir" : "2012-12-12"
}
Result:
{
  "_index" : "est",
  "_type" : "_doc",
  "_id" : "5Y0ij34BoH8Nsaao-4rh",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}
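POST and PUT
Use POST to add a document.
Some references say that documents are added with PUT, but in my own tests on 6.8.23, PUT is not accepted when no ID is given; only POST works.
(If you do specify an ID, PUT works as well.)
Example code:
PUT /est/_doc
{
  "id" : "一号",
  "name" : "赵小六",
  "age" : 23,
  "bir" : "2012-12-12"
}
Result:
{
  "error": "Incorrect HTTP method for uri [/est/_doc?pretty] and method [PUT], allowed: [POST]",
  "status": 405
}
In REST, POST, GET, PUT, and DELETE usually correspond to create, read, update, and delete (CRUD), but POST and PUT are sometimes used interchangeably.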
Specifying the id
What if we want to choose the id ourselves? The format is:
POST /{indexName}/{typeName}/{id}
Example code:
POST /est/_doc/1
{
  "id" : "一号",
  "name" : "赵小六",
  "age" : 23,
  "bir" : "2012-12-12"
}
Result:
{
  "_index" : "est",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 1,
  "_primary_term" : 1
}
Get
To fetch a document, use GET /{indexName}/{typeName}/{documentId}.
Example code:
GET /est/_doc/1
Result:
{
  "_index" : "est",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 1,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "id" : "一号",
    "name" : "赵小六",
    "age" : 23,
    "bir" : "2012-12-12"
  }
}
Delete
To delete a document, use DELETE /{indexName}/{typeName}/{documentId}.
Example code:
DELETE /est/_doc/1
Result:
{
  "_index" : "est",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 2,
  "result" : "deleted",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 2,
  "_primary_term" : 1
}
Update
Update without keeping the original data
Updating? Let's go!
Example code:
POST /est/_doc/1
{
  "id" : "一号",
  "name" : "阿门",
  "age" : 23,
  "bir" : "2012-12-12"
}
Result:
{
  "_index" : "est",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 7,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 7,
  "_primary_term" : 1
}
Let's check the document.
Example code:
GET /est/_doc/1
Result:
{
  "_index" : "est",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 9,
  "_seq_no" : 9,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "id" : "一号",
    "name" : "阿门",
    "age" : 23,
    "bir" : "2012-12-12"
  }
}
The update worked!
One more!
Example code:
POST /est/_doc/1
{
  "id" : "天字第一号"
}
Result (the document as returned by a follow-up GET /est/_doc/1):
{
  "_index" : "est",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 10,
  "_seq_no" : 10,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "id" : "天字第一号"
  }
}
Explanation:
POST /{indexName}/{typeName}/{id} replaces the whole document. It behaves like a delete followed by an add, so fields missing from the new body are lost.
Update that keeps the original data
To update while keeping the existing fields, add the keyword _update to the path and put the changes under "doc".
Example code:
POST /est/_doc/1/_update
{
  "doc" : {
    "name" : "娃哈哈"
  }
}
Result (the document as returned by a follow-up GET /est/_doc/1):
{
  "_index" : "est",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 12,
  "_seq_no" : 12,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "id" : "一号",
    "name" : "娃哈哈",
    "age" : 23,
    "bir" : "2012-12-12"
  }
}
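The difference between the two kinds of update can be modeled with a plain dictionary. The Python sketch below is a toy in-memory model, not Elasticsearch code: replace_document and partial_update are hypothetical names, and the merge is done on top-level fields only, which is enough for the flat documents used in this chapter.

```python
import copy


def replace_document(store: dict, doc_id: str, body: dict) -> None:
    """Model of POST /{indexName}/{typeName}/{id}: the stored document is replaced."""
    store[doc_id] = copy.deepcopy(body)


def partial_update(store: dict, doc_id: str, doc: dict) -> None:
    """Model of POST .../{id}/_update with a "doc" body: fields are merged in."""
    store[doc_id].update(copy.deepcopy(doc))


store = {}
original = {"id": "一号", "name": "阿门", "age": 23, "bir": "2012-12-12"}

# Full replacement: only the fields of the new body survive.
replace_document(store, "1", original)
replace_document(store, "1", {"id": "天字第一号"})
after_replace = store["1"]

# Partial update: untouched fields are kept.
replace_document(store, "1", original)
partial_update(store, "1", {"name": "娃哈哈"})
after_update = store["1"]

print(after_replace)
print(after_update)
```

In this model the replaced document keeps only its id field, while the partially updated one keeps id, age, and bir and changes only name.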
Adding a field during an update
An update can also add a field that does not exist yet.
Example code:
POST /est/_doc/1/_update
{
  "doc" : {
    "name" : "张三疯",
    "age" : 11,
    "dpet" : "武当派"
  }
}
Result (the document as returned by a follow-up GET /est/_doc/1):
{
  "_index" : "est",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 13,
  "_seq_no" : 13,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "id" : "一号",
    "name" : "张三疯",
    "age" : 11,
    "bir" : "2012-12-12",
    "dpet" : "武当派"
  }
}
It actually worked.
Now let's look at the new mapping. The dpet field was added automatically, as a text field with a keyword sub-field.
Example code:
GET /est/_mapping
Result:
#! Deprecation: [types removal] The parameter include_type_name should be explicitly specified in get mapping requests to prepare for 7.0. In 7.0 include_type_name will default to 'false', which means responses will omit the type name in mapping definitions.
{
  "est" : {
    "mappings" : {
      "_doc" : {
        "properties" : {
          "age" : { "type" : "integer" },
          "bir" : { "type" : "date" },
          "dpet" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "id" : { "type" : "keyword" },
          "name" : { "type" : "text" }
        }
      }
    }
  }
}
Update with a script
The last operation is an update with a script.
Example code:
POST /est/_doc/1/_update
{
  "script" : "ctx._source.age += 5"
}
Result (the document as returned by a follow-up GET /est/_doc/1):
{
  "_index" : "est",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 14,
  "_seq_no" : 14,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "id" : "一号",
    "name" : "张三疯",
    "age" : 16,
    "bir" : "2012-12-12",
    "dpet" : "武当派"
  }
}
Bulk
Bulk operations use the keyword _bulk.
Example code:
POST /est/_doc/_bulk
{"index":{"_id":3}}
{"name":"张三三","age":11,"dpet":"武当派"}
{"delete":{"_id":2}}
{"update":{"_id":1}}
{"doc":{"name":"张三疯","age":11,"dpet":"武当派"}}
{"update":{"_id":2}}
{"doc":{"name":"张三疯","age":11,"dpet":"武当派"}}
Result:
{
  "took" : 9,
  "errors" : true,
  "items" : [
    {
      "index" : {
        "_index" : "est",
        "_type" : "_doc",
        "_id" : "3",
        "_version" : 1,
        "result" : "created",
        "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 },
        "_seq_no" : 0,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "delete" : {
        "_index" : "est",
        "_type" : "_doc",
        "_id" : "2",
        "_version" : 1,
        "result" : "not_found",
        "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 },
        "_seq_no" : 0,
        "_primary_term" : 1,
        "status" : 404
      }
    },
    {
      "update" : {
        "_index" : "est",
        "_type" : "_doc",
        "_id" : "1",
        "_version" : 15,
        "result" : "updated",
        "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 },
        "_seq_no" : 15,
        "_primary_term" : 1,
        "status" : 200
      }
    },
    {
      "update" : {
        "_index" : "est",
        "_type" : "_doc",
        "_id" : "2",
        "status" : 404,
        "error" : {
          "type" : "document_missing_exception",
          "reason" : "[_doc][2]: document missing",
          "index_uuid" : "eZxussN2RUaHxkqHOrObPg",
          "shard" : "2",
          "index" : "est"
        }
      }
    }
  ]
}
Explanation:
In the example code:
The first line of each operation names the operation type and the document it applies to; for index and update, the next line carries the data.
index means add.
delete means delete.
update means modify.
In the result:
The result of each operation is returned in order.
One failed operation does not make the others fail.
(There are no transactions. Elasticsearch is built for search, not as a transactional database.)
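Writing the bulk body by hand is error-prone, because every JSON object must sit on its own line. The Python sketch below builds such a body from a list of operations. The function name build_bulk_body and the triple format are assumptions made for this illustration; the sketch only produces the text and sends nothing.

```python
import json


def build_bulk_body(operations) -> str:
    """Serialize (action, metadata, source) triples as newline-delimited JSON.

    `source` is None for delete, which has no data line.
    """
    lines = []
    for action, metadata, source in operations:
        lines.append(json.dumps({action: metadata}, ensure_ascii=False))
        if source is not None:
            lines.append(json.dumps(source, ensure_ascii=False))
    # The bulk body has to end with a newline character.
    return "\n".join(lines) + "\n"


body = build_bulk_body([
    ("index", {"_id": 3}, {"name": "张三三", "age": 11, "dpet": "武当派"}),
    ("delete", {"_id": 2}, None),
    ("update", {"_id": 1}, {"doc": {"name": "张三疯", "age": 11, "dpet": "武当派"}}),
])
print(body, end="")
```

When reading the response, check the top-level errors flag first and then the status of each entry in items, since a single failed entry does not fail the request as a whole.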
Query
Query here means advanced search.
Assume the following mapping and data.
Example code:
PUT /ems
{
  "mappings" : {
    "_doc" : {
      "properties" : {
        "name" : { "type" : "text" },
        "age" : { "type" : "integer" },
        "bir" : { "type" : "date" },
        "content" : { "type" : "text" },
        "address" : { "type" : "keyword" }
      }
    }
  }
}
PUT /ems/_doc/_bulk
{"index":{}}
{"name":"亨利","age":32,"bir":"2012-12-12","content":"当时光的列车缓缓驶过酋长球场","address":"糖果盒"}
{"index":{}}
{"name":"范德萨","age":24,"bir":"2012-12-12","content":"再见,范德萨,不老的传说,曼联有你,一生有你。","address":"上海"}
{"index":{}}
{"name":"皮尔洛","age":8,"bir":"2012-12-12","content":"从你含泪向队友告别的那一刻起,红黑色的21号将不再是我们熟悉的身影","address":"北京"}
{"index":{}}
{"name":"卡洛斯","age":9,"bir":"2012-12-12","content":"卡洛斯把自己的金色岁月留在了伯纳乌,而伯纳乌也给卡洛斯留下了不可抹去的金色记忆。","address":"南京"}
{"index":{}}
{"name":"罗纳尔多","age":43,"bir":"2012-12-12","content":"世上只有一个罗纳尔多!","address":"杭州"}
{"index":{}}
{"name":"卡卡","age":59,"bir":"2012-12-12","content":"天空,寄托着我的信仰。张开双臂,仰望天空,是对上天恩赐的感激。","address":"北京"}
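URL and DSL
Elasticsearch offers two ways to query.
URL
Example: GET /{index}/{type}/_search?parameters
DSL (Domain Specific Language)
Example: GET /{index}/{type}/_search {}
The second is the officially recommended one. It talks to ES by passing JSON as the request body, which is more powerful and more concise.
For the URL method a brief look is enough.
Example code:
GET /ems/_doc/_search?q=*&sort=age:desc&size=5&from=0&_source=name,age,bir
From here on we focus on the DSL.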
match_all
match_all queries everything and returns all documents in the index.
Example code:
GET /ems/_doc/_search
{
  "query" : { "match_all" : {} }
}
Result:
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 6,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "ems",
        "_type" : "_doc",
        "_id" : "9o1Bj34BoH8NsaaoSIo9",
        "_score" : 1.0,
        "_source" : {
          "name" : "罗纳尔多",
          "age" : 43,
          "bir" : "2012-12-12",
          "content" : "世上只有一个罗纳尔多!",
          "address" : "杭州"
        }
      },
      [part of the result omitted]
      {
        "_index" : "ems",
        "_type" : "_doc",
        "_id" : "841Bj34BoH8NsaaoSIo9",
        "_score" : 1.0,
        "_source" : {
          "name" : "范德萨",
          "age" : 24,
          "bir" : "2012-12-12",
          "content" : "再见,范德萨,不老的传说,曼联有你,一生有你。",
          "address" : "上海"
        }
      }
    ]
  }
}
Fields in the response
took: time the query took, in milliseconds
timed_out: whether the query timed out
_shards: shard statistics
hits (object): the result object
total: number of matching documents
max_score: highest score (relevance) among the matches
hits (array): the matching documents
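A client usually needs only a few of these fields. The Python sketch below pulls them out of a response that has already been parsed into a dictionary. The function name summarize_hits and the trimmed sample response are assumptions for illustration; the sample follows the 6.x layout shown above, where hits.total is a plain integer.

```python
def summarize_hits(response: dict) -> dict:
    """Collect the fields described above from a parsed search response."""
    hits = response["hits"]
    return {
        "took_ms": response["took"],
        "timed_out": response["timed_out"],
        "total": hits["total"],
        "max_score": hits["max_score"],
        "sources": [hit["_source"] for hit in hits["hits"]],
    }


# A trimmed copy of the match_all result shown above.
response = {
    "took": 4,
    "timed_out": False,
    "_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0},
    "hits": {
        "total": 6,
        "max_score": 1.0,
        "hits": [
            {"_id": "9o1Bj34BoH8NsaaoSIo9", "_score": 1.0,
             "_source": {"name": "罗纳尔多", "age": 43}},
            {"_id": "841Bj34BoH8NsaaoSIo9", "_score": 1.0,
             "_source": {"name": "范德萨", "age": 24}},
        ],
    },
}

summary = summarize_hits(response)
print(summary["total"], [source["name"] for source in summary["sources"]])
```

Note that total counts all matching documents, while the hits array holds only the documents returned in this response, so the two numbers can differ.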
sort-order
sort and order are used for sorting.
Example code:
GET /ems/_doc/_search
{
  "query" : { "match_all" : {} },
  "sort" : [
    {
      "age" : { "order" : "desc" }
    },
    {
      "bir" : { "order" : "desc" }
    }
  ]
}
Result:
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 6,
    "max_score" : null,
    "hits" : [
      [part of the result omitted]
      {
        "_index" : "ems",
        "_type" : "_doc",
        "_id" : "9Y1Bj34BoH8NsaaoSIo9",
        "_score" : null,
        "_source" : {
          "name" : "卡洛斯",
          "age" : 9,
          "bir" : "2012-12-12",
          "content" : "卡洛斯把自己的金色岁月留在了伯纳乌,而伯纳乌也给卡洛斯留下了不可抹去的金色记忆。",
          "address" : "南京"
        },
        "sort" : [ 9, 1355270400000 ]
      },
      {
        "_index" : "ems",
        "_type" : "_doc",
        "_id" : "9I1Bj34BoH8NsaaoSIo9",
        "_score" : null,
        "_source" : {
          "name" : "皮尔洛",
          "age" : 8,
          "bir" : "2012-12-12",
          "content" : "从你含泪向队友告别的那一刻起,红黑色的21号将不再是我们熟悉的身影",
          "address" : "北京"
        },
        "sort" : [ 8, 1355270400000 ]
      }
    ]
  }
}
Explanation:
We sorted on several fields by listing them in the "sort" array.
max_score and _score are null because we specified our own sort order, so no relevance score is computed.
One more: let's also sort by name.
Example code:
GET /ems/_doc/_search
{
  "query" : { "match_all" : {} },
  "sort" : [
    {
      "age" : { "order" : "desc" }
    },
    {
      "bir" : { "order" : "desc" }
    },
    {
      "name" : { "order" : "desc" }
    }
  ]
}
Result:
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "ems",
        "node": "4h-IhFgnQ4SG5s9XbJ1mBg",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
        }
      }
    ],
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
      }
    }
  },
  "status": 400
}
An error!
Explanation: name has the data type text. A text field is analyzed, and sorting on it is not supported by default. As the error message suggests, use a keyword field instead.
from-size
size sets how many results are returned; the default is 10.
from sets the offset of the first result returned.
Combined with sort, the two give us pagination.
Example code:
GET /ems/_doc/_search
{
  "query" : { "match_all" : {} },
  "sort" : [
    {
      "age" : { "order" : "desc" }
    }
  ],
  "size" : 2,
  "from" : 1
}
Result:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 6,
    "max_score" : null,
    "hits" : [
      {
        "_index" : "ems",
        "_type" : "_doc",
        "_id" : "9o1Bj34BoH8NsaaoSIo9",
        "_score" : null,
        "_source" : {
          "name" : "罗纳尔多",
          "age" : 43,
          "bir" : "2012-12-12",
          "content" : "世上只有一个罗纳尔多!",
          "address" : "杭州"
        },
        "sort" : [ 43 ]
      },
      {
        "_index" : "ems",
        "_type" : "_doc",
        "_id" : "8o1Bj34BoH8NsaaoSIo9",
        "_score" : null,
        "_source" : {
          "name" : "亨利",
          "age" : 32,
          "bir" : "2012-12-12",
          "content" : "当时光的列车缓缓驶过酋长球场",
          "address" : "糖果盒"
        },
        "sort" : [ 32 ]
      }
    ]
  }
}
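Pagination comes down to computing from out of a page number. The Python sketch below builds the request body for a given page. The function name page_query and the choice of 1-based page numbers are assumptions of this sketch.

```python
def page_query(page: int, page_size: int, sort_field: str, order: str = "desc") -> dict:
    """Build a match_all search body for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    return {
        "query": {"match_all": {}},
        "sort": [{sort_field: {"order": order}}],
        "from": (page - 1) * page_size,  # number of documents to skip
        "size": page_size,
    }


for page in (1, 2, 3):
    body = page_query(page, page_size=2, sort_field="age")
    print(page, body["from"], body["size"])
```

With six documents and a page size of 2, pages 1 to 3 use from values 0, 2, and 4. A stable sort order matters here; without it, the same document could show up on two pages.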
_source
The _source keyword can be a string or an array. A string selects a single field; an array selects several.
Example code:
GET /ems/_doc/_search
{
  "query" : { "match_all" : {} },
  "_source" : "name"
}
Result:
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : 6,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "ems", "_type" : "_doc", "_id" : "9o1Bj34BoH8NsaaoSIo9", "_score" : 1.0,
        "_source" : { "name" : "罗纳尔多" }
      },
      {
        "_index" : "ems", "_type" : "_doc", "_id" : "941Bj34BoH8NsaaoSIo9", "_score" : 1.0,
        "_source" : { "name" : "卡卡" }
      },
      {
        "_index" : "ems", "_type" : "_doc", "_id" : "9I1Bj34BoH8NsaaoSIo9", "_score" : 1.0,
        "_source" : { "name" : "皮尔洛" }
      },
      {
        "_index" : "ems", "_type" : "_doc", "_id" : "9Y1Bj34BoH8NsaaoSIo9", "_score" : 1.0,
        "_source" : { "name" : "卡洛斯" }
      },
      {
        "_index" : "ems", "_type" : "_doc", "_id" : "8o1Bj34BoH8NsaaoSIo9", "_score" : 1.0,
        "_source" : { "name" : "亨利" }
      },
      {
        "_index" : "ems", "_type" : "_doc", "_id" : "841Bj34BoH8NsaaoSIo9", "_score" : 1.0,
        "_source" : { "name" : "范德萨" }
      }
    ]
  }
}
Example code:
GET /ems/_doc/_search
{
  "query" : { "match_all" : {} },
  "_source" : [
    "name",
    "age",
    "money"
  ]
}
Result:
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : 6,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "ems", "_type" : "_doc", "_id" : "9o1Bj34BoH8NsaaoSIo9", "_score" : 1.0,
        "_source" : { "name" : "罗纳尔多", "age" : 43 }
      },
      {
        "_index" : "ems", "_type" : "_doc", "_id" : "941Bj34BoH8NsaaoSIo9", "_score" : 1.0,
        "_source" : { "name" : "卡卡", "age" : 59 }
      },
      {
        "_index" : "ems", "_type" : "_doc", "_id" : "9I1Bj34BoH8NsaaoSIo9", "_score" : 1.0,
        "_source" : { "name" : "皮尔洛", "age" : 8 }
      },
      {
        "_index" : "ems", "_type" : "_doc", "_id" : "9Y1Bj34BoH8NsaaoSIo9", "_score" : 1.0,
        "_source" : { "name" : "卡洛斯", "age" : 9 }
      },
      {
        "_index" : "ems", "_type" : "_doc", "_id" : "8o1Bj34BoH8NsaaoSIo9", "_score" : 1.0,
        "_source" : { "name" : "亨利", "age" : 32 }
      },
      {
        "_index" : "ems", "_type" : "_doc", "_id" : "841Bj34BoH8NsaaoSIo9", "_score" : 1.0,
        "_source" : { "name" : "范德萨", "age" : 24 }
      }
    ]
  }
}
Explanation: the documents have no money field, so it is simply missing from _source in the result; no error is raised.
term
term queries for an exact term.
Example code:
GET /ems/_doc/_search
{
  "query" : {
    "term" : {
      "address" : {
        "value" : "糖果盒"
      }
    }
  }
}
Result:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : 1,
    "max_score" : 0.6931472,
    "hits" : [
      {
        "_index" : "ems",
        "_type" : "_doc",
        "_id" : "8o1Bj34BoH8NsaaoSIo9",
        "_score" : 0.6931472,
        "_source" : {
          "name" : "亨利",
          "age" : 32,
          "bir" : "2012-12-12",
          "content" : "当时光的列车缓缓驶过酋长球场",
          "address" : "糖果盒"
        }
      }
    ]
  }
}
One more, this time querying by name.
Example code:
GET /ems/_doc/_search
{
  "query" : {
    "term" : {
      "name" : {
        "value" : "亨利"
      }
    }
  }
}
Result:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}
Nothing found? But 亨利 is clearly in the data.
Example code:
GET /ems/_doc/_search
{
  "query" : {
    "term" : {
      "name" : {
        "value" : "亨"
      }
    }
  }
}
Result:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : 1,
    "max_score" : 0.7549128,
    "hits" : [
      {
        "_index" : "ems",
        "_type" : "_doc",
        "_id" : "8o1Bj34BoH8NsaaoSIo9",
        "_score" : 0.7549128,
        "_source" : {
          "name" : "亨利",
          "age" : 32,
          "bir" : "2012-12-12",
          "content" : "当时光的列车缓缓驶过酋长球场",
          "address" : "糖果盒"
        }
      }
    ]
  }
}
Explanation:
A term query does not analyze the search value; it looks up that exact term in the inverted index. The name field has the type text, so its values were analyzed at index time by Elasticsearch's default analyzer, the standard analyzer (StandardAnalyzer). That analyzer splits English text into words and Chinese text into single characters. The index therefore contains 亨 and 利 but not 亨利, which is why the search for 亨 succeeds and the search for 亨利 finds nothing. The address field is a keyword, so 糖果盒 is stored as one term.
Of the eight data types listed earlier (text, keyword, date, integer, long, double, boolean, and ip), only text is analyzed.
We can call the standard analyzer directly to see what it does.
Example code:
GET /_analyze
{
  "text" : [
    "haha is good",
    "微风"
  ]
}
Result:
{
  "tokens" : [
    { "token" : "haha", "start_offset" : 0, "end_offset" : 4, "type" : "<ALPHANUM>", "position" : 0 },
    { "token" : "is", "start_offset" : 5, "end_offset" : 7, "type" : "<ALPHANUM>", "position" : 1 },
    { "token" : "good", "start_offset" : 8, "end_offset" : 12, "type" : "<ALPHANUM>", "position" : 2 },
    { "token" : "微", "start_offset" : 13, "end_offset" : 14, "type" : "<IDEOGRAPHIC>", "position" : 3 },
    { "token" : "风", "start_offset" : 14, "end_offset" : 15, "type" : "<IDEOGRAPHIC>", "position" : 4 }
  ]
}
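The behavior described above can be imitated in a few lines of Python. This is a toy approximation, not the real standard analyzer: the names toy_standard_analyze and term_matches are hypothetical, and the regular expression only covers ASCII letters and digits plus the common CJK ideograph range. It is meant only to show why the term 亨利 cannot match a text field.

```python
import re

# ASCII letters and digits form whole-word tokens; each CJK ideograph is its own token.
_TOKEN = re.compile(r"[A-Za-z0-9]+|[\u4e00-\u9fff]")


def toy_standard_analyze(text: str) -> list:
    """Split text roughly the way the result above shows, lowercasing words."""
    return [token.lower() for token in _TOKEN.findall(text)]


def term_matches(indexed_text: str, term: str) -> bool:
    """A term query compares the unanalyzed term with the indexed tokens."""
    return term in toy_standard_analyze(indexed_text)


print(toy_standard_analyze("haha is good 微风"))
print(term_matches("亨利", "亨利"))
print(term_matches("亨利", "亨"))
```

The stored name 亨利 becomes the two tokens 亨 and 利, so only a single-character term can match it exactly.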
match_phrase
We can also use match_phrase. It first analyzes the query string into a list of terms and then searches for them, keeping only the documents that contain all of the terms in the same relative positions as in the query.
Example code:
GET /ems/_doc/_search
{
  "query" : {
    "match_phrase" : {
      "name" : "亨利"
    }
  }
}
Result:
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : 1,
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "ems",
        "_type" : "_doc",
        "_id" : "WBamj34BCoX_YrYqZTh2",
        "_score" : 0.2876821,
        "_source" : {
          "name" : "亨利",
          "age" : 32,
          "bir" : "2012-12-12",
          "content" : "当时光的列车缓缓驶过酋长球场",
          "address" : "糖果盒"
        }
      }
    ]
  }
}
terms
terms is similar to in in SQL.
Example code:
GET /ems/_doc/_search
{
  "query" : {
    "terms" : {
      "address" : [
        "糖果盒",
        "上海"
      ]
    }
  }
}
Result:
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "ems",
        "_type" : "_doc",
        "_id" : "WBamj34BCoX_YrYqZTh2",
        "_score" : 1.0,
        "_source" : {
          "name" : "亨利",
          "age" : 32,
          "bir" : "2012-12-12",
          "content" : "当时光的列车缓缓驶过酋长球场",
          "address" : "糖果盒"
        }
      },
      {
        "_index" : "ems",
        "_type" : "_doc",
        "_id" : "WRamj34BCoX_YrYqZTh2",
        "_score" : 1.0,
        "_source" : {
          "name" : "范德萨",
          "age" : 24,
          "bir" : "2012-12-12",
          "content" : "再见,范德萨,不老的传说,曼联有你,一生有你。",
          "address" : "上海"
        }
      }
    ]
  }
}
range
range finds documents whose field value lies within a given range.
There are four comparison operators:
lt: less than
lte: less than or equal to
gt: greater than
gte: greater than or equal to
Example code:
GET /ems/_doc/_search
{
  "query" : {
    "range" : {
      "age" : {
        "gte" : 9,
        "lte" : 30
      }
    }
  }
}
Result:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "ems",
        "_type" : "_doc",
        "_id" : "9Y1Bj34BoH8NsaaoSIo9",
        "_score" : 1.0,
        "_source" : {
          "name" : "卡洛斯",
          "age" : 9,
          "bir" : "2012-12-12",
          "content" : "卡洛斯把自己的金色岁月留在了伯纳乌,而伯纳乌也给卡洛斯留下了不可抹去的金色记忆。",
          "address" : "南京"
        }
      },
      {
        "_index" : "ems",
        "_type" : "_doc",
        "_id" : "841Bj34BoH8NsaaoSIo9",
        "_score" : 1.0,
        "_source" : {
          "name" : "范德萨",
          "age" : 24,
          "bir" : "2012-12-12",
          "content" : "再见,范德萨,不老的传说,曼联有你,一生有你。",
          "address" : "上海"
        }
      }
    ]
  }
}
prefix
prefix finds documents that contain a term starting with the given prefix.
Example code:
GET /ems/_doc/_search
{
  "query" : {
    "prefix" : {
      "address" : {
        "value" : "糖"
      }
    }
  }
}
Result:
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "ems",
        "_type" : "_doc",
        "_id" : "8o1Bj34BoH8NsaaoSIo9",
        "_score" : 1.0,
        "_source" : {
          "name" : "亨利",
          "age" : 32,
          "bir" : "2012-12-12",
          "content" : "当时光的列车缓缓驶过酋长球场",
          "address" : "糖果盒"
        }
      }
    ]
  }
}
wildcard
wildcard is a query with wildcard characters.
? matches exactly one arbitrary character.
* matches any number of arbitrary characters.
Example code:
GET /ems/_doc/_search
{
  "query" : {
    "wildcard" : {
      "content" : {
        "value" : "当*"
      }
    }
  }
}
Result:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "ems",
        "_type" : "_doc",
        "_id" : "8o1Bj34BoH8NsaaoSIo9",
        "_score" : 1.0,
        "_source" : {
          "name" : "亨利",
          "age" : 32,
          "bir" : "2012-12-12",
          "content" : "当时光的列车缓缓驶过酋长球场",
          "address" : "糖果盒"
        }
      }
    ]
  }
}
Some references say that * and ? cannot be placed at the start of the pattern. In my tests they can.
That said, given what we know about the inverted index, a leading wildcard is likely to hurt query performance.
Example code:
GET /ems/_doc/_search
{
  "query" : {
    "wildcard" : {
      "content" : {
        "value" : "*场"
      }
    }
  }
}
Result:
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "ems",
        "_type" : "_doc",
        "_id" : "8o1Bj34BoH8NsaaoSIo9",
        "_score" : 1.0,
        "_source" : {
          "name" : "亨利",
          "age" : 32,
          "bir" : "2012-12-12",
          "content" : "当时光的列车缓缓驶过酋长球场",
          "address" : "糖果盒"
        }
      }
    ]
  }
}
fuzzy
fuzzy finds documents that contain a term similar to the search term.
How many edits are allowed depends on the length of the search term:
Length of 2 or less: no fuzziness allowed; the maximum edit distance is 0.
Length from 3 to 5: one edit allowed; the maximum edit distance is 1.
Length greater than 5: the maximum edit distance is 2.
Example code:
GET /ems/_doc/_search
{
  "query" : {
    "fuzzy" : {
      "address" : "糖果果"
    }
  }
}
Result:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : 1,
    "max_score" : 0.46209812,
    "hits" : [
      {
        "_index" : "ems",
        "_type" : "_doc",
        "_id" : "8o1Bj34BoH8NsaaoSIo9",
        "_score" : 0.46209812,
        "_source" : {
          "name" : "亨利",
          "age" : 32,
          "bir" : "2012-12-12",
          "content" : "当时光的列车缓缓驶过酋长球场",
          "address" : "糖果盒"
        }
      }
    ]
  }
}
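The length rule and the notion of an edit can be checked with a short Python sketch. The names auto_max_edits and levenshtein are hypothetical. The distance below is plain Levenshtein distance (insert, delete, substitute); treat it as an approximation of what Elasticsearch computes, not as its exact algorithm.

```python
def auto_max_edits(term: str) -> int:
    """Maximum number of edits allowed for a search term, by its length."""
    length = len(term)
    if length <= 2:
        return 0
    if length <= 5:
        return 1
    return 2


def levenshtein(a: str, b: str) -> int:
    """Plain edit distance computed row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


query, stored = "糖果果", "糖果盒"
allowed = auto_max_edits(query)
distance = levenshtein(query, stored)
print(allowed, distance, distance <= allowed)
```

The search term 糖果果 has three characters, so one edit is allowed, and it differs from the stored 糖果盒 by exactly one substitution. That is consistent with the match in the result above.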
ids
ids takes an array of ids and fetches the corresponding documents.
Example code:
GET /ems/_doc/_search
{
  "query" : {
    "ids" : {
      "values" : [
        "9Y1Bj34BoH8NsaaoSIo9",
        "841Bj34BoH8NsaaoSIo9"
      ]
    }
  }
}
Result:
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "ems",
        "_type" : "_doc",
        "_id" : "9Y1Bj34BoH8NsaaoSIo9",
        "_score" : 1.0,
        "_source" : {
          "name" : "卡洛斯",
          "age" : 9,
          "bir" : "2012-12-12",
          "content" : "卡洛斯把自己的金色岁月留在了伯纳乌,而伯纳乌也给卡洛斯留下了不可抹去的金色记忆。",
          "address" : "南京"
        }
      },
      {
        "_index" : "ems",
        "_type" : "_doc",
        "_id" : "841Bj34BoH8NsaaoSIo9",
        "_score" : 1.0,
        "_source" : {
          "name" : "范德萨",
          "age" : 24,
          "bir" : "2012-12-12",
          "content" : "再见,范德萨,不老的传说,曼联有你,一生有你。",
          "address" : "上海"
        }
      }
    ]
  }
}
bool
bool combines several conditions into one complex query.
must: roughly like and; every clause has to match.
should: roughly like or; one matching clause is enough.
must_not: roughly like not; none of the clauses may match.
So why are they not simply called and and or? Because they do not behave exactly like and and or. We will see the difference shortly.
Example code:
GET /ems/_doc/_search
{
  "query" : {
    "bool" : {
      "must" : [
        {
          "range" : {
            "age" : {
              "gte" : 0,
              "lte" : 100
            }
          }
        }
      ],
      "must_not" : [
        {
          "wildcard" : {
            "address" : {
              "value" : "糖果?"
            }
          }
        }
      ]
    }
  },
  "sort" : [
    {
      "age" : { "order" : "desc" }
    }
  ]
}
Result:
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : 5,
    "max_score" : null,
    "hits" : [
      {
        "_index" : "ems",
        "_type" : "_doc",
        "_id" : "941Bj34BoH8NsaaoSIo9",
        "_score" : null,
        "_source" : {
          "name" : "卡卡",
          "age" : 59,
          "bir" : "2012-12-12",
          "content" : "天空,寄托着我的信仰。张开双臂,仰望天空,是对上天恩赐的感激。",
          "address" : "北京"
        },
        "sort" : [ 59 ]
      },
      {
        "_index" : "ems",
        "_type" : "_doc",
        "_id" : "9o1Bj34BoH8NsaaoSIo9",
        "_score" : null,
        "_source" : {
          "name" : "罗纳尔多",
          "age" : 43,
          "bir" : "2012-12-12",
          "content" : "世上只有一个罗纳尔多!",
          "address" : "杭州"
        },
        "sort" : [ 43 ]
      },
      {
        "_index" : "ems",
        "_type" : "_doc",
        "_id" : "841Bj34BoH8NsaaoSIo9",
        "_score" : null,
        "_source" : {
          "name" : "范德萨",
          "age" : 24,
          "bir" : "2012-12-12",
          "content" : "再见,范德萨,不老的传说,曼联有你,一生有你。",
          "address" : "上海"
        },
        "sort" : [ 24 ]
      },
      {
        "_index" : "ems",
        "_type" : "_doc",
        "_id" : "9Y1Bj34BoH8NsaaoSIo9",
        "_score" : null,
        "_source" : {
          "name" : "卡洛斯",
          "age" : 9,
          "bir" : "2012-12-12",
          "content" : "卡洛斯把自己的金色岁月留在了伯纳乌,而伯纳乌也给卡洛斯留下了不可抹去的金色记忆。",
          "address" : "南京"
        },
        "sort" : [ 9 ]
      },
      {
        "_index" : "ems",
        "_type" : "_doc",
        "_id" : "9I1Bj34BoH8NsaaoSIo9",
        "_score" : null,
        "_source" : {
          "name" : "皮尔洛",
          "age" : 8,
          "bir" : "2012-12-12",
          "content" : "从你含泪向队友告别的那一刻起,红黑色的21号将不再是我们熟悉的身影",
          "address" : "北京"
        },
        "sort" : [ 8 ]
      }
    ]
  }
}
Now let's see why the names are must and should rather than and and or.
First, a query for "a=1 or b=2".
Example code:
{
  "query" : {
    "bool" : {
      "should" : [
        {
          "match" : { "a" : "1" }
        },
        {
          "match" : { "b" : "2" }
        }
      ]
    }
  }
}
No problem so far. Now let's add one more condition, "and c=3".
That is: "(a=1 or b=2) and c=3".
Example code:
{
  "query" : {
    "bool" : {
      "must" : [
        {
          "match" : { "c" : "3" }
        }
      ],
      "should" : [
        {
          "match" : { "a" : "1" }
        },
        {
          "match" : { "b" : "2" }
        }
      ]
    }
  }
}
This is wrong!
When should sits at the same level as must or filter, by default none of the should clauses has to match, so the query above returns every document with c=3. To get what we want, add the minimum_should_match parameter.
Example code:
{
  "query" : {
    "bool" : {
      "must" : [
        {
          "match" : { "c" : "3" }
        }
      ],
      "should" : [
        {
          "match" : { "a" : "1" }
        },
        {
          "match" : { "b" : "2" }
        }
      ],
      "minimum_should_match" : 1
    }
  }
}
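The rule is easier to see when it is evaluated on a concrete document. The Python sketch below is a toy model of the matching logic only. It ignores scoring, and bool_matches and its parameters are hypothetical names, with each clause represented by a plain predicate function.

```python
def bool_matches(doc, must=(), should=(), must_not=(), filter_=(),
                 minimum_should_match=None):
    """Toy model of whether a bool query matches a document (no scoring)."""
    if minimum_should_match is None:
        # should is required only when it stands alone, without must or filter.
        minimum_should_match = 1 if should and not (must or filter_) else 0
    if not all(clause(doc) for clause in (*must, *filter_)):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    matched_should = sum(1 for clause in should if clause(doc))
    return matched_should >= minimum_should_match


def a_is_1(doc): return doc.get("a") == 1
def b_is_2(doc): return doc.get("b") == 2
def c_is_3(doc): return doc.get("c") == 3


doc = {"a": 0, "b": 0, "c": 3}  # c=3 holds, but neither a=1 nor b=2

only_should = bool_matches(doc, should=[a_is_1, b_is_2])
with_must = bool_matches(doc, must=[c_is_3], should=[a_is_1, b_is_2])
with_minimum = bool_matches(doc, must=[c_is_3], should=[a_is_1, b_is_2],
                            minimum_should_match=1)
print(only_should, with_must, with_minimum)
```

The document is rejected by should alone, accepted once must is added (the unintended result), and rejected again when minimum_should_match is set to 1.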
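highlight
Highlight queries
highlight marks the matched keywords in the matching documents so that they can be displayed highlighted.
Note that it is not a filter condition of the query; it is a second rendering pass over the query results.
Example code: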
GET /ems/_doc/_search
{
  "query": {
    "term": {
      "content": {
        "value": "时"
      }
    }
  },
  "highlight": {
    "fields": {
      "*": {}
    }
  }
}
Result:
{
  "took" : 40,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.73050237,
    "hits" : [
      {
        "_index" : "ems",
        "_type" : "_doc",
        "_id" : "8o1Bj34BoH8NsaaoSIo9",
        "_score" : 0.73050237,
        "_source" : {
          "name" : "亨利",
          "age" : 32,
          "bir" : "2012-12-12",
          "content" : "当时光的列车缓缓驶过酋长球场",
          "address" : "糖果盒"
        },
        "highlight" : {
          "content" : [
            "当<em>时</em>光的列车缓缓驶过酋长球场"
          ]
        }
      }
    ]
  }
}
Explanation: by default the match is wrapped in an em tag, which renders as italics.
Custom highlight HTML tags
Use pre_tags and post_tags inside highlight to choose the wrapping tags.
Example code:
GET /ems/_doc/_search
{
  "query": {
    "term": {
      "content": {
        "value": "时"
      }
    }
  },
  "highlight": {
    "pre_tags": [
      "<span style='color:red'>"
    ],
    "post_tags": [
      "</span>"
    ],
    "fields": {
      "*": {}
    }
  }
}
Result:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.73050237,
    "hits" : [
      {
        "_index" : "ems",
        "_type" : "_doc",
        "_id" : "8o1Bj34BoH8NsaaoSIo9",
        "_score" : 0.73050237,
        "_source" : {
          "name" : "亨利",
          "age" : 32,
          "bir" : "2012-12-12",
          "content" : "当时光的列车缓缓驶过酋长球场",
          "address" : "糖果盒"
        },
        "highlight" : {
          "content" : [
            "当<span style='color:red'>时</span>光的列车缓缓驶过酋长球场"
          ]
        }
      }
    ]
  }
}
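The "second rendering pass" idea can be illustrated with a few lines of Python. This is only a rough sketch: the function `highlight_fragment` is a made-up name, and plain substring replacement stands in for the real highlighter, which works on analyzed tokens and their offsets.

```python
def highlight_fragment(text, term, pre_tag="<em>", post_tag="</em>"):
    """Wrap every occurrence of term in text with pre_tag and post_tag."""
    return text.replace(term, pre_tag + term + post_tag)


content = "当时光的列车缓缓驶过酋长球场"

# Default tags, as in the first highlight example.
print(highlight_fragment(content, "时"))

# Custom tags, as set through pre_tags and post_tags.
print(highlight_fragment(content, "时", "<span style='color:red'>", "</span>"))
```

The stored _source is never changed; only the extra highlight fragment carries the tags.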
Multi-field highlighting
To highlight matches in more than one field, use require_field_match.
Example code:
GET /ems/_doc/_search
{
  "query": {
    "term": {
      "content": "卡"
    }
  },
  "highlight": {
    "pre_tags": [
      "<span style='color:red'>"
    ],
    "post_tags": [
      "</span>"
    ],
    "require_field_match": false,
    "fields": {
      "*": {}
    }
  }
}
Result:
{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.9266379,
    "hits" : [
      {
        "_index" : "ems",
        "_type" : "_doc",
        "_id" : "9Y1Bj34BoH8NsaaoSIo9",
        "_score" : 0.9266379,
        "_source" : {
          "name" : "卡洛斯",
          "age" : 9,
          "bir" : "2012-12-12",
          "content" : "卡洛斯把自己的金色岁月留在了伯纳乌,而伯纳乌也给卡洛斯留下了不可抹去的金色记忆。",
          "address" : "南京"
        },
        "highlight" : {
          "name" : [
            "<span style='color:red'>卡</span>洛斯"
          ],
          "content" : [
            "<span style='color:red'>卡</span>洛斯把自己的金色岁月留在了伯纳乌,而伯纳乌也给<span style='color:red'>卡</span>洛斯留下了不可抹去的金色记忆。"
          ]
        }
      }
    ]
  }
}
Note: require_field_match must be set to false, and the fields to highlight are listed in fields. The query above only searches content, yet name is highlighted as well.
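multi_match
multi_match searches several fields with one query.
Its characteristics:
If a searched field is analyzed, the keyword is analyzed first and then searched.
If a searched field is not analyzed, the keyword is searched as-is.
So the fields listed in fields are usually analyzed fields.
Example code: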
GET /ems/_doc/_search
{
  "query": {
    "multi_match": {
      "query": "卡卡",
      "fields": ["name", "content"]
    }
  }
}
Result:
{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.8532758,
    "hits" : [
      {
        "_index" : "ems",
        "_type" : "_doc",
        "_id" : "9Y1Bj34BoH8NsaaoSIo9",
        "_score" : 1.8532758,
        "_source" : {
          "name" : "卡洛斯",
          "age" : 9,
          "bir" : "2012-12-12",
          "content" : "卡洛斯把自己的金色岁月留在了伯纳乌,而伯纳乌也给卡洛斯留下了不可抹去的金色记忆。",
          "address" : "南京"
        }
      },
      {
        "_index" : "ems",
        "_type" : "_doc",
        "_id" : "941Bj34BoH8NsaaoSIo9",
        "_score" : 0.7911257,
        "_source" : {
          "name" : "卡卡",
          "age" : 59,
          "bir" : "2012-12-12",
          "content" : "天空,寄托着我的信仰。张开双臂,仰望天空,是对上天恩赐的感激。",
          "address" : "北京"
        }
      }
    ]
  }
}
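Why does a search for 卡卡 also return 卡洛斯? The following toy Python model illustrates the two rules above. It rests on assumptions: `single_char_tokens` and `toy_multi_match` are invented names, the analyzer is reduced to single-character splitting, the caller says which fields count as analyzed, and scoring is ignored entirely.

```python
def single_char_tokens(text):
    """Mimic single-character tokenization of Chinese text (toy version)."""
    return [ch for ch in text if not ch.isspace()]


def toy_multi_match(docs, query, fields, analyzed_fields):
    """Return the docs in which any listed field matches the query."""
    hits = []
    for doc in docs:
        for field in fields:
            value = doc.get(field, "")
            if field in analyzed_fields:
                # Analyzed field: analyze the keyword first, then look for any token.
                matched = any(tok in value for tok in single_char_tokens(query))
            else:
                # Non-analyzed field: the keyword is compared as a whole.
                matched = value == query
            if matched:
                hits.append(doc)
                break
    return hits


docs = [
    {"name": "亨利", "content": "当时光的列车缓缓驶过酋长球场", "address": "糖果盒"},
    {"name": "卡洛斯", "content": "卡洛斯把自己的金色岁月留在了伯纳乌", "address": "南京"},
    {"name": "卡卡", "content": "天空,寄托着我的信仰。", "address": "北京"},
]

found = toy_multi_match(docs, "卡卡", ["name", "content"], {"name", "content"})
print([doc["name"] for doc in found])  # ['卡洛斯', '卡卡']
```

Because 卡卡 is split into the token 卡 before searching, every document containing 卡 in name or content qualifies, while a non-analyzed field such as address only matches the complete value.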
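query_string
query_string is an analyzed multi-field query. Besides analyzing the query text at search time, it also lets you specify which analyzer to use.
Example code: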
GET /dangdang/book/_search
{
  "query": {
    "query_string": {
      "query": "中国声音",
      "analyzer": "ik_max_word",
      "fields": ["name", "content"]
    }
  }
}
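Filter
A filter narrows down the set of matching documents.
Filter queries
Elasticsearch distinguishes two kinds of search clauses.
Query: by default computes a score for every returned document and sorts by that score.
Filter: only selects the documents that match and does not compute a score.
So, judged on performance alone, a filter is faster than a query.
Filter syntax
Example code: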
GET /ems/_doc/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match_all": {}
        }
      ],
      "filter": {
        "term": {
          "age": 32
        }
      }
    }
  }
}
Result:
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "ems",
        "_type" : "_doc",
        "_id" : "WBamj34BCoX_YrYqZTh2",
        "_score" : 1.0,
        "_source" : {
          "name" : "亨利",
          "age" : 32,
          "bir" : "2012-12-12",
          "content" : "当时光的列车缓缓驶过酋长球场",
          "address" : "糖果盒"
        }
      }
    ]
  }
}
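When a request contains both filter and query, the filter is applied first and the query afterwards.
Common filter types are:
term
terms
range
exists: keeps the documents that contain the specified field with a non-null value.
Here is an exists example.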
GET /ems/_doc/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "name": {
              "value": "中国"
            }
          }
        }
      ],
      "filter": {
        "exists": {
          "field": "haha"
        }
      }
    }
  }
}
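For the four filter types listed above, the request bodies all have the same outer shape. The Python sketch below builds them as plain dictionaries and prints the JSON that could be pasted after `GET /ems/_doc/_search`. The helper name `filtered_query` is made up, the field names come from the ems index used in this chapter, and the concrete values (8, 9, 32) are illustrative only.

```python
import json


def filtered_query(filter_clause, must=None):
    """Build a bool query body whose filter clause selects without scoring."""
    return {
        "query": {
            "bool": {
                "must": must or [{"match_all": {}}],
                "filter": filter_clause,
            }
        }
    }


term_filter = {"term": {"age": 32}}                        # exactly one value
terms_filter = {"terms": {"age": [8, 9]}}                  # any of several values
range_filter = {"range": {"age": {"gte": 8, "lte": 32}}}   # inclusive bounds
exists_filter = {"exists": {"field": "address"}}           # field present and non-null

for clause in (term_filter, terms_filter, range_filter, exists_filter):
    print(json.dumps(filtered_query(clause), ensure_ascii=False))
```

Only the inner clause changes between the four; the surrounding bool, must, and filter structure stays the same as in the filter syntax example.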
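After so many kinds of queries, surely join queries come next? They do not. Elasticsearch does support joins, but the official guidance discourages them because their performance is very poor. If a join is unavoidable, handle it in application code or by building a denormalized wide table.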
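IK analyzer
Elasticsearch uses the standard analyzer by default, and for Chinese text it splits every character into its own token.
We can use the IK analyzer instead.
Its GitHub repository is: https://github.com/medcl/elasticsearch-analysis-ik
Online installation
Run the following command in the bin directory to install the plugin.
./elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.8.23/elasticsearch-analysis-ik-6.8.23.zip
Restart Elasticsearch for the plugin to take effect.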
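Some sources claim that after installing an analyzer you have to delete the existing index data, that is, delete the data folder in the Elasticsearch installation directory.
In practice this turned out to be completely unnecessary!
Besides, would installing one analyzer really require deleting existing index data? An Elasticsearch deployment may hold hundreds of GB or even 1 TB of data. Deleting and re-importing all of it for the sake of one analyzer would be out of proportion.
Finally, a word on the elasticsearch-plugin commands.
list: lists the installed Elasticsearch plugins
install: installs a plugin
remove: removes a plugin from Elasticsearch
Note that after an online installation the IK configuration file is located at
{Elasticsearch installation directory}/config/analysis-ik/IKAnalyzer.cfg.xml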
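Installing IK locally
Local installation:
Transfer the IK analyzer archive to the server.
Unzip it.
unzip elasticsearch-analysis-ik-6.8.23.zip
Copy it into the plugins folder.
cd plugins/
cp -r ~/elasticsearch-analysis-ik-6.8.23 ./
Restart Elasticsearch for the plugin to take effect.
Note that after a local installation the IK configuration file is located at
{Elasticsearch installation directory}/plugins/analysis-ik/config/IKAnalyzer.cfg.xml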
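Testing the IK analyzer
The IK analyzer provides two analysis modes:
ik_max_word: splits the text at the finest granularity.
ik_smart: splits the text at a coarse granularity.
Let us go straight to the examples.
Example code: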
GET /_analyze
{
  "text": ["中华人民共和国国歌"],
  "analyzer": "ik_max_word"
}
Result:
{
  "tokens" : [
    {
      "token" : "中华人民共和国",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "中华人民",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "中华",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "华人",
      "start_offset" : 1,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "人民共和国",
      "start_offset" : 2,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 4
    },
    {
      "token" : "人民",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 5
    },
    {
      "token" : "共和国",
      "start_offset" : 4,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 6
    },
    {
      "token" : "共和",
      "start_offset" : 4,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 7
    },
    {
      "token" : "国",
      "start_offset" : 6,
      "end_offset" : 7,
      "type" : "CN_CHAR",
      "position" : 8
    },
    {
      "token" : "国歌",
      "start_offset" : 7,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 9
    }
  ]
}
Example code:
GET /_analyze
{
  "text": ["中华人民共和国国歌"],
  "analyzer": "ik_smart"
}
Result:
{
  "tokens" : [
    {
      "token" : "中华人民共和国",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "国歌",
      "start_offset" : 7,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 1
    }
  ]
}
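The contrast between fine-grained and coarse-grained splitting can be imitated with a tiny dictionary-based sketch in Python. This is a toy model under stated assumptions, not IK's actual algorithm: the function names and the hand-picked dictionary are invented, the fine-grained variant simply lists every dictionary word it finds, and the coarse-grained variant uses greedy longest match. It also does not reproduce the single-character token 国 that ik_max_word emits above.

```python
def all_words(text, dictionary):
    """Fine-grained: every dictionary word found at every start position."""
    tokens = []
    for start in range(len(text)):
        # Try longer candidates first so the order resembles the output above.
        for end in range(len(text), start, -1):
            if text[start:end] in dictionary:
                tokens.append(text[start:end])
    return tokens


def longest_words(text, dictionary):
    """Coarse-grained: greedy longest match from left to right."""
    tokens, start = [], 0
    while start < len(text):
        for end in range(len(text), start, -1):
            # Fall back to a single character when no dictionary word fits.
            if text[start:end] in dictionary or end == start + 1:
                tokens.append(text[start:end])
                start = end
                break
    return tokens


# Hypothetical mini dictionary, chosen by hand for this one sentence.
dictionary = {"中华人民共和国", "中华人民", "中华", "华人", "人民共和国",
              "人民", "共和国", "共和", "国歌"}
text = "中华人民共和国国歌"

print(all_words(text, dictionary))
print(longest_words(text, dictionary))  # ['中华人民共和国', '国歌']
```

The fine-grained list is useful at index time because it lets many different search terms hit the document, while the coarse-grained result yields fewer, longer tokens.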
Specifying the analyzer when creating an index
With analyzer and search_analyzer we can specify the analyzer for a field when the index is created.
Example code:
PUT /ems
{
  "mappings": {
    "_doc": {
      "properties": {
        "name": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_max_word"
        },
        "age": {
          "type": "integer"
        },
        "bir": {
          "type": "date"
        },
        "content": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_max_word"
        },
        "address": {
          "type": "keyword"
        }
      }
    }
  }
}
PUT /ems/_doc/_bulk
{"index":{}}
{"name":"亨利","age":32,"bir":"2012-12-12","content":"当时光的列车缓缓驶过酋长球场","address":"糖果盒"}
{"index":{}}
{"name":"范德萨","age":24,"bir":"2012-12-12","content":"再见,范德萨,不老的传说,曼联有你,一生有你。","address":"上海"}
{"index":{}}
{"name":"皮尔洛","age":8,"bir":"2012-12-12","content":"从你含泪向队友告别的那一刻起,红黑色的21号将不再是我们熟悉的身影","address":"北京"}
{"index":{}}
{"name":"卡洛斯","age":9,"bir":"2012-12-12","content":"卡洛斯把自己的金色岁月留在了伯纳乌,而伯纳乌也给卡洛斯留下了不可抹去的金色记忆。","address":"南京"}
{"index":{}}
{"name":"罗纳尔多","age":43,"bir":"2012-12-12","content":"世上只有一个罗纳尔多!","address":"杭州"}
{"index":{}}
{"name":"卡卡","age":59,"bir":"2012-12-12","content":"天空,寄托着我的信仰。张开双臂,仰望天空,是对上天恩赐的感激。","address":"北京"}
Let us try it.
Example code:
GET /ems/_doc/_search
{
  "query": {
    "term": {
      "content": "时光"
    }
  },
  "highlight": {
    "pre_tags": ["<span style='color:red'>"],
    "post_tags": ["</span>"],
    "fields": {
      "*": {}
    }
  }
}
Result:
{
  "took" : 40,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "ems",
        "_type" : "_doc",
        "_id" : "WBamj34BCoX_YrYqZTh2",
        "_score" : 0.2876821,
        "_source" : {
          "name" : "亨利",
          "age" : 32,
          "bir" : "2012-12-12",
          "content" : "当时光的列车缓缓驶过酋长球场",
          "address" : "糖果盒"
        },
        "highlight" : {
          "content" : [
            "当<span style='color:red'>时光</span>的列车缓缓驶过酋长球场"
          ]
        }
      }
    ]
  }
}
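Configuring extension words
IK supports custom extension dictionaries and stop-word dictionaries.
Extension dictionary: words you want added to the dictionary.
Stop-word dictionary: words you want removed from the dictionary.
Edit IKAnalyzer.cfg.xml to add extension words.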
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer 扩展配置</comment>
    <entry key="ext_dict"></entry>
    <entry key="ext_stopwords"></entry>
</properties>
Explanation:
ext_dict: local extension words
ext_stopwords: local stop words
remote_ext_dict: remote extension words
remote_ext_stopwords: remote stop words
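To check which dictionaries a configuration file actually declares, the entries can be read with Python's standard XML parser. The sketch below is a minimal illustration under assumptions: the function name `read_ik_entries` and the path `custom/my_words.dic` are hypothetical, the sample XML is embedded as a string rather than read from the real file, and splitting several paths on `;` is assumed to be the separator convention.

```python
import xml.etree.ElementTree as ET

# Sample configuration; custom/my_words.dic is a made-up dictionary path.
SAMPLE_CFG = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer 扩展配置</comment>
    <entry key="ext_dict">custom/my_words.dic</entry>
    <entry key="ext_stopwords"></entry>
</properties>
"""


def read_ik_entries(xml_text):
    """Return {key: [paths]} for every <entry> element in the given XML text."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    entries = {}
    for entry in root.findall("entry"):
        raw = (entry.text or "").strip()
        # Assumption: several dictionary files are separated by ';'.
        entries[entry.get("key")] = [p.strip() for p in raw.split(";") if p.strip()]
    return entries


print(read_ik_entries(SAMPLE_CFG))
```

An empty entry, such as ext_stopwords here, simply yields an empty list, meaning no custom file is configured for that key.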