2. Basic Operations

index

This section covers creating, viewing, and deleting an index, in that order.

Create

How to create

To create an index, send PUT /{indexName}.

Example:

PUT /kaka

Output:

#! Deprecation: the default number of shards will change from [5] to [1] in 7.0.0; if you wish to continue using the default of [5] shards, you must manage this on the create index request or with an index template
{
"acknowledged" : true,
"shards_acknowledged" : true,
"index" : "kaka"
}

Explanation:

  • "acknowledged" : true: the index was created
  • "shards_acknowledged" : true: the required shard copies were started before the request timed out
  • "index" : "kaka": the index name

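Specifying settings

Now for the first line of the output, the one starting with #!.
Before version 7, every new index got 5 primary shards by default. From version 7 on, the default is 1. If you want 5 shards on 7.x, you have to say so explicitly when creating the index.
(We are on version 6, so this does not affect us yet.)
An explicit request looks like this:

Example:

PUT /kk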
{
"settings": {
"number_of_replicas": 1,
"number_of_shards": 5
}
}

Kibana offers hints and auto-completion. However, if the first { sits on the same line as /kk instead of on a new line, completion stops working.

Index names must be lowercase

As noted in the previous chapter, "1. Tools, Concepts and Cluster", index names must be all lowercase.

Example:

PUT /KKK

Output:

{
"error": {
"root_cause": [
{
"type": "invalid_index_name_exception",
"reason": "Invalid index name [KKK], must be lowercase",
"index_uuid": "_na_",
"index": "KKK"
}
],
"type": "invalid_index_name_exception",
"reason": "Invalid index name [KKK], must be lowercase",
"index_uuid": "_na_",
"index": "KKK"
},
"status": 400
}

Explanation: the request fails because the index name contains uppercase letters. The relevant part of the response is:

"type": "invalid_index_name_exception",
"reason": "Invalid index name [KKK], must be lowercase",

View

How to view

To list indices, send GET /_cat/indices.

Example:

GET /_cat/indices

Output:

green  open .kibana_1            azRP1WYcSlSKg_Iu8kim_Q 1 0 4 1 16.8kb 16.8kb
yellow open kk                   a3akwYNuTG2Bq0maLLE0-A 5 1 0 0  1.1kb  1.1kb
green  open .kibana_task_manager HsnZtouxT2i40qSFeeO9ug 1 0 2 0 12.6kb 12.6kb
yellow open kaka                 bOal4r2bTzCKfoR57mvWqg 5 1 0 0  1.1kb  1.1kb

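Explanation:

  • In the 6.8.x setup used here, Kibana creates two indices of its own: .kibana_1 and .kibana_task_manager.

Showing column headers

So what do green and yellow in the output above mean?

Append ?v to the request to show the column headers.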

Example:

GET /_cat/indices?v

Output:

health status index                uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .kibana_1            azRP1WYcSlSKg_Iu8kim_Q   1   0          4            1     16.8kb         16.8kb
yellow open   kk                   a3akwYNuTG2Bq0maLLE0-A   5   1          0            0      1.2kb          1.2kb
green  open   .kibana_task_manager HsnZtouxT2i40qSFeeO9ug   1   0          2            0     12.6kb         12.6kb
yellow open   kaka                 bOal4r2bTzCKfoR57mvWqg   5   1          0            0      1.2kb          1.2kb

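Explanation:

  • health: health of the index
    • yellow: usable but not fully redundant. All primary shards are allocated, but at least one replica is not. On a single-node cluster this is expected, because a replica is never placed on the same node as its primary.
    • green: healthy; all primary and replica shards are allocated
    • red: at least one primary shard is unallocated, so part of the data is unavailable
  • status: open or closed
  • index: index name
  • uuid: unique identifier
  • pri: number of primary shards
  • rep: number of replicas per primary shard
  • docs.count: number of documents
  • docs.deleted: number of deleted documents
  • store.size: storage used by primaries and replicas
  • pri.store.size: storage used by primaries only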
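
The three health colours can be read as a function of shard allocation. The sketch below is a simplified model in Python, not Elasticsearch code; the function name index_health and its four parameters are made up for illustration.

```python
def index_health(active_primaries, total_primaries, active_replicas, total_replicas):
    """Simplified model of the health column (hypothetical helper, not an ES API)."""
    if active_primaries < total_primaries:
        return "red"      # some data cannot be served at all
    if active_replicas < total_replicas:
        return "yellow"   # all data is served, but redundancy is missing
    return "green"        # every primary and every replica is allocated


# kaka on a single node: 5 primaries, 1 replica each, no replica can be placed
kaka = index_health(5, 5, 0, 5)
# .kibana_1: 1 primary, 0 replicas requested
kibana = index_health(1, 1, 0, 0)
```

Under this model kaka comes out yellow and .kibana_1 green, which agrees with the listing above.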

Delete

How to delete

To delete an index, send DELETE /{indexName}.

Example:

DELETE /kk

Output:

{
"acknowledged" : true
}

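Note: deleting Kibana's own indices makes Kibana unusable. If that has already happened, restart Kibana; it recreates its indices on startup and recovers.

Delete all

To delete every index, send DELETE /_all.

Example:

DELETE /_all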

Output:

{
"acknowledged" : true
}

Some sources say DELETE /* also deletes every index, but in my test on 6.8.23 this request is rejected.

Example:

DELETE /*

Output:

{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "Wildcard expressions or all indices are not allowed"
}
],
"type": "illegal_argument_exception",
"reason": "Wildcard expressions or all indices are not allowed"
},
"status": 400
}

mapping

If an index is compared to a table, the mapping is its schema.

Create

Example:

PUT /est
{
"mappings": {
"_doc": {
"properties": {
"id": {
"type": "keyword"
},
"name": {
"type": "text"
},
"age": {
"type": "integer"
},
"bir": {
"type": "date"
}
}
}
}
}

Output:

#! Deprecation: the default number of shards will change from [5] to [1] in 7.0.0; if you wish to continue using the default of [5] shards, you must manage this on the create index request or with an index template
#! Deprecation: [types removal] The parameter include_type_name should be explicitly specified in create index requests to prepare for 7.0. In 7.0 include_type_name will default to 'false', and requests are expected to omit the type name in mapping definitions.
{
"acknowledged" : true,
"shards_acknowledged" : true,
"index" : "est"
}

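Explanation:

  • est: the index name
  • mappings: keyword. Before version 6 an index could hold several types, hence the plural. Since version 6 an index holds a single type, but the plural name was kept.
  • _doc: the type name; _doc follows the official Elasticsearch recommendation
  • properties: keyword; the fields follow
  • id, name, age, bir: the fields
  • type: keyword; the data type follows
  • keyword, text, integer, date: the data types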

Data types

Elasticsearch has many field data types; this series works with the following eight:

  • text
  • keyword
  • date
  • integer
  • long
  • double
  • boolean
  • ip

Of these, text is analyzed (split into tokens) at index time, while keyword is stored as a single unanalyzed term.

View

How to view

To view an index, including its mapping, send GET /{indexName}.

Example:

GET /est

Output:

#! Deprecation: [types removal] The parameter include_type_name should be explicitly specified in get indices requests to prepare for 7.0. In 7.0 include_type_name will default to 'false', which means responses will omit the type name in mapping definitions.
{
"est" : {
"aliases" : { },
"mappings" : {
"_doc" : {
"properties" : {
"age" : {
"type" : "integer"
},
"bir" : {
"type" : "date"
},
"id" : {
"type" : "keyword"
},
"name" : {
"type" : "text"
}
}
}
},
"settings" : {
"index" : {
"creation_date" : "1643078198880",
"number_of_shards" : "5",
"number_of_replicas" : "1",
"uuid" : "eZxussN2RUaHxkqHOrObPg",
"version" : {
"created" : "6082399"
},
"provided_name" : "est"
}
}
}
}

Mapping only

The request above returns everything about the index. To see only the mapping, send GET /{indexName}/_mapping.

Example:

GET /est/_mapping

Output:

#! Deprecation: [types removal] The parameter include_type_name should be explicitly specified in get mapping requests to prepare for 7.0. In 7.0 include_type_name will default to 'false', which means responses will omit the type name in mapping definitions.
{
"est" : {
"mappings" : {
"_doc" : {
"properties" : {
"age" : {
"type" : "integer"
},
"bir" : {
"type" : "date"
},
"id" : {
"type" : "keyword"
},
"name" : {
"type" : "text"
}
}
}
}
}
}

document

A document can be thought of as one row of a table.

Create

How to create

To add a document, send POST /{indexName}/{typeName} with the document as a JSON body.

Example:

POST /est/_doc
{
"id": "一号",
"name":"赵小六",
"age":23,
"bir":"2012-12-12"
}

Output:

{
"_index" : "est",
"_type" : "_doc",
"_id" : "5Y0ij34BoH8Nsaao-4rh",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1
}

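POST and PUT

Use POST to add a document with an auto-generated ID.
Some sources say PUT works here too, but in my test on 6.8.23 PUT without an ID is rejected; only POST is allowed.
(With an explicit ID, PUT does work.)

Example:

PUT /est/_doc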
{
"id": "一号",
"name":"赵小六",
"age":23,
"bir":"2012-12-12"
}

Output:

{
"error": "Incorrect HTTP method for uri [/est/_doc?pretty] and method [PUT], allowed: [POST]",
"status": 405
}

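In REST, POST, GET, PUT, and DELETE usually map to create, read, update, and delete, but POST and PUT are sometimes used interchangeably.

Specifying the ID

To choose the ID yourself, use this form:

POST /{indexName}/{typeName}/{id}

Example:

POST /est/_doc/1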
{
"id": "一号",
"name":"赵小六",
"age":23,
"bir":"2012-12-12"
}

Output:

{
"_index" : "est",
"_type" : "_doc",
"_id" : "1",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 1,
"_primary_term" : 1
}

Get

To fetch a document by ID, send GET /{indexName}/{typeName}/{id}.

Example:

GET /est/_doc/1

Output:

{
"_index" : "est",
"_type" : "_doc",
"_id" : "1",
"_version" : 1,
"_seq_no" : 1,
"_primary_term" : 1,
"found" : true,
"_source" : {
"id" : "一号",
"name" : "赵小六",
"age" : 23,
"bir" : "2012-12-12"
}
}

Delete

To delete a document, send DELETE /{indexName}/{typeName}/{id}.

Example:

DELETE /est/_doc/1

Output:

{
"_index" : "est",
"_type" : "_doc",
"_id" : "1",
"_version" : 2,
"result" : "deleted",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 2,
"_primary_term" : 1
}

Update

Full replacement (original fields are not kept)

Indexing a document under an existing ID replaces it.

Example:

POST /est/_doc/1
{
"id": "一号",
"name":"阿门",
"age":23,
"bir":"2012-12-12"
}

Output:

{
"_index" : "est",
"_type" : "_doc",
"_id" : "1",
"_version" : 7,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 7,
"_primary_term" : 1
}

Check the result.

Example:

GET /est/_doc/1

Output:

{
"_index" : "est",
"_type" : "_doc",
"_id" : "1",
"_version" : 9,
"_seq_no" : 9,
"_primary_term" : 1,
"found" : true,
"_source" : {
"id" : "一号",
"name" : "阿门",
"age" : 23,
"bir" : "2012-12-12"
}
}

The update succeeded.

One more, this time sending only one field.

Example:

POST /est/_doc/1
{
"id": "天字第一号"
}

GET /est/_doc/1

Output:

{
"_index" : "est",
"_type" : "_doc",
"_id" : "1",
"_version" : 10,
"_seq_no" : 10,
"_primary_term" : 1,
"found" : true,
"_source" : {
"id" : "天字第一号"
}
}

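Explanation:
POST /{indexName}/{typeName}/{id} behaves like a delete followed by an insert: fields missing from the new body are gone.

Partial update (original fields are kept)

To change some fields and keep the rest, append _update to the path and wrap the changes in "doc".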

Example:

POST /est/_doc/1/_update
{
"doc": {
"name": "娃哈哈"
}
}

GET /est/_doc/1

Output:

{
"_index" : "est",
"_type" : "_doc",
"_id" : "1",
"_version" : 12,
"_seq_no" : 12,
"_primary_term" : 1,
"found" : true,
"_source" : {
"id" : "一号",
"name" : "娃哈哈",
"age" : 23,
"bir" : "2012-12-12"
}
}
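
The difference between the two update styles can be mimicked with plain dictionaries. This is only a mental model in Python, assuming flat documents; index_document and partial_update are hypothetical helper names, and real partial updates also merge nested objects, which this shallow sketch does not.

```python
def index_document(store, doc_id, body):
    """Model of POST /{index}/{type}/{id}: the stored document is replaced."""
    store[doc_id] = dict(body)


def partial_update(store, doc_id, doc):
    """Model of POST .../{id}/_update with {"doc": ...}: fields are merged."""
    merged = dict(store[doc_id])
    merged.update(doc)
    store[doc_id] = merged


store = {}
index_document(store, "1", {"id": "一号", "name": "阿门", "age": 23, "bir": "2012-12-12"})

# Partial update: only name changes, the other fields survive.
partial_update(store, "1", {"name": "娃哈哈"})
after_partial = dict(store["1"])

# Full replacement: everything not in the new body is lost.
index_document(store, "1", {"id": "天字第一号"})
after_replace = dict(store["1"])
```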

Adding a field during an update

An update can also add a field that does not exist yet.

Example:

POST /est/_doc/1/_update
{
"doc": {
"name": "张三疯",
"age": 11,
"dpet": "武当派"
}
}

GET /est/_doc/1

Output:

{
"_index" : "est",
"_type" : "_doc",
"_id" : "1",
"_version" : 13,
"_seq_no" : 13,
"_primary_term" : 1,
"found" : true,
"_source" : {
"id" : "一号",
"name" : "张三疯",
"age" : 11,
"bir" : "2012-12-12",
"dpet" : "武当派"
}
}

It worked: Elasticsearch added the new field through dynamic mapping.
Now look at the updated mapping.

Example:

GET /est/_mapping

Output:

#! Deprecation: [types removal] The parameter include_type_name should be explicitly specified in get mapping requests to prepare for 7.0. In 7.0 include_type_name will default to 'false', which means responses will omit the type name in mapping definitions.
{
"est" : {
"mappings" : {
"_doc" : {
"properties" : {
"age" : {
"type" : "integer"
},
"bir" : {
"type" : "date"
},
"dpet" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"id" : {
"type" : "keyword"
},
"name" : {
"type" : "text"
}
}
}
}
}
}

Scripted update

The last update variant uses a script.

Example:

POST /est/_doc/1/_update
{
"script": "ctx._source.age += 5"
}

GET /est/_doc/1

Output:

{
"_index" : "est",
"_type" : "_doc",
"_id" : "1",
"_version" : 14,
"_seq_no" : 14,
"_primary_term" : 1,
"found" : true,
"_source" : {
"id" : "一号",
"name" : "张三疯",
"age" : 16,
"bir" : "2012-12-12",
"dpet" : "武当派"
}
}

Bulk

Bulk operations go through the _bulk endpoint.

Example:

POST /est/_doc/_bulk
{"index":{"_id":3}}
{"name":"张三三","age":11,"dpet":"武当派"}
{"delete":{"_id":2}}
{"update":{"_id":1}}
{"doc":{"name":"张三疯","age":11,"dpet":"武当派"}}
{"update":{"_id":2}}
{"doc":{"name":"张三疯","age":11,"dpet":"武当派"}}

Output:

{
"took" : 9,
"errors" : true,
"items" : [
{
"index" : {
"_index" : "est",
"_type" : "_doc",
"_id" : "3",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1,
"status" : 201
}
},
{
"delete" : {
"_index" : "est",
"_type" : "_doc",
"_id" : "2",
"_version" : 1,
"result" : "not_found",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1,
"status" : 404
}
},
{
"update" : {
"_index" : "est",
"_type" : "_doc",
"_id" : "1",
"_version" : 15,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 15,
"_primary_term" : 1,
"status" : 200
}
},
{
"update" : {
"_index" : "est",
"_type" : "_doc",
"_id" : "2",
"status" : 404,
"error" : {
"type" : "document_missing_exception",
"reason" : "[_doc][2]: document missing",
"index_uuid" : "eZxussN2RUaHxkqHOrObPg",
"shard" : "2",
"index" : "est"
}
}
}
]
}

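Explanation:
In the example:

  • each action line names the operation and the document it applies to; index and update are followed by a second line carrying the data
  • index adds (or replaces) a document
  • delete removes a document
  • update modifies a document

In the output:

  • the result of every operation is returned, in request order
  • one failed operation does not fail the others
    (there are no transactions; Elasticsearch is a search engine, not a transactional database)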
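
The bulk request body is newline-delimited JSON: one action line, optionally followed by one data line, and a newline at the very end. Below is a minimal Python sketch that assembles such a body; bulk_body and its tuple layout are illustrative choices, not part of any Elasticsearch client.

```python
import json


def bulk_body(actions):
    """Serialize (action, meta, source) tuples into newline-delimited JSON."""
    lines = []
    for action, meta, source in actions:
        lines.append(json.dumps({action: meta}, ensure_ascii=False))
        if source is not None:  # delete has no data line
            lines.append(json.dumps(source, ensure_ascii=False))
    return "\n".join(lines) + "\n"  # the body has to end with a newline


body = bulk_body([
    ("index", {"_id": 3}, {"name": "张三三", "age": 11}),
    ("delete", {"_id": 2}, None),
    ("update", {"_id": 1}, {"doc": {"age": 12}}),
])
```

The resulting string could be pasted below a POST /est/_doc/_bulk line in Kibana, or sent as the request body by any HTTP client.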

Query

Query here means search that goes beyond fetching a document by ID.

The examples below assume the following mapping and data:

Example:

PUT /ems
{
"mappings": {
"_doc": {
"properties": {
"name": {
"type": "text"
},
"age": {
"type": "integer"
},
"bir": {
"type": "date"
},
"content": {
"type": "text"
},
"address": {
"type": "keyword"
}
}
}
}
}

PUT /ems/_doc/_bulk
{"index":{}}
{"name":"亨利","age":32,"bir":"2012-12-12","content":"当时光的列车缓缓驶过酋长球场","address":"糖果盒"}
{"index":{}}
{"name":"范德萨","age":24,"bir":"2012-12-12","content":"再见,范德萨,不老的传说,曼联有你,一生有你。","address":"上海"}
{"index":{}}
{"name":"皮尔洛","age":8,"bir":"2012-12-12","content":"从你含泪向队友告别的那一刻起,红黑色的21号将不再是我们熟悉的身影","address":"北京"}
{"index":{}}
{"name":"卡洛斯","age":9,"bir":"2012-12-12","content":"卡洛斯把自己的金色岁月留在了伯纳乌,而伯纳乌也给卡洛斯留下了不可抹去的金色记忆。","address":"南京"}
{"index":{}}
{"name":"罗纳尔多","age":43,"bir":"2012-12-12","content":"世上只有一个罗纳尔多!","address":"杭州"}
{"index":{}}
{"name":"卡卡","age":59,"bir":"2012-12-12","content":"天空,寄托着我的信仰。张开双臂,仰望天空,是对上天恩赐的感激。","address":"北京"}

URL and DSL

Elasticsearch offers two ways to query.

  1. URL
    Form: GET /index/type/_search?parameters
  2. DSL (Domain Specific Language)
    Form: GET /index/type/_search {}

The second is the officially recommended one. It sends the query as a JSON request body, which is more expressive and easier to read.

For the URL form a quick look is enough.
Example:

GET /ems/_doc/_search?q=*&sort=age:desc&size=5&from=0&_source=name,age,bir

The rest of this chapter focuses on the DSL.

match_all

match_all matches every document in the index.

Example:

GET /ems/_doc/_search
{
"query": { "match_all": {} }
}

Output:

{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 6,
"max_score" : 1.0,
"hits" : [
{
"_index" : "ems",
"_type" : "_doc",
"_id" : "9o1Bj34BoH8NsaaoSIo9",
"_score" : 1.0,
"_source" : {
"name" : "罗纳尔多",
"age" : 43,
"bir" : "2012-12-12",
"content" : "世上只有一个罗纳尔多!",
"address" : "杭州"
}
},

[part of the output omitted]

{
"_index" : "ems",
"_type" : "_doc",
"_id" : "841Bj34BoH8NsaaoSIo9",
"_score" : 1.0,
"_source" : {
"name" : "范德萨",
"age" : 24,
"bir" : "2012-12-12",
"content" : "再见,范德萨,不老的传说,曼联有你,一生有你。",
"address" : "上海"
}
}
]
}
}

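Fields in the response

  • took: time spent on the query, in milliseconds
  • timed_out: whether the query timed out
  • _shards: how many shards were queried, and how many succeeded, were skipped, or failed
  • hits (object): the search result
  • total: number of matching documents
  • max_score: highest relevance score among the matches
  • hits (array): the matching documents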

sort-order

sort and order control the ordering of results.
Example:

GET /ems/_doc/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"age": {
"order": "desc"
}
},
{
"bir": {
"order": "desc"
}
}
]
}

Output:

{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 6,
"max_score" : null,
"hits" : [

[part of the output omitted]

{
"_index" : "ems",
"_type" : "_doc",
"_id" : "9Y1Bj34BoH8NsaaoSIo9",
"_score" : null,
"_source" : {
"name" : "卡洛斯",
"age" : 9,
"bir" : "2012-12-12",
"content" : "卡洛斯把自己的金色岁月留在了伯纳乌,而伯纳乌也给卡洛斯留下了不可抹去的金色记忆。",
"address" : "南京"
},
"sort" : [
9,
1355270400000
]
},
{
"_index" : "ems",
"_type" : "_doc",
"_id" : "9I1Bj34BoH8NsaaoSIo9",
"_score" : null,
"_source" : {
"name" : "皮尔洛",
"age" : 8,
"bir" : "2012-12-12",
"content" : "从你含泪向队友告别的那一刻起,红黑色的21号将不再是我们熟悉的身影",
"address" : "北京"
},
"sort" : [
8,
1355270400000
]
}
]
}
}

Explanation:

  • The "sort" array takes several entries, so results are ordered by age first and by bir second.
  • max_score and _score are null because an explicit sort replaces relevance scoring.

Next, add name as a third sort key.
Example:

GET /ems/_doc/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"age": {
"order": "desc"
}
},
{
"bir": {
"order": "desc"
}
},
{
"name" :{
"order": "desc"
}
}
]
}

Output:

{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "ems",
"node": "4h-IhFgnQ4SG5s9XbJ1mBg",
"reason": {
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
}
}
],
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
}
}
},
"status": 400
}

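The request fails.
Explanation: name is a text field. Text fields are analyzed and have fielddata disabled by default, so they cannot be sorted on. As the error message suggests, sort on a keyword field instead.

from-size

size sets how many hits are returned; the default is 10.
from sets the offset of the first hit to return; the default is 0.

Combined with sort, the two give pagination.

Example:

GET /ems/_doc/_search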
{
"query": {"match_all": {}},
"sort": [
{
"age": {
"order": "desc"
}
}
],
"size": 2,
"from": 1
}

Output:

{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 6,
"max_score" : null,
"hits" : [
{
"_index" : "ems",
"_type" : "_doc",
"_id" : "9o1Bj34BoH8NsaaoSIo9",
"_score" : null,
"_source" : {
"name" : "罗纳尔多",
"age" : 43,
"bir" : "2012-12-12",
"content" : "世上只有一个罗纳尔多!",
"address" : "杭州"
},
"sort" : [
43
]
},
{
"_index" : "ems",
"_type" : "_doc",
"_id" : "8o1Bj34BoH8NsaaoSIo9",
"_score" : null,
"_source" : {
"name" : "亨利",
"age" : 32,
"bir" : "2012-12-12",
"content" : "当时光的列车缓缓驶过酋长球场",
"address" : "糖果盒"
},
"sort" : [
32
]
}
]
}
}
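
Note that from counts documents, not pages: the request above skips one hit and returns the next two. A page number therefore has to be converted first. The Python sketch below shows one way to do that; page_params and search_body are hypothetical helper names, and pages are assumed to start at 1.

```python
def page_params(page, page_size):
    """Translate a 1-based page number into Elasticsearch's from/size."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return {"from": (page - 1) * page_size, "size": page_size}


def search_body(page, page_size):
    """Request body for one page of all documents, oldest person first."""
    body = {"query": {"match_all": {}}, "sort": [{"age": {"order": "desc"}}]}
    body.update(page_params(page, page_size))
    return body


second_page = search_body(2, 2)
```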

_source

_source selects which fields are returned. It takes either a string (one field) or an array (several fields).
Example:

GET /ems/_doc/_search
{
"query": { "match_all": {} },
"_source": "name"
}

Output:

{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 6,
"max_score" : 1.0,
"hits" : [
{
"_index" : "ems",
"_type" : "_doc",
"_id" : "9o1Bj34BoH8NsaaoSIo9",
"_score" : 1.0,
"_source" : {
"name" : "罗纳尔多"
}
},
{
"_index" : "ems",
"_type" : "_doc",
"_id" : "941Bj34BoH8NsaaoSIo9",
"_score" : 1.0,
"_source" : {
"name" : "卡卡"
}
},
{
"_index" : "ems",
"_type" : "_doc",
"_id" : "9I1Bj34BoH8NsaaoSIo9",
"_score" : 1.0,
"_source" : {
"name" : "皮尔洛"
}
},
{
"_index" : "ems",
"_type" : "_doc",
"_id" : "9Y1Bj34BoH8NsaaoSIo9",
"_score" : 1.0,
"_source" : {
"name" : "卡洛斯"
}
},
{
"_index" : "ems",
"_type" : "_doc",
"_id" : "8o1Bj34BoH8NsaaoSIo9",
"_score" : 1.0,
"_source" : {
"name" : "亨利"
}
},
{
"_index" : "ems",
"_type" : "_doc",
"_id" : "841Bj34BoH8NsaaoSIo9",
"_score" : 1.0,
"_source" : {
"name" : "范德萨"
}
}
]
}
}

Example:

GET /ems/_doc/_search
{
"query": {
"match_all": {}
},
"_source": [
"name",
"age",
"money"
]
}

Output:

{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 6,
"max_score" : 1.0,
"hits" : [
{
"_index" : "ems",
"_type" : "_doc",
"_id" : "9o1Bj34BoH8NsaaoSIo9",
"_score" : 1.0,
"_source" : {
"name" : "罗纳尔多",
"age" : 43
}
},
{
"_index" : "ems",
"_type" : "_doc",
"_id" : "941Bj34BoH8NsaaoSIo9",
"_score" : 1.0,
"_source" : {
"name" : "卡卡",
"age" : 59
}
},
{
"_index" : "ems",
"_type" : "_doc",
"_id" : "9I1Bj34BoH8NsaaoSIo9",
"_score" : 1.0,
"_source" : {
"name" : "皮尔洛",
"age" : 8
}
},
{
"_index" : "ems",
"_type" : "_doc",
"_id" : "9Y1Bj34BoH8NsaaoSIo9",
"_score" : 1.0,
"_source" : {
"name" : "卡洛斯",
"age" : 9
}
},
{
"_index" : "ems",
"_type" : "_doc",
"_id" : "8o1Bj34BoH8NsaaoSIo9",
"_score" : 1.0,
"_source" : {
"name" : "亨利",
"age" : 32
}
},
{
"_index" : "ems",
"_type" : "_doc",
"_id" : "841Bj34BoH8NsaaoSIo9",
"_score" : 1.0,
"_source" : {
"name" : "范德萨",
"age" : 24
}
}
]
}
}

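Explanation:

  • Asking for a field that does not exist (money here) is not an error; the field is simply absent from the result.

term

term looks up an exact term in the inverted index; the search value itself is not analyzed.
Example:

GET /ems/_doc/_search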
{
"query": {
"term": {
"address": {
"value": "糖果盒"
}
}
}
}

Output:

{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.6931472,
"hits" : [
{
"_index" : "ems",
"_type" : "_doc",
"_id" : "8o1Bj34BoH8NsaaoSIo9",
"_score" : 0.6931472,
"_source" : {
"name" : "亨利",
"age" : 32,
"bir" : "2012-12-12",
"content" : "当时光的列车缓缓驶过酋长球场",
"address" : "糖果盒"
}
}
]
}
}

Next, the same kind of query on name.
Example:

GET /ems/_doc/_search
{
"query": {
"term": {
"name": {
"value": "亨利"
}
}
}
}

Output:

{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}

No hits, even though a document with name 亨利 exists. Try a single character instead.

Example:

GET /ems/_doc/_search
{
"query": {
"term": {
"name": {
"value": "亨"
}
}
}
}

Output:

{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.7549128,
"hits" : [
{
"_index" : "ems",
"_type" : "_doc",
"_id" : "8o1Bj34BoH8NsaaoSIo9",
"_score" : 0.7549128,
"_source" : {
"name" : "亨利",
"age" : 32,
"bir" : "2012-12-12",
"content" : "当时光的列车缓缓驶过酋长球场",
"address" : "糖果盒"
}
}
]
}
}

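Explanation:

  • name is a text field, so it was analyzed at index time with the default analyzer, the standard analyzer (StandardAnalyzer). That analyzer splits English into words and Chinese into single characters. A term query does not analyze its search value, so 亨利 is compared as one term against the indexed terms 亨 and 利 and matches nothing, while 亨 matches.
  • Of the eight data types listed earlier (text, keyword, date, integer, long, double, boolean, ip), only text is analyzed.

To see what the standard analyzer produces, call _analyze.
Example:

GET /_analyze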
{
"text": [
"haha is good",
"微风"
]
}

Output:

{
"tokens" : [
{
"token" : "haha",
"start_offset" : 0,
"end_offset" : 4,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "is",
"start_offset" : 5,
"end_offset" : 7,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "good",
"start_offset" : 8,
"end_offset" : 12,
"type" : "<ALPHANUM>",
"position" : 2
},
{
"token" : "微",
"start_offset" : 13,
"end_offset" : 14,
"type" : "<IDEOGRAPHIC>",
"position" : 3
},
{
"token" : "风",
"start_offset" : 14,
"end_offset" : 15,
"type" : "<IDEOGRAPHIC>",
"position" : 4
}
]
}
  • The Chinese input is split into single characters.
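
Why the term query for 亨利 finds nothing while 亨 succeeds can be reproduced in a few lines of Python. The tokenizer below is only a rough stand-in for the standard analyzer (lowercased ASCII words, single CJK characters); both function names are made up for this sketch.

```python
import re


def standard_like_tokens(text):
    """Rough stand-in for the standard analyzer: words, or single CJK characters."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+|[\u4e00-\u9fff]", text)]


def term_matches(indexed_text, term):
    """A term query compares the raw search value against the indexed tokens."""
    return term in standard_like_tokens(indexed_text)


tokens_en = standard_like_tokens("haha is good")
tokens_zh = standard_like_tokens("微风")
whole_name = term_matches("亨利", "亨利")   # the two-character value is not an indexed token
single_char = term_matches("亨利", "亨")    # the single character is
```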

match_phrase

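match_phrase is another option. It analyzes the query string into a list of terms, searches for them, and keeps only documents that contain all the terms in the same relative positions as in the query.

Example:

GET /ems/_doc/_search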
{
"query": {
"match_phrase": {
"name": "亨利"
}
}
}

Output:

{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.2876821,
"hits" : [
{
"_index" : "ems",
"_type" : "_doc",
"_id" : "WBamj34BCoX_YrYqZTh2",
"_score" : 0.2876821,
"_source" : {
"name" : "亨利",
"age" : 32,
"bir" : "2012-12-12",
"content" : "当时光的列车缓缓驶过酋长球场",
"address" : "糖果盒"
}
}
]
}
}

terms

terms matches any of several exact values, much like IN in SQL.

Example:

GET /ems/_doc/_search
{
"query": {
"terms": {
"address": [
"糖果盒",
"上海"
]
}
}
}

Output:

{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 1.0,
"hits" : [
{
"_index" : "ems",
"_type" : "_doc",
"_id" : "WBamj34BCoX_YrYqZTh2",
"_score" : 1.0,
"_source" : {
"name" : "亨利",
"age" : 32,
"bir" : "2012-12-12",
"content" : "当时光的列车缓缓驶过酋长球场",
"address" : "糖果盒"
}
},
{
"_index" : "ems",
"_type" : "_doc",
"_id" : "WRamj34BCoX_YrYqZTh2",
"_score" : 1.0,
"_source" : {
"name" : "范德萨",
"age" : 24,
"bir" : "2012-12-12",
"content" : "再见,范德萨,不老的传说,曼联有你,一生有你。",
"address" : "上海"
}
}
]
}
}

range

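range returns documents whose field value falls within a range.
Four comparison operators are available:

  1. lt: less than
  2. lte: less than or equal to
  3. gt: greater than
  4. gte: greater than or equal to

Example:

GET /ems/_doc/_search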
{
"query": {
"range": {
"age": {
"gte": 9,
"lte": 30
}
}
}
}

Output:

{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 1.0,
"hits" : [
{
"_index" : "ems",
"_type" : "_doc",
"_id" : "9Y1Bj34BoH8NsaaoSIo9",
"_score" : 1.0,
"_source" : {
"name" : "卡洛斯",
"age" : 9,
"bir" : "2012-12-12",
"content" : "卡洛斯把自己的金色岁月留在了伯纳乌,而伯纳乌也给卡洛斯留下了不可抹去的金色记忆。",
"address" : "南京"
}
},
{
"_index" : "ems",
"_type" : "_doc",
"_id" : "841Bj34BoH8NsaaoSIo9",
"_score" : 1.0,
"_source" : {
"name" : "范德萨",
"age" : 24,
"bir" : "2012-12-12",
"content" : "再见,范德萨,不老的传说,曼联有你,一生有你。",
"address" : "上海"
}
}
]
}
}

prefix

prefix returns documents containing a term that starts with the given prefix.

Example:

GET /ems/_doc/_search
{
"query": {
"prefix": {
"address": {
"value": "糖"
}
}
}
}

Output:

{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [
{
"_index" : "ems",
"_type" : "_doc",
"_id" : "8o1Bj34BoH8NsaaoSIo9",
"_score" : 1.0,
"_source" : {
"name" : "亨利",
"age" : 32,
"bir" : "2012-12-12",
"content" : "当时光的列车缓缓驶过酋长球场",
"address" : "糖果盒"
}
}
]
}
}

wildcard

wildcard matches terms against a pattern.

  • ? matches exactly one character
  • * matches zero or more characters

Example:

GET /ems/_doc/_search
{
"query": {
"wildcard": {
"content": {
"value": "当*"
}
}
}
}

Output:

{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [
{
"_index" : "ems",
"_type" : "_doc",
"_id" : "8o1Bj34BoH8NsaaoSIo9",
"_score" : 1.0,
"_source" : {
"name" : "亨利",
"age" : 32,
"bir" : "2012-12-12",
"content" : "当时光的列车缓缓驶过酋长球场",
"address" : "糖果盒"
}
}
]
}
}

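Some sources say * and ? cannot appear at the start of the pattern. In my test they can.
That said, given how an inverted index works, a leading wildcard forces a scan over many terms and is likely to be slow.
Example:

GET /ems/_doc/_search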
{
"query": {
"wildcard": {
"content": {
"value": "*场"
}
}
}
}

Output:

{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [
{
"_index" : "ems",
"_type" : "_doc",
"_id" : "8o1Bj34BoH8NsaaoSIo9",
"_score" : 1.0,
"_source" : {
"name" : "亨利",
"age" : 32,
"bir" : "2012-12-12",
"content" : "当时光的列车缓缓驶过酋长球场",
"address" : "糖果盒"
}
}
]
}
}

fuzzy

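fuzzy finds documents containing terms similar to the search term, within a maximum edit distance (the fuzziness).

  • search term of length 0 to 2: no edits allowed; maximum fuzziness is 0
  • search term of length 3 to 5: one edit allowed; maximum fuzziness is 1
  • search term longer than 5: maximum fuzziness is 2

Example:

GET /ems/_doc/_search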
{
"query": {
"fuzzy": {
"address":"糖果果"
}
}
}

Output:

{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.46209812,
"hits" : [
{
"_index" : "ems",
"_type" : "_doc",
"_id" : "8o1Bj34BoH8NsaaoSIo9",
"_score" : 0.46209812,
"_source" : {
"name" : "亨利",
"age" : 32,
"bir" : "2012-12-12",
"content" : "当时光的列车缓缓驶过酋长球场",
"address" : "糖果盒"
}
}
]
}
}
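
The length rules for fuzzy can be checked by hand with a small Python sketch. It uses plain Levenshtein distance, which is a simplification (Elasticsearch also treats a transposition of two adjacent characters as a single edit by default), and all three function names are illustrative.

```python
def auto_fuzziness(term):
    """Maximum edit distance allowed for a search term under the length rules."""
    n = len(term)
    if n <= 2:
        return 0
    if n <= 5:
        return 1
    return 2


def edit_distance(a, b):
    """Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete from a
                           cur[j - 1] + 1,              # insert into a
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]


def fuzzy_matches(search, indexed):
    """True when the indexed term is within the allowed distance of the search term."""
    return edit_distance(search, indexed) <= auto_fuzziness(search)


hit = fuzzy_matches("糖果果", "糖果盒")   # length 3, one substitution
miss = fuzzy_matches("上京", "上海")      # length 2, no edits allowed
```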

ids

ids takes an array of document IDs and returns the matching documents.

Example:

GET /ems/_doc/_search
{
"query": {
"ids": {
"values": [
"9Y1Bj34BoH8NsaaoSIo9",
"841Bj34BoH8NsaaoSIo9"
]
}
}
}

Output:

{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 1.0,
"hits" : [
{
"_index" : "ems",
"_type" : "_doc",
"_id" : "9Y1Bj34BoH8NsaaoSIo9",
"_score" : 1.0,
"_source" : {
"name" : "卡洛斯",
"age" : 9,
"bir" : "2012-12-12",
"content" : "卡洛斯把自己的金色岁月留在了伯纳乌,而伯纳乌也给卡洛斯留下了不可抹去的金色记忆。",
"address" : "南京"
}
},
{
"_index" : "ems",
"_type" : "_doc",
"_id" : "841Bj34BoH8NsaaoSIo9",
"_score" : 1.0,
"_source" : {
"name" : "范德萨",
"age" : 24,
"bir" : "2012-12-12",
"content" : "再见,范德萨,不老的传说,曼联有你,一生有你。",
"address" : "上海"
}
}
]
}
}

bool

bool combines several conditions into one query.

  • must: roughly AND; every clause has to match.
  • should: roughly OR; matching one clause is enough.
  • must_not: roughly NOT; none of the clauses may match.

Why not simply call them and and or? Because they do not behave exactly like and and or, as the later examples in this section show.

Example:

GET /ems/_doc/_search
{
"query": {
"bool": {
"must": [
{
"range": {
"age": {
"gte": 0,
"lte": 100
}
}
}
],
"must_not": [
{
"wildcard": {
"address": {
"value": "糖果?"
}
}
}
]
}
},
"sort": [
{
"age": {
"order": "desc"
}
}
]
}

Output:

{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 5,
"max_score" : null,
"hits" : [
{
"_index" : "ems",
"_type" : "_doc",
"_id" : "941Bj34BoH8NsaaoSIo9",
"_score" : null,
"_source" : {
"name" : "卡卡",
"age" : 59,
"bir" : "2012-12-12",
"content" : "天空,寄托着我的信仰。张开双臂,仰望天空,是对上天恩赐的感激。",
"address" : "北京"
},
"sort" : [
59
]
},
{
"_index" : "ems",
"_type" : "_doc",
"_id" : "9o1Bj34BoH8NsaaoSIo9",
"_score" : null,
"_source" : {
"name" : "罗纳尔多",
"age" : 43,
"bir" : "2012-12-12",
"content" : "世上只有一个罗纳尔多!",
"address" : "杭州"
},
"sort" : [
43
]
},
{
"_index" : "ems",
"_type" : "_doc",
"_id" : "841Bj34BoH8NsaaoSIo9",
"_score" : null,
"_source" : {
"name" : "范德萨",
"age" : 24,
"bir" : "2012-12-12",
"content" : "再见,范德萨,不老的传说,曼联有你,一生有你。",
"address" : "上海"
},
"sort" : [
24
]
},
{
"_index" : "ems",
"_type" : "_doc",
"_id" : "9Y1Bj34BoH8NsaaoSIo9",
"_score" : null,
"_source" : {
"name" : "卡洛斯",
"age" : 9,
"bir" : "2012-12-12",
"content" : "卡洛斯把自己的金色岁月留在了伯纳乌,而伯纳乌也给卡洛斯留下了不可抹去的金色记忆。",
"address" : "南京"
},
"sort" : [
9
]
},
{
"_index" : "ems",
"_type" : "_doc",
"_id" : "9I1Bj34BoH8NsaaoSIo9",
"_score" : null,
"_source" : {
"name" : "皮尔洛",
"age" : 8,
"bir" : "2012-12-12",
"content" : "从你含泪向队友告别的那一刻起,红黑色的21号将不再是我们熟悉的身影",
"address" : "北京"
},
"sort" : [
8
]
}
]
}
}

Now for why the names are must and should rather than and and or.

First, match a=1 or b=2. (a, b, and c are placeholder fields; only the request body is shown.)
Example:

{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "a": "1"
          }
        },
        {
          "match": {
            "b": "2"
          }
        }
      ]
    }
  }
}

That works. Now add one more condition, "and c=3".
That is: (a=1 or b=2) and c=3.

Example:

{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "c": "3"
          }
        }
      ],
      "should": [
        {
          "match": {
            "a": "1"
          }
        },
        {
          "match": {
            "b": "2"
          }
        }
      ]
    }
  }
}

Wrong!
When should sits at the same level as must or filter, by default none of the should clauses has to match. Adding the minimum_should_match parameter gives the behaviour we want.

Example code:

{
  "query": {
    "bool": {
      "must": [
        { "match": { "c": "3" } }
      ],
      "should": [
        { "match": { "a": "1" } },
        { "match": { "b": "2" } }
      ],
      "minimum_should_match": 1
    }
  }
}
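
To make that default concrete, here is a minimal Python sketch that imitates how a bool query decides whether a document matches. It is a simplification and not Elasticsearch code: the helper names field_equals and bool_matches are hypothetical, scoring is ignored, match is reduced to plain string equality, and the must argument stands in for must or filter.

```python
from typing import Callable, Dict, List, Optional

Doc = Dict[str, str]
Clause = Callable[[Doc], bool]


def field_equals(field: str, value: str) -> Clause:
    """Hypothetical stand-in for a match clause: true when doc[field] == value."""
    return lambda doc: doc.get(field) == value


def bool_matches(doc: Doc,
                 must: Optional[List[Clause]] = None,
                 should: Optional[List[Clause]] = None,
                 minimum_should_match: Optional[int] = None) -> bool:
    """Decide whether doc matches, ignoring scores entirely."""
    must = must or []
    should = should or []
    # Every must clause has to match.
    if not all(clause(doc) for clause in must):
        return False
    if minimum_should_match is None:
        # Default: 0 when must (or filter) clauses are present,
        # 1 when should clauses are the only clauses.
        minimum_should_match = 0 if must else (1 if should else 0)
    matched = sum(1 for clause in should if clause(doc))
    return matched >= minimum_should_match


doc = {"a": "9", "b": "9", "c": "3"}  # only c=3 holds
a1, b2, c3 = field_equals("a", "1"), field_equals("b", "2"), field_equals("c", "3")
print(bool_matches(doc, must=[c3], should=[a1, b2]))                          # True
print(bool_matches(doc, must=[c3], should=[a1, b2], minimum_should_match=1))  # False
```

The first call shows the surprise: the document matches even though neither a=1 nor b=2 holds. The second call rejects it once at least one should clause is required.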

highlight

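Highlight queries

highlight marks the matched keywords in the documents that satisfy the query.

Note that it is not a filter condition of the query; it is a second rendering pass over the query results.

Example code:
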
GET /ems/_doc/_search
{
"query": {
"term": {
"content": {
"value": "时"
}
}
},
"highlight": {
"fields": {
"*": {}
}
}
}

Result:

{
"took" : 40,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.73050237,
"hits" : [
{
"_index" : "ems",
"_type" : "_doc",
"_id" : "8o1Bj34BoH8NsaaoSIo9",
"_score" : 0.73050237,
"_source" : {
"name" : "亨利",
"age" : 32,
"bir" : "2012-12-12",
"content" : "当时光的列车缓缓驶过酋长球场",
"address" : "糖果盒"
},
"highlight" : {
"content" : [
"当<em>时</em>光的列车缓缓驶过酋长球场"
]
}
}
]
}
}

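Explanation: each match is wrapped in an em tag, which browsers render in italics by default.

Custom highlight HTML tags

Use pre_tags and post_tags inside highlight to set the markup placed before and after each match.

Example code:
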
GET /ems/_doc/_search
{
"query": {
"term": {
"content": {
"value": "时"
}
}
},
"highlight": {
"pre_tags": [
"<span style='color:red'>"
],
"post_tags": [
"</span>"
],
"fields": {
"*": {}
}
}
}

Result:

{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.73050237,
"hits" : [
{
"_index" : "ems",
"_type" : "_doc",
"_id" : "8o1Bj34BoH8NsaaoSIo9",
"_score" : 0.73050237,
"_source" : {
"name" : "亨利",
"age" : 32,
"bir" : "2012-12-12",
"content" : "当时光的列车缓缓驶过酋长球场",
"address" : "糖果盒"
},
"highlight" : {
"content" : [
"当<span style='color:red'>时</span>光的列车缓缓驶过酋长球场"
]
}
}
]
}
}
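
Because highlighting is a rendering step applied to results that have already matched, its effect can be imitated client-side. The following Python sketch is only an approximation under a loud assumption: it wraps plain substring occurrences, whereas the real highlighter works on analyzed terms and their offsets. The function name highlight_text is hypothetical.

```python
def highlight_text(text: str, term: str,
                   pre_tag: str = "<em>", post_tag: str = "</em>") -> str:
    """Wrap every occurrence of term in text with pre_tag and post_tag."""
    if not term:
        # Nothing to highlight; return the text unchanged.
        return text
    return text.replace(term, pre_tag + term + post_tag)


content = "当时光的列车缓缓驶过酋长球场"
print(highlight_text(content, "时"))
print(highlight_text(content, "时", "<span style='color:red'>", "</span>"))
```

With the default tags this reproduces the em-wrapped fragment from the first result, and with the custom tags it reproduces the span-wrapped fragment from the second.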

Multi-field highlighting

Use require_field_match to enable highlighting in multiple fields, not only the field being queried.

Example code:

GET /ems/_doc/_search
{
"query": {
"term": {
"content": "卡"
}
},
"highlight": {
"pre_tags": [
"<span style='color:red'>"
],
"post_tags": [
"</span>"
],
"require_field_match": false,
"fields": {
"*": {}
}
}
}

Result:

{
"took" : 5,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.9266379,
"hits" : [
{
"_index" : "ems",
"_type" : "_doc",
"_id" : "9Y1Bj34BoH8NsaaoSIo9",
"_score" : 0.9266379,
"_source" : {
"name" : "卡洛斯",
"age" : 9,
"bir" : "2012-12-12",
"content" : "卡洛斯把自己的金色岁月留在了伯纳乌,而伯纳乌也给卡洛斯留下了不可抹去的金色记忆。",
"address" : "南京"
},
"highlight" : {
"name" : [
"<span style='color:red'>卡</span>洛斯"
],
"content" : [
"<span style='color:red'>卡</span>洛斯把自己的金色岁月留在了伯纳乌,而伯纳乌也给<span style='color:red'>卡</span>洛斯留下了不可抹去的金色记忆。"
]
}
}
]
}
}

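Note: require_field_match must be set to false, and the fields to highlight must be listed under fields.

multi_match

multi_match is a multi-field query. It has two characteristics:

  1. If a searched field is analyzed, the keyword is analyzed first and then searched.
  2. If a searched field is not analyzed, the keyword is searched as-is.

So the fields listed in fields are generally analyzed fields.

Example code:
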
GET /ems/_doc/_search
{
"query": {
"multi_match": {
"query": "卡卡",
"fields": ["name","content"]
}
}
}

Result:

{
"took" : 5,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 1.8532758,
"hits" : [
{
"_index" : "ems",
"_type" : "_doc",
"_id" : "9Y1Bj34BoH8NsaaoSIo9",
"_score" : 1.8532758,
"_source" : {
"name" : "卡洛斯",
"age" : 9,
"bir" : "2012-12-12",
"content" : "卡洛斯把自己的金色岁月留在了伯纳乌,而伯纳乌也给卡洛斯留下了不可抹去的金色记忆。",
"address" : "南京"
}
},
{
"_index" : "ems",
"_type" : "_doc",
"_id" : "941Bj34BoH8NsaaoSIo9",
"_score" : 0.7911257,
"_source" : {
"name" : "卡卡",
"age" : 59,
"bir" : "2012-12-12",
"content" : "天空,寄托着我的信仰。张开双臂,仰望天空,是对上天恩赐的感激。",
"address" : "北京"
}
}
]
}
}
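
One way to picture what multi_match does: as I understand the Elasticsearch documentation, its default type (best_fields) builds one match query per field and wraps them in a dis_max query. The Python sketch below only constructs that equivalent request body as a dictionary; the function name expand_multi_match is hypothetical, and options such as tie_breaker are left out.

```python
import json
from typing import Any, Dict, List


def expand_multi_match(query: str, fields: List[str]) -> Dict[str, Any]:
    """Build a dis_max body with one match query per field."""
    return {
        "query": {
            "dis_max": {
                "queries": [{"match": {field: query}} for field in fields]
            }
        }
    }


body = expand_multi_match("卡卡", ["name", "content"])
print(json.dumps(body, ensure_ascii=False, indent=2))
```

Each per-field match query analyzes the keyword with that field's own analyzer, which is why the two characteristics listed above apply field by field.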

query_string

query_string is a multi-field analyzed query. Besides analyzing the query text at search time, it also lets you specify the analyzer.

Example code:

GET /dangdang/book/_search
{
"query": {
"query_string": {
"query": "中国声音",
"analyzer": "ik_max_word",
"fields": ["name","content"]
}
}
}
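  • Analyzers are discussed in more detail below.

Filter

Filter, as the name suggests, filters documents.

Filter queries

Elasticsearch has two kinds of search clauses:

  1. Query: by default computes a score for every returned document and then sorts by that score.
  2. Filter: only selects the documents that match and does not compute a score.

So, from a performance standpoint alone, a filter is faster than a query.

Filter syntax

Example code:
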
GET /ems/_doc/_search
{
"query": {
"bool": {
"must": [
{
"match_all": {}
}
],
"filter": {
"term": {
"age": 32
}
}
}
}
}

Result:

{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [
{
"_index" : "ems",
"_type" : "_doc",
"_id" : "WBamj34BCoX_YrYqZTh2",
"_score" : 1.0,
"_source" : {
"name" : "亨利",
"age" : 32,
"bir" : "2012-12-12",
"content" : "当时光的列车缓缓驶过酋长球场",
"address" : "糖果盒"
}
}
]
}
}
  • When a request contains both filter and query, the filter runs first and the query runs second.
  • Common filter types:
    • term
    • terms
    • range
    • exists: keeps documents in which the specified field exists and is not null.

Here is an exists example.

GET /ems/_doc/_search
{
"query": {
"bool": {
"must": [
{
"term": {
"name": {
"value": "中国"
}
}
}
],
"filter": {
"exists": {
"field": "haha"
}
}
}
}
}
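
The examples above cover term and exists. As a sketch of the other two filter types in the list, terms and range, the Python snippet below builds request bodies in the same bool shape. The helper name filtered_query and the chosen values are illustrative assumptions; the field names age and address come from the sample ems documents.

```python
import json
from typing import Any, Dict


def filtered_query(filter_clause: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap a filter clause in the bool / match_all shape used above."""
    return {
        "query": {
            "bool": {
                "must": [{"match_all": {}}],
                "filter": filter_clause,
            }
        }
    }


# terms: the field must equal any one of the listed values.
terms_body = filtered_query({"terms": {"address": ["北京", "上海"]}})
# range: the field must fall inside the given bounds (inclusive here).
range_body = filtered_query({"range": {"age": {"gte": 8, "lte": 32}}})

print(json.dumps(terms_body, ensure_ascii=False, indent=2))
print(json.dumps(range_body, ensure_ascii=False, indent=2))
```

Either printed body can be pasted under GET /ems/_doc/_search in the Kibana console.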

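We have covered many kinds of queries, so join queries must be next, right?
No, there are no join queries here.
Elasticsearch does support join, but the official guidance is not to use it, because it performs very poorly.
If a join is unavoidable, handle it in application code or by building a wide (denormalized) table.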
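
As a minimal sketch of the application-side approach, the Python snippet below joins two record lists in code and produces wide documents that could then be indexed as-is. Everything here is hypothetical illustration data: the players and clubs lists, the club_id key, and the function name build_wide_docs.

```python
from typing import Dict, List


def build_wide_docs(players: List[dict], clubs: List[dict]) -> List[dict]:
    """Join players to clubs in application code and return wide documents."""
    club_by_id: Dict[int, dict] = {club["club_id"]: club for club in clubs}
    wide_docs = []
    for player in players:
        club = club_by_id.get(player["club_id"], {})
        # Copy the club attribute into the player document (the wide-table row).
        wide_docs.append({**player, "club_name": club.get("club_name")})
    return wide_docs


players = [{"name": "亨利", "club_id": 1}, {"name": "范德萨", "club_id": 2}]
clubs = [{"club_id": 1, "club_name": "Arsenal"},
         {"club_id": 2, "club_name": "Manchester United"}]
for doc in build_wide_docs(players, clubs):
    print(doc)
```

The trade-off is the usual one for denormalization: queries stay simple and fast, while a change to a club name has to be written to every player document that carries it.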

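IK analyzer

The default analyzer in Elasticsearch is the standard analyzer, which splits Chinese text into single characters.

We can use the IK analyzer instead.
GitHub: https://github.com/medcl/elasticsearch-analysis-ik

Online installation

  1. Run the following command in the bin directory to install it.
    ./elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.8.23/elasticsearch-analysis-ik-6.8.23.zip
  2. Restart Elasticsearch for the plugin to take effect.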

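Some sources say that after installing an analyzer you must delete the existing index data in Elasticsearch, that is, delete the data folder in the Elasticsearch installation directory.

In my tests this was not necessary at all!

Besides, deleting all existing index data just to install an analyzer?
Elasticsearch may hold hundreds of GB, or even 1 TB, of data. Delete it all and re-import it just to install an analyzer?
Surely not.

Finally, here are the relevant elasticsearch-plugin commands.

  • list: lists installed Elasticsearch plugins
  • install: installs a plugin
  • remove: removes a plugin from Elasticsearch

Note that with the online installation, the IK configuration file is at

{Elasticsearch installation directory}/config/analysis-ik/IKAnalyzer.cfg.xml
  • This differs from the configuration file location of a local IK installation.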

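Installing IK locally

IK can also be installed locally.

  1. Transfer the IK analyzer package to the server.
  2. Unzip it.
    unzip elasticsearch-analysis-ik-6.8.23.zip
    • If unzip is reported as not found, install it first:
    yum install -y unzip
  3. Copy it into the plugins folder.
    cd plugins/
    cp -r ~/elasticsearch-analysis-ik-6.8.23 ./
  4. Restart Elasticsearch for the plugin to take effect.

Note that with a local installation, the IK configuration file is at

{Elasticsearch installation directory}/plugins/analysis-ik/config/IKAnalyzer.cfg.xml
  • This differs from the configuration file location of an online IK installation.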

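Testing the IK analyzer

The IK plugin provides two analyzers:

  1. ik_max_word: splits the text at the finest granularity.
  2. ik_smart: splits the text at a coarse granularity.

Let's go straight to the examples.
Example code:
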
GET /_analyze
{
"text": ["中华人民共和国国歌"],
"analyzer": "ik_max_word"
}

Result:

{
"tokens" : [
{
"token" : "中华人民共和国",
"start_offset" : 0,
"end_offset" : 7,
"type" : "CN_WORD",
"position" : 0
},
{
"token" : "中华人民",
"start_offset" : 0,
"end_offset" : 4,
"type" : "CN_WORD",
"position" : 1
},
{
"token" : "中华",
"start_offset" : 0,
"end_offset" : 2,
"type" : "CN_WORD",
"position" : 2
},
{
"token" : "华人",
"start_offset" : 1,
"end_offset" : 3,
"type" : "CN_WORD",
"position" : 3
},
{
"token" : "人民共和国",
"start_offset" : 2,
"end_offset" : 7,
"type" : "CN_WORD",
"position" : 4
},
{
"token" : "人民",
"start_offset" : 2,
"end_offset" : 4,
"type" : "CN_WORD",
"position" : 5
},
{
"token" : "共和国",
"start_offset" : 4,
"end_offset" : 7,
"type" : "CN_WORD",
"position" : 6
},
{
"token" : "共和",
"start_offset" : 4,
"end_offset" : 6,
"type" : "CN_WORD",
"position" : 7
},
{
"token" : "国",
"start_offset" : 6,
"end_offset" : 7,
"type" : "CN_CHAR",
"position" : 8
},
{
"token" : "国歌",
"start_offset" : 7,
"end_offset" : 9,
"type" : "CN_WORD",
"position" : 9
}
]
}

Example code:

GET /_analyze
{
"text": ["中华人民共和国国歌"],
"analyzer": "ik_smart"
}

Result:

{
"tokens" : [
{
"token" : "中华人民共和国",
"start_offset" : 0,
"end_offset" : 7,
"type" : "CN_WORD",
"position" : 0
},
{
"token" : "国歌",
"start_offset" : 7,
"end_offset" : 9,
"type" : "CN_WORD",
"position" : 1
}
]
}
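
The difference between the two granularities can be illustrated with a toy dictionary-based segmenter in Python. This is not IK's actual algorithm, which is more involved; it is a sketch under stated assumptions: TOY_DICT is a hand-picked hypothetical dictionary, fine_grained emits every dictionary word found at every position, and coarse_grained does a greedy longest match from left to right.

```python
from typing import List, Set, Tuple

Token = Tuple[str, int, int]  # (token, start_offset, end_offset)

# Hypothetical mini dictionary chosen for this one sentence.
TOY_DICT: Set[str] = {
    "中华人民共和国", "中华人民", "中华", "华人",
    "人民共和国", "人民", "共和国", "共和", "国歌",
}


def fine_grained(text: str, dictionary: Set[str]) -> List[Token]:
    """Emit every dictionary word that occurs in text, longest first per start."""
    tokens: List[Token] = []
    for start in range(len(text)):
        for end in range(len(text), start, -1):
            if text[start:end] in dictionary:
                tokens.append((text[start:end], start, end))
    return tokens


def coarse_grained(text: str, dictionary: Set[str]) -> List[Token]:
    """Greedy longest match: take the longest word, then continue after it."""
    tokens: List[Token] = []
    start = 0
    while start < len(text):
        for end in range(len(text), start, -1):
            if text[start:end] in dictionary:
                tokens.append((text[start:end], start, end))
                start = end
                break
        else:
            # No dictionary word starts here: emit the single character.
            tokens.append((text[start], start, start + 1))
            start += 1
    return tokens


TEXT = "中华人民共和国国歌"
print([token for token, _, _ in fine_grained(TEXT, TOY_DICT)])
print([token for token, _, _ in coarse_grained(TEXT, TOY_DICT)])
```

On this sentence the coarse version yields the same two tokens as ik_smart above. The fine version yields the overlapping words that ik_max_word returned, except for the single character 国 at offsets 6 to 7, which the toy version does not produce.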

Specifying the analyzer when creating an index

We can use analyzer and search_analyzer to specify the analyzer when creating an index.

Example code:

PUT /ems
{
"mappings":{
"_doc":{
"properties":{
"name":{
"type":"text",
"analyzer": "ik_max_word",
"search_analyzer": "ik_max_word"
},
"age":{
"type":"integer"
},
"bir":{
"type":"date"
},
"content":{
"type":"text",
"analyzer": "ik_max_word",
"search_analyzer": "ik_max_word"
},
"address":{
"type":"keyword"
}
}
}
}
}
  • name and content specify an analyzer.

Next, bulk-load the sample documents:

PUT /ems/_doc/_bulk
{"index":{}}
{"name":"亨利","age":32,"bir":"2012-12-12","content":"当时光的列车缓缓驶过酋长球场","address":"糖果盒"}
{"index":{}}
{"name":"范德萨","age":24,"bir":"2012-12-12","content":"再见,范德萨,不老的传说,曼联有你,一生有你。","address":"上海"}
{"index":{}}
{"name":"皮尔洛","age":8,"bir":"2012-12-12","content":"从你含泪向队友告别的那一刻起,红黑色的21号将不再是我们熟悉的身影","address":"北京"}
{"index":{}}
{"name":"卡洛斯","age":9,"bir":"2012-12-12","content":"卡洛斯把自己的金色岁月留在了伯纳乌,而伯纳乌也给卡洛斯留下了不可抹去的金色记忆。","address":"南京"}
{"index":{}}
{"name":"罗纳尔多","age":43,"bir":"2012-12-12","content":"世上只有一个罗纳尔多!","address":"杭州"}
{"index":{}}
{"name":"卡卡","age":59,"bir":"2012-12-12","content":"天空,寄托着我的信仰。张开双臂,仰望天空,是对上天恩赐的感激。","address":"北京"}

Let's try it.
Example code:

GET /ems/_doc/_search
{
"query":{
"term":{
"content":"时光"
}
},
"highlight": {
"pre_tags": ["<span style='color:red'>"],
"post_tags": ["</span>"],
"fields": {
"*":{}
}
}
}

Result:

{
"took" : 40,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.2876821,
"hits" : [
{
"_index" : "ems",
"_type" : "_doc",
"_id" : "WBamj34BCoX_YrYqZTh2",
"_score" : 0.2876821,
"_source" : {
"name" : "亨利",
"age" : 32,
"bir" : "2012-12-12",
"content" : "当时光的列车缓缓驶过酋长球场",
"address" : "糖果盒"
},
"highlight" : {
"content" : [
"当<span style='color:red'>时光</span>的列车缓缓驶过酋长球场"
]
}
}
]
}
}
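
Why does a term query for the two-character word 时光 succeed now? A term query looks up the exact term among the indexed tokens, so the outcome depends on how the field was analyzed. The Python sketch below imitates that lookup under simplifying assumptions: standard_like_tokens approximates the standard analyzer on Chinese text as one token per character, and ik_tokens_known contains only the single token confirmed by the highlight output above, with the rest of the IK output omitted. Both names, and term_hits, are hypothetical.

```python
from typing import Set


def standard_like_tokens(text: str) -> Set[str]:
    """Approximate the standard analyzer on Chinese text: one token per character."""
    return {ch for ch in text if not ch.isspace()}


def term_hits(term: str, indexed_tokens: Set[str]) -> bool:
    """A term query matches only when the exact term is an indexed token."""
    return term in indexed_tokens


content = "当时光的列车缓缓驶过酋长球场"
standard_tokens = standard_like_tokens(content)
# Only the token confirmed by the highlighted result; other IK tokens omitted.
ik_tokens_known = {"时光"}

print(term_hits("时", standard_tokens))
print(term_hits("时光", standard_tokens))
print(term_hits("时光", ik_tokens_known))
```

Under the single-character approximation, 时 is found but 时光 is not, which matches the earlier examples that had to search for one character at a time. With the IK mapping, 时光 is itself an indexed token.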

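Configuring extension words

IK supports a custom extension dictionary and a custom stop-word dictionary.

  • Extension dictionary: words you want added to the dictionary
  • Stop-word dictionary: words you want removed from the dictionary

Edit IKAnalyzer.cfg.xml to add extension words.
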
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
<comment>IK Analyzer 扩展配置</comment>
<!--用户可以在这里配置自己的扩展字典 -->
<entry key="ext_dict"></entry>
<!--用户可以在这里配置自己的扩展停止词字典-->
<entry key="ext_stopwords"></entry>
<!--用户可以在这里配置远程扩展字典 -->
<!-- <entry key="remote_ext_dict">words_location</entry> -->
<!--用户可以在这里配置远程扩展停止词字典-->
<!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>

Explanation:

  • ext_dict: local extension words
  • ext_stopwords: local stop words
  • remote_ext_dict: remote extension words
  • remote_ext_stopwords: remote stop words
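
As a sketch of how a local extension dictionary might be prepared, the Python snippet below writes a word list to a file and returns the matching ext_dict entry. It rests on assumptions that should be checked against the IK README: that a dictionary is a plain UTF-8 text file with one word per line, and that the path in the entry is relative to the directory containing IKAnalyzer.cfg.xml. The function name write_ext_dict, the file name custom/my_words.dic, and the sample words are all hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Iterable


def write_ext_dict(config_dir: Path, relative_path: str, words: Iterable[str]) -> str:
    """Write one word per line (UTF-8) and return the ext_dict entry for it."""
    target = config_dir / relative_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text("\n".join(words) + "\n", encoding="utf-8")
    # Assumed form of the entry to place in IKAnalyzer.cfg.xml.
    return f'<entry key="ext_dict">{relative_path}</entry>'


# Demonstrate against a temporary directory instead of a real IK config folder.
with tempfile.TemporaryDirectory() as tmp:
    entry = write_ext_dict(Path(tmp), "custom/my_words.dic", ["酋长球场", "糖果盒"])
    print(entry)
    print((Path(tmp) / "custom" / "my_words.dic").read_text(encoding="utf-8"))
```

In a real setup, config_dir would be the IK configuration directory given earlier for the online or local installation, and the returned entry would replace the empty ext_dict entry in the file.
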
Author: Kaka Wan Yifan
Article link: https://kakawanyifan.com/11202
Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY-NC-SA 4.0. Please credit Kaka Wan Yifan when reposting.
