小言_互联网's Blog

Basic Usage of Elasticsearch (ES)


1. Introduction

Elasticsearch is a distributed, open-source search and analytics engine for all types of data, including text, numbers, geospatial, structured, and unstructured data. It can retrieve the data we need from massive datasets in seconds, whereas MySQL becomes slow to search once a single table reaches millions of rows.

Use cases:

1. Application search

2. Website search

3. Enterprise search

4. Log processing and analysis

5. Infrastructure metrics and container monitoring

6. Application performance monitoring

7. Geospatial data analysis and visualization

8. Security analytics and business analytics (by contrast, MySQL handles data persistence and CRUD)

2. Basic Concepts

1. Index: as a verb, "to index" a document means to insert it, like MySQL's INSERT; as a noun, an index corresponds to a MySQL database.
2. Type: one or more types can be defined inside an index, similar to MySQL tables; data of the same kind is stored under the same type.
3. Document: a single JSON record stored under an index and a type; a document is like a row inside a MySQL table.

For example, an ES cluster might contain indices named google, microsoft, and megacorp; each index contains types such as employee and product; and each type holds documents such as {id:1, name:"Zhang San"} and {id:2, name:"Li Si"}, where a field like id corresponds to a column.

A second core ES concept is the inverted index. MySQL stores a record with a forward index; ES additionally maintains an inverted index table that maps every term to the records containing it:

  Term      | Matching records
  Red Sea   | 1, 2, 3, 4, 5
  Operation | 1, 2, 3
  Agent     | 5

Each hit gets a relevance score: record 3, for example, matched two of the terms.
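The idea above can be sketched in a few lines of Python. This is a toy model, not how ES stores its index internally: build a term-to-document-id mapping, then score each document by how many query terms it matches.

```python
# Toy inverted index: term -> set of document ids containing that term.
docs = {
    1: "red sea operation",
    2: "red sea action movie",
    3: "red sea operation trailer",
    4: "red sea documentary",
    5: "red sea special agent",
}

inverted = {}
for doc_id, text in docs.items():
    for term in text.split():
        inverted.setdefault(term, set()).add(doc_id)

def search(query):
    """Score each document by the number of query terms it contains."""
    scores = {}
    for term in query.split():
        for doc_id in inverted.get(term, ()):
            scores[doc_id] = scores.get(doc_id, 0) + 1
    # highest relevance score first, ties broken by document id
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

print(search("red operation"))  # documents 1 and 3 match both query terms
```

Here documents 1 and 3 get score 2 because they contain both "red" and "operation", mirroring the "record 3 hit two words" remark above.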

3. Environment Setup


  
  1. docker pull elasticsearch
  2. docker pull kibana ## Kibana, for visualizing and exploring the data
  3. mkdir -p /mydata/elasticsearch/config # mount outside the container so the config is easy to edit
  4. mkdir -p /mydata/elasticsearch/data # mount outside the container so the data is easy to inspect
  5. echo "http.host: 0.0.0.0" >> /mydata/elasticsearch/config/elasticsearch.yml # allow ES to be reached from any remote host (the space after the colon is required YAML)
  6. cat elasticsearch.yml
  7. docker run --name elasticsearch -p 9200:9200 -p 9300:9300 \ # 9200 serves REST API requests; 9300 is the inter-node transport port in a cluster
  8. -e "discovery.type=single-node" \ # run in single-node mode (no spaces around =)
  9. -e ES_JAVA_OPTS="-Xms64m -Xmx128m" \ # in production the heap is usually around 32 GB
  10. -v /mydata/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \ # mount the config file
  11. -v /mydata/elasticsearch/data:/usr/share/elasticsearch/data \ # mount the data directory
  12. -v /mydata/elasticsearch/plugins:/usr/share/elasticsearch/plugins \ # mount the plugins directory
  13. -d elasticsearch:latest
  14. # check the image version:
  15. docker image inspect nginx:latest | grep -i version
  16. Create the elasticsearch container [works intermittently, apparently depending on memory; by default Elasticsearch tries to claim most of the available memory]:
  17. docker run --name elasticsearch -p 9200:9200 -p 9300:9300 \
  18. -e "discovery.type=single-node" \
  19. -e ES_JAVA_OPTS="-Xms512m -Xmx1024m" \
  20. -v /mydata/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
  21. -v /mydata/elasticsearch/data:/usr/share/elasticsearch/data \
  22. -v /mydata/elasticsearch/plugins:/usr/share/elasticsearch/plugins \
  23. -d elasticsearch:latest
  24. If the container dies about 2 seconds after starting, try increasing its memory [if that does not help, allocate more resources and drop the heap limits]
  25. docker run --name elasticsearch -p 9200:9200 -p 9300:9300 \
  26. -e "discovery.type=single-node" \
  27. -v /mydata/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
  28. -v /mydata/elasticsearch/data:/usr/share/elasticsearch/data \
  29. -v /mydata/elasticsearch/plugins:/usr/share/elasticsearch/plugins \
  30. -d elasticsearch
  31. free -m # check the VM's memory usage
  32. Start Kibana:
  33. ##docker run --name kibana -e ELASTICSEARCH_HOSTS=192.168.52.130:9200 -p 5601:5601 -d kibana:latest
  34. docker run --name kibana -e ELASTICSEARCH_URL=http://192.168.52.130:9200 -p 5601:5601 -d kibana:latest
  35. http://192.168.52.130:5601/app/kibana #?_g=()
  36. /elastic/elasticsearch/master/docs/src/test/resources/accounts.json
  37. -- remove the container:
  38. docker stop elasticsearch
  39. docker rm elasticsearch
  40. check the error logs:
  41. docker logs elasticsearch
  42. -- the mounted files need read/write permission: give all users and groups full access;
  43. chmod -R 777 /mydata/elasticsearch
  44. before:
  45. ll
  46. drwxr-xr-x
  47. after:
  48. ll
  49. drwxrwxrwx
  50. once the elasticsearch container is up, visit:
  51. http://192.168.52.130:9200/
  52. {
  53. "name" : "ZeFr4rR",
  54. "cluster_name" : "elasticsearch",
  55. "cluster_uuid" : "Hg9BNYkUQCy650XP26sYkw",
  56. "version" : {
  57. "number" : "5.6.12",
  58. "build_hash" : "cfe3d9f",
  59. "build_date" : "2018-09-10T20:12:43.732Z",
  60. "build_snapshot" : false,
  61. "lucene_version" : "6.6.1"
  62. },
  63. "tagline" : "You Know, for Search"
  64. }
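The root endpoint's JSON can also be checked programmatically. A quick sketch, assuming the response body shown above, that pulls out the fields worth verifying after startup:

```python
import json

# Response body as returned by GET http://<host>:9200/ (abridged from the output above).
body = """
{
  "name": "ZeFr4rR",
  "cluster_name": "elasticsearch",
  "version": {"number": "5.6.12", "lucene_version": "6.6.1"},
  "tagline": "You Know, for Search"
}
"""

info = json.loads(body)
print(info["cluster_name"], info["version"]["number"])
# Against a live server you would fetch the same JSON over HTTP, e.g.
# requests.get("http://192.168.52.130:9200/").json()
```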

Basic DSL Usage


  
  1. query node information:
  2. http://192.168.52.130:9200/_cat
  3. =^.^=
  4. /_cat/allocation
  5. /_cat/shards
  6. /_cat/shards/{index}
  7. /_cat/master
  8. /_cat/nodes
  9. /_cat/tasks
  10. /_cat/indices
  11. /_cat/indices/{index}
  12. /_cat/segments
  13. /_cat/segments/{index}
  14. /_cat/count
  15. /_cat/count/{index}
  16. /_cat/recovery
  17. /_cat/recovery/{index}
  18. /_cat/health
  19. /_cat/pending_tasks
  20. /_cat/aliases
  21. /_cat/aliases/{alias}
  22. /_cat/thread_pool
  23. /_cat/thread_pool/{thread_pools}
  24. /_cat/plugins
  25. /_cat/fielddata
  26. /_cat/fielddata/{fields}
  27. /_cat/nodeattrs
  28. /_cat/repositories
  29. /_cat/snapshots/{repository}
  30. /_cat/templates
  31. http://192.168.52.130:9200/_cat/nodes
  32. 127.0.0.1 12 62 0 0.01 0.07 0.12 mdi * ZeFr4rR [in a cluster, the node marked with * is the master]
  33. http://192.168.52.130:9200/_cat/health -- check cluster health
  34. 1647341114 10:45:14 elasticsearch green 1 1 0 0 0 0 0 0 - 100.0% [the numbers in the middle are cluster shard statistics]
  35. http://192.168.52.130:9200/_cat/master -- show the master node
  36. ZeFr4rRZQz-w3ZK80Vuv0Q 127.0.0.1 127.0.0.1 ZeFr4rR
  37. http://192.168.52.130:9200/_cat/indices -- list all indices (ES's equivalent of "show databases")
  38. 2. Index (save) a document
  39. To save a document, specify which index and type to store it under and which unique id to use.
  40. Save document 1 under type external of index customer:
  41. With a PUT request, the id must be supplied:
  42. http://192.168.52.130:9200/customer/external/1
  43. JSON body:
  44. {
  45. "name": "pansd"
  46. }
  47. Response:
  48. {
  49. "_index": "customer",
  50. "_type": "external",
  51. "_id": "1",
  52. "_version": 1, -- incremented on every update
  53. "result": "created", -- "updated" on later requests
  54. "_shards": { -- shards; covered in the cluster section
  55. "total": 2,
  56. "successful": 1,
  57. "failed": 0
  58. },
  59. "created": true
  60. }
  61. Reading the response: fields prefixed with _ are metadata. With PUT, the first request creates the document and later requests update it.
  62. With a POST request, the id may be omitted:
  63. {
  64. "_index": "customer",
  65. "_type": "external",
  66. "_id": "AX-NOVshrEj9CP_oy5eV", -- auto-generated id; each new request creates another document with a fresh unique id
  67. "_version": 1,
  68. "result": "created",
  69. "_shards": {
  70. "total": 2,
  71. "successful": 1,
  72. "failed": 0
  73. },
  74. "created": true
  75. }
  76. -- if an id is supplied, POST behaves just like PUT.
  77. Querying:
  78. GET customer/external/1
  79. http://192.168.52.130:9200/customer/external/1
  80. {
  81. "_index": "customer",
  82. "_type": "external",
  83. "_id": "1",
  84. "_version": 1,
  85. "_seq_no": 1, -- not returned by this image version; incremented on every update, used for optimistic locking.
  86. "_primary_term": 1, -- not returned by this image version; changes when the primary shard is reassigned, e.g. after a restart.
  87. "found": true,
  88. "_source": {
  89. "name": "pansd"
  90. }
  91. }
  92. When updating ES concurrently, use optimistic locking via _seq_no/_primary_term (or a version number) for concurrency control.
  93. For example:
  94. PUT request: http://192.168.52.130:9200/customer/external/1?if_seq_no=1&if_primary_term=1
  95. Updating with _update:
  96. POST request: http://192.168.52.130:9200/customer/external/1/_update
  97. {
  98. "doc":{
  99. "name": "pshdhx"
  100. }
  101. }
  102. 1. The first update changes the version; sending the identical request again leaves the version and sequence number unchanged. [In short: _update compares against the existing document.]
  103. Deleting a document or an index:
  104. DELETE customer/external/1
  105. {
  106. "found": true,
  107. "_index": "customer",
  108. "_type": "external",
  109. "_id": "1",
  110. "_version": 2,
  111. "result": "deleted",
  112. "_shards": {
  113. "total": 2,
  114. "successful": 1,
  115. "failed": 0
  116. }
  117. }
  118. Searching again:
  119. {
  120. "_index": "customer",
  121. "_type": "external",
  122. "_id": "1",
  123. "found": false
  124. }
  125. DELETE customer [deletes the index; ES cannot delete a type by itself. Alternatively, clearing out all the data under an index effectively removes the type.]
  126. Bulk import API
  127. The bulk body is not valid JSON, and plain-text line breaks are awkward in ordinary clients, so use Kibana.
  128. 1. It must be a POST request
  129. 2. customer/external/_bulk
  130. action
  131. body
  132. Concretely: every two lines form one operation. The following bulk-inserts two documents; paste it into Kibana's Dev Tools:
  133. POST /customer/external/_bulk
  134. { "index":{ "_id": "1"}}
  135. { "name": "pansd"}
  136. { "index":{ "_id": "2"}}
  137. { "name": "pshdhx"}
  138. Response:
  139. {
  140. "took": 62,
  141. "errors": false,
  142. "items": [
  143. {
  144. "index": {
  145. "_index": "customer",
  146. "_type": "external",
  147. "_id": "1",
  148. "_version": 1,
  149. "result": "created",
  150. "_shards": {
  151. "total": 2,
  152. "successful": 1,
  153. "failed": 0
  154. },
  155. "created": true,
  156. "status": 201
  157. }
  158. },
  159. {
  160. "index": {
  161. "_index": "customer",
  162. "_type": "external",
  163. "_id": "2",
  164. "_version": 1,
  165. "result": "created",
  166. "_shards": {
  167. "total": 2,
  168. "successful": 1,
  169. "failed": 0
  170. },
  171. "created": true,
  172. "status": 201
  173. }
  174. }
  175. ]
  176. }
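The two-lines-per-operation format is why the bulk body is not valid JSON: it is newline-delimited JSON (NDJSON). A small sketch that builds the same payload shown above from a list of documents (the trailing newline is required by the bulk API):

```python
import json

def build_bulk_body(docs):
    """Build an NDJSON _bulk body: one action line plus one source line per doc."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_id": str(doc_id)}}))  # action line
        lines.append(json.dumps(source))                           # source line
    return "\n".join(lines) + "\n"  # _bulk requires a trailing newline

body = build_bulk_body([(1, {"name": "pansd"}), (2, {"name": "pshdhx"})])
print(body)
```

The resulting string can be POSTed to /customer/external/_bulk with Content-Type: application/x-ndjson.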

Complex DSL Statements


  
  1. Complex operations:
  2. POST /_bulk
  3. { "delete":{ "_index": "website", "_type": "blog", "_id": "123"}}
  4. { "create":{ "_index": "website", "_type": "blog", "_id": "123"}}
  5. { "title": "my first blog"}
  6. { "index":{ "_index": "website", "_type": "blog"}}
  7. { "title": "my second blog"}
  8. { "update":{ "_index": "website", "_type": "blog", "_id": "123"}}
  9. { "doc":{ "title": "my updated blog post"}}
  10. Response:
  11. {
  12. "took": 24,
  13. "errors": false,
  14. "items": [
  15. {
  16. "delete": {
  17. "found": true,
  18. "_index": "website",
  19. "_type": "blog",
  20. "_id": "123",
  21. "_version": 6,
  22. "result": "deleted",
  23. "_shards": {
  24. "total": 2,
  25. "successful": 1,
  26. "failed": 0
  27. },
  28. "status": 200
  29. }
  30. },
  31. {
  32. "create": {
  33. "_index": "website",
  34. "_type": "blog",
  35. "_id": "123",
  36. "_version": 7,
  37. "result": "created",
  38. "_shards": {
  39. "total": 2,
  40. "successful": 1,
  41. "failed": 0
  42. },
  43. "created": true,
  44. "status": 201
  45. }
  46. },
  47. {
  48. "index": {
  49. "_index": "website",
  50. "_type": "blog",
  51. "_id": "AX-OHvjg4kOgKKwsJNF9",
  52. "_version": 1,
  53. "result": "created",
  54. "_shards": {
  55. "total": 2,
  56. "successful": 1,
  57. "failed": 0
  58. },
  59. "created": true,
  60. "status": 201
  61. }
  62. },
  63. {
  64. "update": {
  65. "_index": "website",
  66. "_type": "blog",
  67. "_id": "123",
  68. "_version": 8,
  69. "result": "updated",
  70. "_shards": {
  71. "total": 2,
  72. "successful": 1,
  73. "failed": 0
  74. },
  75. "status": 200
  76. }
  77. }
  78. ]
  79. }
  80. Sample data for bulk import: https://github.com/elastic/elasticsearch/blob/5.0/docs/src/test/resources/accounts.json
  81. POST /bank/account/_bulk
  82. .....
  83. After the import succeeds, try advanced search. To make the container restart automatically:
  84. docker update ... --restart=always
  85. Refer to the official documentation: https://www.elastic.co/guide/index.html
  86. https://www.elastic.co/guide/en/elastic-stack-get-started/7.5/index.html
  87. https://www.elastic.co/guide/en/elasticsearch/reference/7.5/getting-started-search.html
  88. GET bank/_search?q=*&sort=account_number:asc
  89. The response also provides the following information about the search request:
  90. took - how long, in milliseconds, Elasticsearch took to run the query
  91. timed_out - whether the search request timed out
  92. _shards - how many shards were searched, with a breakdown of how many succeeded, failed, or were skipped
  93. max_score - the score of the most relevant document found
  94. hits.total.value - how many matching documents were found
  95. hits.sort - the document's sort position (when not sorting by relevance score)
  96. hits._score - the document's relevance score (not applicable with match_all)
  97. "total": 1000, yet hits contains only 10 documents; that is pagination.
  98. ## match_all query; full-text search
  99. GET /bank/_search
  100. {
  101. "query": { "match_all": {} },
  102. "sort": [
  103. { "account_number": {
  104. "order": "desc"
  105. }
  106. }
  107. ],
  108. "from": 0,
  109. "size": 20,
  110. "_source": ["balance", "firstname"]
  111. }
  112. ## match query; full-text match on a field
  113. GET /bank/_search
  114. {
  115. "query": {
  116. "match": {
  117. "address": "mill lane" ## anything containing mill or lane is returned
  118. }
  119. }
  120. }
  121. ## match_phrase only returns documents containing the exact phrase "mill lane":
  122. GET /bank/_search
  123. {
  124. "query": { "match_phrase": { "address": "mill lane" } }
  125. }
  126. bool query with must and must_not; documents that also satisfy should score higher:
  127. GET /bank/_search
  128. {
  129. "query": {
  130. "bool": {
  131. "must": [
  132. { "match": { "age": "40" } }
  133. ],
  134. "must_not": [
  135. { "match": { "state": "ID" } }
  136. ],
  137. "should": [
  138. {
  139. "match": {
  140. "address": "MILL"
  141. }
  142. }
  143. ]
  144. }
  145. }
  146. }
  147. ## filter does not compute a relevance score; must does.
  148. GET /bank/_search
  149. {
  150. "query": {
  151. "bool": {
  152. "must": { "match_all": {} },
  153. "filter": {
  154. "range": {
  155. "balance": {
  156. "gte": 20000,
  157. "lte": 30000
  158. }
  159. }
  160. }
  161. }
  162. }
  163. }
  164. ## multi-field match
  165. GET bank/_search
  166. {
  167. "query": {
  168. "multi_match": {
  169. "query": "mill",
  170. "fields": ["state", "address"]
  171. }
  172. }
  173. }
  174. Use term to match exact values on non-text fields; match is the opposite (match is for full-text search):
  175. GET bank/_search
  176. {
  177. "query": {
  178. "term": {
  179. "age": {
  180. "value": "20"
  181. }
  182. }
  183. }
  184. }
  185. ## exact match on the keyword sub-field
  186. GET bank/_search
  187. {
  188. "query": {
  189. "match": {
  190. "address.keyword": "MILL"
  191. }
  192. }
  193. }
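The differences between match, match_phrase, and keyword matching above can be imitated in plain Python. This is a toy illustration of the semantics, not ES's actual analysis or scoring:

```python
def match(field_value, query):
    """match: true if ANY analyzed query term occurs in the field (lowercased)."""
    terms = field_value.lower().split()
    return any(t in terms for t in query.lower().split())

def match_phrase(field_value, query):
    """match_phrase: the query terms must appear as a contiguous sequence, in order."""
    terms = field_value.lower().split()
    q = query.lower().split()
    return any(terms[i:i + len(q)] == q for i in range(len(terms) - len(q) + 1))

def keyword(field_value, query):
    """keyword sub-field: the whole untokenized value must match exactly (case-sensitive)."""
    return field_value == query

addr = "198 Mill Lane"
print(match(addr, "mill lane"))         # contains mill (and lane)
print(match(addr, "lane mill"))         # term order does not matter for match
print(match_phrase(addr, "lane mill"))  # not a contiguous phrase in this order
print(keyword(addr, "mill"))            # keyword needs the exact whole value
```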

Aggregations


  
  1. As noted, Elasticsearch is a search and analytics engine; the following shows the analytics side.
  2. Analyze results with aggregations
  3. Find the age distribution and average age of everyone whose address contains mill:
  4. get bank/_search
  5. {
  6. "query":{
  7. "match":{
  8. "address": "mill"
  9. }
  10. },
  11. "aggs":{
  12. "ageAgg":{
  13. "terms": {
  14. "field": "age",
  15. "size": 10
  16. }
  17. },
  18. "ageAvg":{
  19. "avg": {
  20. "field": "age"
  21. }
  22. },
  23. "balanceAvg":{
  24. "avg": {
  25. "field": "balance"
  26. }
  27. }
  28. },
  29. ## size 0 suppresses the hits array and returns only the counts and aggregations.
  30. "size": 0
  31. }
  32. "aggregations": {
  33. "ageAgg": {
  34. "doc_count_error_upper_bound": 0,
  35. "sum_other_doc_count": 0,
  36. "buckets": [
  37. {
  38. "key": 38,
  39. "doc_count": 2
  40. },
  41. {
  42. "key": 28,
  43. "doc_count": 1
  44. },
  45. {
  46. "key": 32,
  47. "doc_count": 1
  48. }
  49. ]
  50. }
  51. }
  52. ## aggregate by age, and within each age bucket compute the average balance
  53. get bank/_search
  54. {
  55. "query":{
  56. "match_all": {}
  57. },
  58. "aggs":{
  59. "ageAgg":{
  60. "terms": {
  61. "field": "age",
  62. "size": 10
  63. },
  64. "aggs":{
  65. "balanceAvg":{
  66. "avg": {
  67. "field": "balance"
  68. }
  69. }
  70. }
  71. }
  72. }
  73. }
  74. Response:
  75. "aggregations": {
  76. "ageAgg": {
  77. "doc_count_error_upper_bound": 0,
  78. "sum_other_doc_count": 463,
  79. "buckets": [
  80. {
  81. "key": 31,
  82. "doc_count": 61,
  83. "balanceAvg": {
  84. "value": 28312.918032786885
  85. }
  86. },
  87. {
  88. "key": 39,
  89. "doc_count": 60,
  90. "balanceAvg": {
  91. "value": 25269.583333333332
  92. }
  93. },
  94. ## age distribution, plus the average balance of men and of women within each age bucket, plus the overall average balance per bucket
  95. get bank/_search
  96. {
  97. "query":{
  98. "match_all": {}
  99. },
  100. "aggs":{
  101. "ageAgg":{
  102. "terms": {
  103. "field": "age",
  104. "size": 10
  105. },
  106. "aggs": {
  107. "genderAgg": {
  108. "terms": {
  109. "field": "gender.keyword",
  110. "size": 10
  111. },
  112. "aggs": {
  113. "balanceAgg": {
  114. "avg": {
  115. "field": "balance"
  116. }
  117. }
  118. }
  119. },
  120. "balanceAvgAll":{
  121. "avg": {
  122. "field": "balance"
  123. }
  124. }
  125. }
  126. }
  127. }
  128. }
  129. Response:
  130. "buckets": [ -- 61 people are 31 years old: 35 men and 26 women; the men's average balance is 29565.63 and the women's is 26626.58
  131. {
  132. "key": 31,
  133. "doc_count": 61,
  134. "genderAgg": {
  135. "doc_count_error_upper_bound": 0,
  136. "sum_other_doc_count": 0,
  137. "buckets": [
  138. {
  139. "key": "M",
  140. "doc_count": 35,
  141. "balanceAgg": {
  142. "value": 29565.628571428573
  143. }
  144. },
  145. {
  146. "key": "F",
  147. "doc_count": 26,
  148. "balanceAgg": {
  149. "value": 26626.576923076922
  150. }
  151. }
  152. ]
  153. },
  154. "balanceAvgAll": {
  155. "value": 28312.918032786885
  156. }
  157. },
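The nested terms-then-avg pipeline above is just a group-by with a sub-group-by. A minimal Python imitation over a few hypothetical accounts (invented numbers, not the bank dataset) makes the bucket structure concrete:

```python
from collections import defaultdict

# Hypothetical sample accounts, standing in for the bank index.
accounts = [
    {"age": 31, "gender": "M", "balance": 30000},
    {"age": 31, "gender": "M", "balance": 29000},
    {"age": 31, "gender": "F", "balance": 26000},
    {"age": 39, "gender": "M", "balance": 25000},
]

# terms agg on age, then terms agg on gender, each gender with an avg sub-aggregation.
buckets = defaultdict(lambda: defaultdict(list))
for a in accounts:
    buckets[a["age"]][a["gender"]].append(a["balance"])

for age, by_gender in buckets.items():
    all_balances = [b for balances in by_gender.values() for b in balances]
    print("age", age, "overall avg:", sum(all_balances) / len(all_balances))
    for gender, balances in by_gender.items():
        print("  ", gender, "avg:", sum(balances) / len(balances))
```

Each outer key mirrors an ageAgg bucket; the inner keys mirror the genderAgg buckets with their balanceAgg averages.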

Mappings

Elasticsearch mappings: 7.x deprecates type, and 8.x is expected to drop support for it entirely, to avoid clashes between identically named fields in different types.


  
  1. GET bank/_mapping
  2. ES data types: https://www.elastic.co/guide/en/elasticsearch/reference/7.5/mapping-types.html
  3. string
  4. text and keyword
  5. Numeric
  6. long, integer, short, byte, double, float, half_float, scaled_float
  7. Date
  8. date
  9. Date nanoseconds
  10. date_nanos
  11. Boolean
  12. boolean
  13. Binary
  14. binary
  15. Range
  16. integer_range, float_range, long_range, double_range, date_range
  17. There are also complex data types.
  18. A note on mappings: annoyingly, they differ between versions, and examples from one version's docs often do not line up with another's.
  19. GET bank/_mapping
  20. PUT my_index
  21. {
  22. "mappings": {
  23. "user": {
  24. "_all": {
  25. "enabled": false
  26. },
  27. "properties": {
  28. "title": {
  29. "type": "text"
  30. },
  31. "name": {
  32. "type": "text"
  33. },
  34. "age": {
  35. "type": "integer"
  36. }
  37. }
  38. }
  39. }
  40. }
  41. PUT twitter
  42. {
  43. "mappings": {
  44. "user": {
  45. "properties": {
  46. "name": { "type": "text" },
  47. "user_name": { "type": "keyword" },
  48. "email": { "type": "keyword" }
  49. }
  50. },
  51. "tweet": {
  52. "properties": {
  53. "content": { "type": "text" },
  54. "user_name": { "type": "keyword" },
  55. "tweeted_at": { "type": "date" }
  56. }
  57. }
  58. }
  59. }
  60. PUT twitter/user/kimchy
  61. {
  62. "name": "Shay Banon",
  63. "user_name": "kimchy",
  64. "email": "shay@kimchy.com"
  65. }
  66. PUT twitter/tweet/1
  67. {
  68. "user_name": "kimchy",
  69. "tweeted_at": "2017-10-24T09:00:00Z",
  70. "content": "Types are going away"
  71. }
  72. GET twitter/tweet/_search
  73. {
  74. "query": {
  75. "match": {
  76. "user_name": "kimchy"
  77. }
  78. }
  79. }
  80. PUT twitter
  81. {
  82. "mappings": {
  83. "doc": {
  84. "properties": {
  85. "type": { "type": "keyword" },
  86. "name": { "type": "text" },
  87. "user_name": { "type": "keyword" },
  88. "email": { "type": "keyword" },
  89. "content": { "type": "text" },
  90. "tweeted_at": { "type": "date" }
  91. }
  92. }
  93. }
  94. }

ES Mapping Changes and Data Migration


  
  1. ## create an index with an explicit mapping
  2. PUT /my_index
  3. {
  4. "mappings": {
  5. "properties": {
  6. "age":{ "type": "integer"},
  7. "email":{ "type": "keyword"},
  8. "name":{ "type": "text"}
  9. }
  10. }
  11. }
  12. ## add a new field to the index's mapping
  13. PUT /my_index/_mapping
  14. {
  15. "properties":{
  16. "employee-id":{ "type": "keyword", "index": false}
  17. }
  18. }
  19. GET /my_index/_mapping
  20. ## Fields that already exist in a mapping cannot be updated, only new fields added, because data may
  21. already exist under the old mapping. Instead, create a new index and reindex the old data into it. [data migration]
  22. ## create a new index to receive the old index's data under the new mapping (index names must be lowercase)
  23. PUT /newbank
  24. {
  25. "mappings": {
  26. "properties": {
  27. }
  28. }
  29. }
  30. ## data migration
  31. POST _reindex
  32. {
  33. "source": {
  34. "index": "bank",
  35. "type": "account"
  36. },
  37. "dest": {
  38. "index": "newbank"
  39. }
  40. }
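The constraint above (existing field definitions are frozen; new fields can be added) can be sketched as a merge check in Python. This illustrates the rule, it is not ES code:

```python
def update_mapping(existing, new_props):
    """Add new fields to a mapping; reject changes to existing field definitions."""
    for field, definition in new_props.items():
        if field in existing and existing[field] != definition:
            raise ValueError(
                f"cannot change mapping of existing field '{field}'; "
                "create a new index and _reindex instead")
        existing[field] = definition
    return existing

mapping = {"age": {"type": "integer"}, "name": {"type": "text"}}

# Adding a brand-new field succeeds, like PUT /my_index/_mapping above.
update_mapping(mapping, {"employee-id": {"type": "keyword", "index": False}})
print(sorted(mapping))

# Changing an existing field's type is rejected; that is what reindexing is for.
try:
    update_mapping(mapping, {"age": {"type": "keyword"}})
except ValueError as e:
    print("rejected:", e)
```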

4. ES Analyzers and Tokenizers

About tokenization in ES:
A tokenizer receives a stream of characters, splits it into individual tokens (usually individual words), and outputs a token stream.
For example, the whitespace tokenizer splits text on whitespace characters: it turns the text "Quick brown fox" into [Quick, brown, fox].
The tokenizer also records the order, or position, of each term (used for phrase and word-proximity queries) and the start and end
character offsets of the original word each term represents (used to highlight matched text).
ES ships with many built-in tokenizers that can be used to build custom analyzers.
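What a whitespace tokenizer produces, tokens with positions and character offsets, can be reproduced with Python's re module (a simplified imitation of the behavior described above):

```python
import re

def whitespace_tokenize(text):
    """Split on whitespace, recording each token's position and character offsets."""
    tokens = []
    for position, m in enumerate(re.finditer(r"\S+", text)):
        tokens.append({
            "token": m.group(),
            "position": position,        # used for phrase / word-proximity queries
            "start_offset": m.start(),   # offsets are used for highlighting
            "end_offset": m.end(),
        })
    return tokens

for t in whitespace_tokenize("Quick brown fox"):
    print(t)
```

The output has the same shape as the token objects returned by the _analyze API shown below.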

4.1. Installing the ik Analyzer

ik analyzer for ES 7.4.2
Link: https://pan.baidu.com/s/1DpWUZEFicNiYOJDCSDidVA
Access code: pshd


  
  1. POST _analyze
  2. {
  3. "analyzer": "whitespace",
  4. "text": "The quick brown fox."
  5. }
  6. Install the Chinese ik analyzer: get it from GitHub and put it under plugins;
  7. enter the container:
  8. docker exec -it elasticsearch /bin/bash
  9. pwd
  10. ls
  11. exit # leave the container
  12. First install wget, via yum:
  13. yum install wget
  14. enable root login over SSH:
  15. vi /etc/ssh/sshd_config
  16. edit: PasswordAuthentication yes/no
  17. restart the service: service sshd restart
  18. 1. The ik analyzer version must match the ES version;
  19. 2. Put it in the plugins directory;
  20. 3. Restart the container;
  21. 4. Test the analyzer:
  22. POST _analyze
  23. {
  24. "analyzer": "ik_smart", ## or ik_max_word
  25. "text": "尚硅谷电商项目"
  26. }
  27. Response:
  28. {
  29. "tokens": [
  30. {
  31. "token": "尚",
  32. "start_offset": 0,
  33. "end_offset": 1,
  34. "type": "CN_CHAR",
  35. "position": 0
  36. },
  37. {
  38. "token": "硅谷",
  39. "start_offset": 1,
  40. "end_offset": 3,
  41. "type": "CN_WORD",
  42. "position": 1
  43. },
  44. {
  45. "token": "电",
  46. "start_offset": 3,
  47. "end_offset": 4,
  48. "type": "CN_CHAR",
  49. "position": 2
  50. },
  51. {
  52. "token": "商",
  53. "start_offset": 4,
  54. "end_offset": 5,
  55. "type": "CN_CHAR",
  56. "position": 3
  57. },
  58. {
  59. "token": "项目",
  60. "start_offset": 5,
  61. "end_offset": 7,
  62. "type": "CN_WORD",
  63. "position": 4
  64. }
  65. ]
  66. }
  67. Some words are not recognized by the analyzer, so a custom dictionary is needed:
  68. 1. First configure the network interface:
  69. add GATEWAY=xxxxx.1
  70. DNS1=114.114.114.114
  71. DNS2=8.8.8.8
  72. After a match query there is a max_score; querying a numeric field is an exact match, querying a string is fuzzy [full-text search];
  73. On custom analyzers:

4.2. Installing nginx


  
  1. Install nginx and serve the dictionary file from it;
  2. docker container cp nginx:/etc/nginx .
  3. mv nginx conf # rename the folder
  4. mv conf nginx/ # move conf into the nginx folder
  5. docker stop nginx;
  6. docker rm nginx;
  7. docker run -p 80:80 --name nginx \
  8. -v /mydata/nginx/conf/html:/usr/share/nginx/html \
  9. -v /mydata/nginx/conf/logs:/var/log/nginx \
  10. -v /mydata/nginx/conf:/etc/nginx \
  11. -d nginx:1.10
  12. docker run -p80:80 --name nginx -v/mydata/nginx/html:/usr/share/nginx/html -v/mydata/nginx/logs:/var/log/nginx -v /mydata/nginx/conf:/etc/nginx -d nginx:1.10 ## [this version works]
  13. create a custom fenci.txt in nginx
  14. point the ik analyzer's remote dictionary at http://192.168.52.130/es/fenci.txt

 

5. Interacting with Elasticsearch from Java

5.1. pom.xml


  
  1. <elasticsearch.version>7.4.2</elasticsearch.version>
  2. <dependency>
  3. <groupId>org.elasticsearch.client</groupId>
  4. <artifactId>elasticsearch-rest-high-level-client</artifactId>
  5. <version>7.4.2</version>
  6. </dependency>

5.2. Configuration class


  
  1. package com.pshdhx.elasticsearch.config;
  2. import org.apache.http.HttpHost;
  3. import org.elasticsearch.client.RequestOptions;
  4. import org.elasticsearch.client.RestClient;
  5. import org.elasticsearch.client.RestClientBuilder;
  6. import org.elasticsearch.client.RestHighLevelClient;
  7. import org.springframework.context.annotation.Bean;
  8. import org.springframework.context.annotation.Configuration;
  9. /**
  10. * @author pshdhx
  11. * @date 2022-03-28 15:09
  12. */
  13. @Configuration
  14. public class ElasticSearchConfig {
  15. public static final RequestOptions COMMON_OPTIONS;
  16. static {
  17. RequestOptions.Builder builder = RequestOptions.DEFAULT.toBuilder();
  18. //addHeader();
  19. //setHttpAsyncResponseConsumerFactory
  20. COMMON_OPTIONS = builder.build();
  21. }
  22. @Bean
  23. public RestHighLevelClient esRestClient() {
  24. RestClientBuilder restClientBuilder = RestClient.builder(new HttpHost("82.xx.xx.xxxx", 9200, "http"));
  25. RestHighLevelClient client = new RestHighLevelClient(restClientBuilder);
  26. return client;
  27. }
  28. }

5.3. Test class


  
  1. package com.pshdhx.elasticsearch;
  2. import com.alibaba.fastjson.JSON;
  3. import com.pshdhx.elasticsearch.config.ElasticSearchConfig;
  4. import lombok.AllArgsConstructor;
  5. import lombok.Data;
  6. import lombok.ToString;
  7. import org.elasticsearch.action.index.IndexRequest;
  8. import org.elasticsearch.action.search.SearchRequest;
  9. import org.elasticsearch.action.search.SearchResponse;
  10. import org.elasticsearch.client.RestHighLevelClient;
  11. import org.elasticsearch.common.xcontent.XContentType;
  12. import org.elasticsearch.index.query.QueryBuilder;
  13. import org.elasticsearch.index.query.QueryBuilders;
  14. import org.elasticsearch.search.SearchHit;
  15. import org.elasticsearch.search.SearchHits;
  16. import org.elasticsearch.search.aggregations.Aggregation;
  17. import org.elasticsearch.search.aggregations.AggregationBuilder;
  18. import org.elasticsearch.search.aggregations.AggregationBuilders;
  19. import org.elasticsearch.search.aggregations.Aggregations;
  20. import org.elasticsearch.search.aggregations.bucket.terms.Terms;
  21. import org.elasticsearch.search.aggregations.metrics.Avg;
  22. import org.elasticsearch.search.builder.SearchSourceBuilder;
  23. import org.junit.jupiter.api.Test;
  24. import org.springframework.beans.factory.annotation.Autowired;
  25. import org.springframework.boot.test.context.SpringBootTest;
  26. import javax.naming.directory.SearchResult;
  27. import java.io.IOException;
  28. import java.util.Map;
  29. @SpringBootTest
  30. class GuilimallEsApplicationTests {
  31. @Autowired
  32. private RestHighLevelClient restHighLevelClient;
  33. @Test
  34. void testSearchEs() throws IOException {
  35. // 1. build the search request
  36. SearchRequest searchRequest = new SearchRequest();
  37. // specify the index
  38. searchRequest.indices("bank");
  39. // specify the DSL query
  40. SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
  41. sourceBuilder.query(QueryBuilders.matchQuery("address", "mill"));
  42. // aggregations
  43. sourceBuilder.aggregation(AggregationBuilders.terms("ageAgg").field("age").size(10));
  44. sourceBuilder.aggregation(AggregationBuilders.avg("balanceAvg").field("balance"));
  45. System.out.println(sourceBuilder.toString());
  46. searchRequest.source(sourceBuilder);
  47. // 2. execute the search
  48. SearchResponse response = restHighLevelClient.search(searchRequest, ElasticSearchConfig.COMMON_OPTIONS);
  49. System.out.println(response.toString());
  50. Map map = JSON.parseObject(response.toString(), Map.class);
  51. SearchHits hits = response.getHits();
  52. SearchHit[] hits1 = hits.getHits();
  53. for (SearchHit hit : hits1) {
  54. String index = hit.getIndex();
  55. //Map<String, Object> sourceAsMap = hit.getSourceAsMap();
  56. String sourceAsString = hit.getSourceAsString();
  57. Account account = JSON.parseObject(sourceAsString, Account.class);
  58. System.out.println( "Account:"+account.toString());
  59. }
  60. // read the aggregation results
  61. Aggregations aggregations = response.getAggregations();
  62. Terms ageAgg1 = aggregations.get("ageAgg");
  63. for (Terms.Bucket bucket : ageAgg1.getBuckets()) {
  64. String keyAsString = bucket.getKeyAsString();
  65. System.out.println("age " + keyAsString + " => " + bucket.getDocCount());
  66. }
  67. Avg balanceAvg1 = aggregations.get("balanceAvg");
  68. System.out.println("average balance: " + balanceAvg1.getValue());
  69. }
  70. @Test
  71. void contextLoads () {
  72. System.out.println( "111");
  73. System.out.println(restHighLevelClient);
  74. }
  75. @Test
  76. public void indexData() throws IOException {
  77. IndexRequest indexRequest = new IndexRequest("users");
  78. indexRequest.id("1");
  79. // indexRequest.source("userName","zhangsan","age",18,"gender","男");
  80. User user = new User("pshdhx", 18, "man");
  81. String jsonString = JSON.toJSONString(user);
  82. IndexRequest index = indexRequest.source(jsonString, XContentType.JSON);
  83. // execute the index operation
  84. restHighLevelClient.index(indexRequest, ElasticSearchConfig.COMMON_OPTIONS);
  85. System.out.println(index);
  86. }
  86. }
  87. @Data
  88. @AllArgsConstructor
  89. class User{
  90. private String userName;
  91. private Integer age;
  92. private String gender;
  93. }
  94. @Data
  95. @ToString
  96. static class Account {
  97. private int account_number;
  98. private int balance;
  99. private String firstname;
  100. private String lastname;
  101. private int age;
  102. private String gender;
  103. private String address;
  104. private String employer;
  105. private String email;
  106. private String city;
  107. private String state;
  108. }
  109. }

Results


  
  1. { "query":{ "match":{ "address":{ "query": "mill", "operator": "OR", "prefix_length": 0, "max_expansions": 50, "fuzzy_transpositions":true, "lenient":false, "zero_terms_query": "NONE", "auto_generate_synonyms_phrase_query":true, "boost": 1.0}}}, "aggregations":{ "ageAgg":{ "terms":{ "field": "age", "size": 10, "min_doc_count": 1, "shard_min_doc_count": 0, "show_term_doc_count_error":false, "order":[{ "_count": "desc"},{ "_key": "asc"}]}}, "balanceAvg":{ "avg":{ "field": "balance"}}}}
  2. { "took": 1, "timed_out":false, "_shards":{ "total": 1, "successful": 1, "skipped": 0, "failed": 0}, "hits":{ "total":{ "value": 4, "relation": "eq"}, "max_score": 5.4032025, "hits":[{ "_index": "bank", "_type": "_doc", "_id": "970", "_score": 5.4032025, "_source":{ "account_number": 970, "balance": 19648, "firstname": "Forbes", "lastname": "Wallace", "age": 28, "gender": "M", "address": "990 Mill Road", "employer": "Pheast", "email": "forbeswallace@pheast.com", "city": "Lopezo", "state": "AK"}},{ "_index": "bank", "_type": "_doc", "_id": "136", "_score": 5.4032025, "_source":{ "account_number": 136, "balance": 45801, "firstname": "Winnie", "lastname": "Holland", "age": 38, "gender": "M", "address": "198 Mill Lane", "employer": "Neteria", "email": "winnieholland@neteria.com", "city": "Urie", "state": "IL"}},{ "_index": "bank", "_type": "_doc", "_id": "345", "_score": 5.4032025, "_source":{ "account_number": 345, "balance": 9812, "firstname": "Parker", "lastname": "Hines", "age": 38, "gender": "M", "address": "715 Mill Avenue", "employer": "Baluba", "email": "parkerhines@baluba.com", "city": "Blackgum", "state": "KY"}},{ "_index": "bank", "_type": "_doc", "_id": "472", "_score": 5.4032025, "_source":{ "account_number": 472, "balance": 25571, "firstname": "Lee", "lastname": "Long", "age": 32, "gender": "F", "address": "288 Mill Street", "employer": "Comverges", "email": "leelong@comverges.com", "city": "Movico", "state": "MT"}}]}, "aggregations":{ "lterms#ageAgg":{ "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets":[{ "key": 38, "doc_count": 2},{ "key": 28, "doc_count": 1},{ "key": 32, "doc_count": 1}]}, "avg#balanceAvg":{ "value": 25208.0}}}
  3. Account:GuilimallEsApplicationTests.Account(account_number= 970, balance= 19648, firstname=Forbes, lastname=Wallace, age= 28, gender=M, address= 990 Mill Road, employer=Pheast, email=forbeswallace@pheast.com, city=Lopezo, state=AK)
  4. Account:GuilimallEsApplicationTests.Account(account_number= 136, balance= 45801, firstname=Winnie, lastname=Holland, age= 38, gender=M, address= 198 Mill Lane, employer=Neteria, email=winnieholland@neteria.com, city=Urie, state=IL)
  5. Account:GuilimallEsApplicationTests.Account(account_number= 345, balance= 9812, firstname=Parker, lastname=Hines, age= 38, gender=M, address= 715 Mill Avenue, employer=Baluba, email=parkerhines@baluba.com, city=Blackgum, state=KY)
  6. Account:GuilimallEsApplicationTests.Account(account_number= 472, balance= 25571, firstname=Lee, lastname=Long, age= 32, gender=F, address= 288 Mill Street, employer=Comverges, email=leelong@comverges.com, city=Movico, state=MT)
  7. age 38 => 2
  8. age 28 => 1
  9. age 32 => 1
  10. average balance: 25208.0
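Note the `lterms#ageAgg` and `avg#balanceAvg` keys in the raw response above: the high-level client requests typed keys, i.e. each aggregation name prefixed with its type, which is how `aggregations.get("ageAgg")` knows to return a Terms object. Stripping that prefix in Python, given an aggregations dict shaped like the response shown:

```python
# Aggregations section shaped like the raw response above.
aggregations = {
    "lterms#ageAgg": {"buckets": [{"key": 38, "doc_count": 2},
                                  {"key": 28, "doc_count": 1},
                                  {"key": 32, "doc_count": 1}]},
    "avg#balanceAvg": {"value": 25208.0},
}

# typed key "lterms#ageAgg" splits into agg type "lterms" and name "ageAgg"
by_name = {key.split("#", 1)[1]: value for key, value in aggregations.items()}

for bucket in by_name["ageAgg"]["buckets"]:
    print("age", bucket["key"], "=>", bucket["doc_count"])
print("average balance:", by_name["balanceAvg"]["value"])
```

This reproduces the console output shown in lines 7-10 above.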


Reposted from: https://blog.csdn.net/pshdhx/article/details/125712448