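1. Introduction
Elasticsearch is a distributed, open-source search and analytics engine for all types of data, including text, numeric, geospatial, structured, and unstructured data. It can retrieve the records you need from massive data sets within seconds, whereas queries against a single MySQL table slow down noticeably once the table reaches millions of rows.
Use cases:
1. Application search
2. Website search
3. Enterprise search
4. Log processing and analytics
5. Infrastructure metrics and container monitoring
6. Application performance monitoring
7. Geospatial data analysis and visualization
8. Security analytics and business analytics
MySQL, by contrast, handles persistent data management (CRUD).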
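2. Basic concepts
1. Index. As a verb, "to index" a document is comparable to an INSERT in MySQL: it stores one piece of data. As a noun, an index is comparable to a MySQL database.
2. Type. One or more types can be defined inside an index, similar to tables in MySQL; documents of the same type are stored together.
3. Document. A single piece of data stored under a given index and a given type. Documents are in JSON format; a document is like a row in a MySQL table.

Example layout:
ES cluster
Indices: google, microsoft, megacorp
Types (in each index): employee, product
Documents (one record each): {id:1,name:张三}, {id:2,name:李四}
A key such as id is called a field and corresponds to a column.

The second key concept is the inverted index. MySQL stores a row and looks it up by row, which can be thought of as a forward index; ES instead maintains an inverted index table that maps each term to the records containing it:

Term    Record IDs
红海    1,2,3,4,5
行动    1,2,3
特工    5

Every hit also gets a relevance score: a record that matches two of the searched terms (record 3, for example) ranks higher than a record that matches only one.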
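To make the inverted index idea concrete, here is a minimal sketch in Python. The documents, the function names `build_inverted_index` and `search`, and the scoring rule (one point per matched term) are all assumptions made for illustration; real Elasticsearch analyzes text with configurable analyzers and scores with BM25, not a simple count.

```python
from collections import defaultdict

# Hypothetical documents; the IDs and texts are made up for this sketch.
DOCS = {
    1: "red sea operation",
    2: "explore red sea operation",
    3: "red sea special operation",
    4: "red sea documentary",
    5: "special agent red sea",
}


def build_inverted_index(docs):
    """Map every term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Score each document by the number of query terms it contains
    (a deliberate simplification of real relevance scoring)."""
    scores = defaultdict(int)
    for term in query.split():
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by document ID.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


INDEX = build_inverted_index(DOCS)
RESULT = search(INDEX, "special operation")
print(RESULT)
```

Document 3 contains both query terms, so it comes first; the other hits contain only one term each.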
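3. Setting up the environment
# Pull the images; Kibana is used to search and visualize the data
docker pull elasticsearch
docker pull kibana

# Create the config and data directories on the host so they can be mounted and edited easily
mkdir -p /mydata/elasticsearch/config
mkdir -p /mydata/elasticsearch/data

# Allow the ES node to be reached from any remote machine (YAML requires the space after the colon)
echo "http.host: 0.0.0.0" >> /mydata/elasticsearch/config/elasticsearch.yml
cat /mydata/elasticsearch/config/elasticsearch.yml

# Start the container:
#   -p 9200:9200                 port that receives REST API requests
#   -p 9300:9300                 port for node-to-node communication in a distributed cluster
#   discovery.type=single-node   run in single-node mode (no spaces around "=")
#   ES_JAVA_OPTS                 JVM heap size; small values for a test VM, production machines usually have around 32 GB of memory
#   -v ...                       mount the config file, the data directory, and the plugins directory
docker run --name elasticsearch -p 9200:9200 -p 9300:9300 \
  -e "discovery.type=single-node" \
  -e ES_JAVA_OPTS="-Xms64m -Xmx128m" \
  -v /mydata/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
  -v /mydata/elasticsearch/data:/usr/share/elasticsearch/data \
  -v /mydata/elasticsearch/plugins:/usr/share/elasticsearch/plugins \
  -d elasticsearch:latest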

# Check an image's version (shown here for nginx:latest; substitute the image you need):
docker image inspect nginx:latest | grep -i version

Creating the elasticsearch container, full command. It sometimes works and sometimes does not, apparently depending on the available memory; by default Elasticsearch seems to claim as much memory as it can get.
docker run --name elasticsearch -p 9200:9200 -p 9300:9300 \
  -e "discovery.type=single-node" \
  -e ES_JAVA_OPTS="-Xms512m -Xmx1024m" \
  -v /mydata/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
  -v /mydata/elasticsearch/data:/usr/share/elasticsearch/data \
  -v /mydata/elasticsearch/plugins:/usr/share/elasticsearch/plugins \
  -d elasticsearch:latest

With this command the container died about 2 seconds after it was created. Raising the heap size did not help, so the next attempt gives the VM more resources and drops the heap settings altogether:
docker run --name elasticsearch -p 9200:9200 -p 9300:9300 \
  -e "discovery.type=single-node" \
  -v /mydata/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
  -v /mydata/elasticsearch/data:/usr/share/elasticsearch/data \
  -v /mydata/elasticsearch/plugins:/usr/share/elasticsearch/plugins \
  -d elasticsearch

# Check the VM's memory usage
free -m

Starting Kibana
# Alternative for Kibana 7.x images, which read ELASTICSEARCH_HOSTS instead of ELASTICSEARCH_URL:
##docker run --name kibana -e ELASTICSEARCH_HOSTS=http://192.168.52.130:9200 -p 5601:5601 -d kibana:latest
docker run --name kibana -e ELASTICSEARCH_URL=http://192.168.52.130:9200 -p 5601:5601 -d kibana:latest

Open Kibana in the browser:
http://192.168.52.130:5601/app/kibana#?_g=()

Sample data (accounts.json):
/elastic/elasticsearch/master/docs/src/test/resources/accounts.json

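# Remove the container:
docker stop elasticsearch
docker rm elasticsearch

# View the error log:
docker logs elasticsearch

# The mounted directories must be readable and writable from inside the container.
# chmod 777 grants read, write, and execute permission to every user and group:
chmod -R 777 /mydata/elasticsearch

Before:
ll
drwxr-xr-x
After:
ll
drwxrwxrwx
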
Once the elasticsearch container has started successfully, open:
http://192.168.52.130:9200/
{
  "name" : "ZeFr4rR",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "Hg9BNYkUQCy650XP26sYkw",
  "version" : {
    "number" : "5.6.12",
    "build_hash" : "cfe3d9f",
    "build_date" : "2018-09-10T20:12:43.732Z",
    "build_snapshot" : false,
    "lucene_version" : "6.6.1"
  },
  "tagline" : "You Know, for Search"
}
Common DSL statements
1. Querying node information with _cat
http://192.168.52.130:9200/_cat
=^.^=
/_cat/allocation
/_cat/shards
/_cat/shards/{index}
/_cat/master
/_cat/nodes
/_cat/tasks
/_cat/indices
/_cat/indices/{index}
/_cat/segments
/_cat/segments/{index}
/_cat/count
/_cat/count/{index}
/_cat/recovery
/_cat/recovery/{index}
/_cat/health
/_cat/pending_tasks
/_cat/aliases
/_cat/aliases/{alias}
/_cat/thread_pool
/_cat/thread_pool/{thread_pools}
/_cat/plugins
/_cat/fielddata
/_cat/fielddata/{fields}
/_cat/nodeattrs
/_cat/repositories
/_cat/snapshots/{repository}
/_cat/templates

# List the nodes; in a cluster, the node marked with * is the master node
http://192.168.52.130:9200/_cat/nodes
127.0.0.1 12 62 0 0.01 0.07 0.12 mdi * ZeFr4rR

# Check the health of ES; the numbers in the middle are node and shard counts
http://192.168.52.130:9200/_cat/health
1647341114 10:45:14 elasticsearch green 1 1 0 0 0 0 0 0 - 100.0%

# Show the master node
http://192.168.52.130:9200/_cat/master
ZeFr4rRZQz-w3ZK80Vuv0Q 127.0.0.1 127.0.0.1 ZeFr4rR

# List all indices (comparable to "show databases" in MySQL)
http://192.168.52.130:9200/_cat/indices

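2. Indexing (saving) a document
To save a document, specify which index and type it belongs to and which unique ID identifies it.
Example: save document 1 under the external type of the customer index.

With PUT, the id is mandatory:
PUT http://192.168.52.130:9200/customer/external/1
JSON request body:
{
  "name": "pansd"
}
Response:
{
  "_index": "customer",
  "_type": "external",
  "_id": "1",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "created": true
}
Notes on the response:
- Every key that starts with _ is metadata.
- _version increases by 1 on every update.
- result is "created" for the first PUT and "updated" for later ones: the first request creates the document, subsequent requests update it.
- _shards describes the shards involved; this is covered together with clustering.

With POST, the id can be omitted:
{
  "_index": "customer",
  "_type": "external",
  "_id": "AX-NOVshrEj9CP_oy5eV",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "created": true
}
The _id is generated automatically; sending the same request several times creates a new document with a new unique id each time.
If a POST request does carry an id, it behaves just like PUT.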
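The PUT and POST rules can be sketched as a small helper that only builds the HTTP request without sending it. This is a sketch using Python's standard library; the function name `build_index_request` is hypothetical, and the host is simply the one used in the examples above.

```python
import json
import urllib.request

BASE_URL = "http://192.168.52.130:9200"  # host from the examples above


def build_index_request(index, doc_type, body, doc_id=None):
    """Build (but do not send) an index request.

    With an id the request is PUT /index/type/id (create or update);
    without one it is POST /index/type, and ES generates the id.
    """
    if doc_id is None:
        url = f"{BASE_URL}/{index}/{doc_type}"
        method = "POST"
    else:
        url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
        method = "PUT"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method=method,
        headers={"Content-Type": "application/json"},
    )


put_req = build_index_request("customer", "external", {"name": "pansd"}, doc_id=1)
post_req = build_index_request("customer", "external", {"name": "pansd"})
print(put_req.get_method(), put_req.full_url)
print(post_req.get_method(), post_req.full_url)
```

Passing either request to `urllib.request.urlopen` would send it to a running node.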
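
Querying a document
GET customer/external/1
http://192.168.52.130:9200/customer/external/1
{
  "_index": "customer",
  "_type": "external",
  "_id": "1",
  "_version": 1,
  "_seq_no": 1,
  "_primary_term": 1,
  "found": true,
  "_source": {
    "name": "pansd"
  }
}
Notes:
- _seq_no is not returned by the image version used here; it increases by 1 on every update and is used for optimistic locking.
- _primary_term is not returned by this image version either; it changes when the primary shard is reassigned, for example after a restart.

When updating a document concurrently, send the optimistic-lock parameters (or a version number) with the request so that stale writes are rejected. For example:
PUT http://192.168.52.130:9200/customer/external/1?if_seq_no=1&if_primary_term=1

Updating with _update:
POST http://192.168.52.130:9200/customer/external/1/_update
{
  "doc": {
    "name": "pshdhx"
  }
}
1. The first update changes the version number. Sending the same request again changes neither the version number nor the sequence number, because _update compares the new data with the existing document and does nothing when they are identical.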
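The optimistic-locking behaviour can be illustrated with a tiny in-memory model. This is only a sketch of the idea, not how Elasticsearch is implemented: `TinyStore` and `VersionConflict` are hypothetical names, and the store keeps a single global sequence counter for simplicity.

```python
class VersionConflict(Exception):
    """Raised when the caller's view of the document is stale."""


class TinyStore:
    """In-memory stand-in for an index that tracks _seq_no and _primary_term."""

    def __init__(self):
        self.docs = {}
        self.next_seq_no = 0
        self.primary_term = 1

    def get(self, doc_id):
        return self.docs[doc_id]

    def index(self, doc_id, source, if_seq_no=None, if_primary_term=None):
        current = self.docs.get(doc_id)
        if if_seq_no is not None or if_primary_term is not None:
            expected = (if_seq_no, if_primary_term)
            actual = (current["_seq_no"], current["_primary_term"]) if current else None
            if actual != expected:
                raise VersionConflict(f"expected {expected}, found {actual}")
        doc = {
            "_source": source,
            "_seq_no": self.next_seq_no,
            "_primary_term": self.primary_term,
        }
        self.next_seq_no += 1
        self.docs[doc_id] = doc
        return doc


store = TinyStore()
store.index("1", {"name": "pansd"})
seen = store.get("1")  # two clients both read this state

# Client A writes first and succeeds.
store.index("1", {"name": "client A"},
            if_seq_no=seen["_seq_no"], if_primary_term=seen["_primary_term"])

# Client B still holds the old _seq_no, so its write is rejected.
try:
    store.index("1", {"name": "client B"},
                if_seq_no=seen["_seq_no"], if_primary_term=seen["_primary_term"])
    conflict = False
except VersionConflict:
    conflict = True
print(conflict, store.get("1")["_source"])
```

The rejected client is expected to re-read the document and retry with the fresh values.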

Deleting a document:
DELETE customer/external/1
{
  "found": true,
  "_index": "customer",
  "_type": "external",
  "_id": "1",
  "_version": 2,
  "result": "deleted",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  }
}
Querying it again:
{
  "_index": "customer",
  "_type": "external",
  "_id": "1",
  "found": false
}

DELETE customer
This deletes the whole index. ES cannot delete a type on its own; the closest equivalent is to delete every document of that type.

Bulk import API
The bulk body is not a single valid JSON document, and sending it as plain text breaks the line endings, so Kibana is used from here on.
1. The request must be a POST.
2. Endpoint: customer/external/_bulk
Each operation takes two lines: an action line followed by a body line. The following request inserts two documents; paste it into Dev Tools in Kibana:
POST /customer/external/_bulk
{"index":{"_id":"1"}}
{"name":"pansd"}
{"index":{"_id":"2"}}
{"name":"pshdhx"}

Response:
{
  "took": 62,
  "errors": false,
  "items": [
    {
      "index": {
        "_index": "customer",
        "_type": "external",
        "_id": "1",
        "_version": 1,
        "result": "created",
        "_shards": { "total": 2, "successful": 1, "failed": 0 },
        "created": true,
        "status": 201
      }
    },
    {
      "index": {
        "_index": "customer",
        "_type": "external",
        "_id": "2",
        "_version": 1,
        "result": "created",
        "_shards": { "total": 2, "successful": 1, "failed": 0 },
        "created": true,
        "status": 201
      }
    }
  ]
}
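Because the bulk body is newline-delimited JSON rather than one JSON document, it is easy to get wrong by hand. Below is a minimal sketch that assembles such a body; the helper name `build_bulk_body` and the tuple layout of `operations` are assumptions made for this example.

```python
import json


def build_bulk_body(operations):
    """Turn (action, metadata, document) tuples into a _bulk request body.

    Each operation becomes an action line, followed by a document line
    when the action carries one (delete does not). The body must end
    with a newline.
    """
    lines = []
    for action, metadata, document in operations:
        lines.append(json.dumps({action: metadata}))
        if document is not None:
            lines.append(json.dumps(document))
    return "\n".join(lines) + "\n"


body = build_bulk_body([
    ("index", {"_id": "1"}, {"name": "pansd"}),
    ("index", {"_id": "2"}, {"name": "pshdhx"}),
    ("delete", {"_id": "3"}, None),
])
print(body)
```

The resulting string can be used as the body of a POST to customer/external/_bulk.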
Complex DSL statements
A more complex bulk request that mixes delete, create, index, and update:
POST /_bulk
{"delete":{"_index":"website","_type":"blog","_id":"123"}}
{"create":{"_index":"website","_type":"blog","_id":"123"}}
{"title":"my first blog"}
{"index":{"_index":"website","_type":"blog"}}
{"title":"my second blog"}
{"update":{"_index":"website","_type":"blog","_id":"123"}}
{"doc":{"title":"my updated blog post"}}

Response:
{
  "took": 24,
  "errors": false,
  "items": [
    {
      "delete": {
        "found": true,
        "_index": "website",
        "_type": "blog",
        "_id": "123",
        "_version": 6,
        "result": "deleted",
        "_shards": { "total": 2, "successful": 1, "failed": 0 },
        "status": 200
      }
    },
    {
      "create": {
        "_index": "website",
        "_type": "blog",
        "_id": "123",
        "_version": 7,
        "result": "created",
        "_shards": { "total": 2, "successful": 1, "failed": 0 },
        "created": true,
        "status": 201
      }
    },
    {
      "index": {
        "_index": "website",
        "_type": "blog",
        "_id": "AX-OHvjg4kOgKKwsJNF9",
        "_version": 1,
        "result": "created",
        "_shards": { "total": 2, "successful": 1, "failed": 0 },
        "created": true,
        "status": 201
      }
    },
    {
      "update": {
        "_index": "website",
        "_type": "blog",
        "_id": "123",
        "_version": 8,
        "result": "updated",
        "_shards": { "total": 2, "successful": 1, "failed": 0 },
        "status": 200
      }
    }
  ]
}

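Sample data for the bulk import: https://github.com/elastic/elasticsearch/blob/5.0/docs/src/test/resources/accounts.json
POST /bank/account/_bulk
.....

Once the import has succeeded, the data can be used for advanced search.

# Make a container restart automatically:
docker update ... --restart=always

Refer to the official documentation:
https://www.elastic.co/guide/index.html
https://www.elastic.co/guide/en/elastic-stack-get-started/7.5/index.html
https://www.elastic.co/guide/en/elasticsearch/reference/7.5/getting-started-search.html

GET bank/_search?q=*&sort=account_number:asc

The response also provides the following information about the search request:
took: how long Elasticsearch took to run the query, in milliseconds
timed_out: whether the search request timed out
_shards: how many shards were searched, with a breakdown of how many succeeded, failed, or were skipped
max_score: the score of the most relevant document found
hits.total.value: how many matching documents were found
hits.sort: the document's sort position (when not sorting by relevance score)
hits._score: the document's relevance score (not applicable when using match_all)

The response reports "total": 1000, but hits contains only 10 documents; this is the result of pagination.
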
# match_all query: matches every document; combined here with sorting, paging, and field selection
GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": { "order": "desc" } }
  ],
  "from": 0,
  "size": 20,
  "_source": ["balance", "firstname"]
}

# match query: full-text (fuzzy) matching.
# Every document whose address contains mill or lane is returned.
GET /bank/_search
{
  "query": {
    "match": {
      "address": "mill lane"
    }
  }
}

# match_phrase query: only documents containing the phrase "mill lane" are returned
GET /bank/_search
{
  "query": { "match_phrase": { "address": "mill lane" } }
}
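The difference between match and match_phrase can be mimicked in a few lines of Python. This is a rough sketch with a lowercase whitespace tokenizer standing in for the real analyzer; the first four addresses are bank sample records that contain "Mill", the fifth is made up, and the function names are hypothetical.

```python
ADDRESSES = {
    136: "198 Mill Lane",
    970: "990 Mill Road",
    345: "715 Mill Avenue",
    472: "288 Mill Street",
    1: "12 Green Lane",  # hypothetical record added for contrast
}


def tokenize(text):
    """Very rough stand-in for an analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def match(text, query):
    """True if the text contains ANY of the query terms."""
    terms = set(tokenize(text))
    return any(term in terms for term in tokenize(query))


def match_phrase(text, query):
    """True if the query terms appear consecutively and in order."""
    tokens = tokenize(text)
    wanted = tokenize(query)
    return any(
        tokens[i:i + len(wanted)] == wanted
        for i in range(len(tokens) - len(wanted) + 1)
    )


match_hits = sorted(k for k, v in ADDRESSES.items() if match(v, "mill lane"))
phrase_hits = sorted(k for k, v in ADDRESSES.items() if match_phrase(v, "mill lane"))
print(match_hits, phrase_hits)
```

match returns all five records because each contains mill or lane, while match_phrase keeps only record 136.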

# bool query: must clauses have to match, must_not clauses must not match,
# and documents that also satisfy a should clause get a higher score.
GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "age": "40" } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ],
      "should": [
        { "match": { "address": "MILL" } }
      ]
    }
  }
}
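How the three clause types interact can be sketched with plain predicates. The documents below are made up, `bool_query` is a hypothetical helper, and the scoring (one point per matching must or should clause) is a simplification of what Elasticsearch actually computes.

```python
# Hypothetical documents for illustration only.
DOCS = [
    {"id": 1, "age": 40, "state": "ID", "address": "1 Mill Lane"},
    {"id": 2, "age": 40, "state": "TX", "address": "2 Oak Street"},
    {"id": 3, "age": 40, "state": "NY", "address": "3 Mill Road"},
    {"id": 4, "age": 30, "state": "NY", "address": "4 Mill Road"},
]


def bool_query(docs, must, must_not, should):
    """Keep documents passing every must and no must_not predicate;
    each satisfied should predicate raises the score."""
    hits = []
    for doc in docs:
        if not all(pred(doc) for pred in must):
            continue
        if any(pred(doc) for pred in must_not):
            continue
        score = len(must) + sum(1 for pred in should if pred(doc))
        hits.append((doc["id"], score))
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


hits = bool_query(
    DOCS,
    must=[lambda d: d["age"] == 40],
    must_not=[lambda d: d["state"] == "ID"],
    should=[lambda d: "mill" in d["address"].lower().split()],
)
print(hits)
```

Document 1 is excluded by must_not, document 4 fails must, and document 3 outranks document 2 because it also satisfies should.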

# filter clauses do not contribute to the relevance score; must clauses do.
GET /bank/_search
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "range": {
          "balance": {
            "gte": 20000,
            "lte": 30000
          }
        }
      }
    }
  }
}

# multi_match: match the query against several fields
GET bank/_search
{
  "query": {
    "multi_match": {
      "query": "mill",
      "fields": ["state", "address"]
    }
  }
}

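Use term to look up exact values in non-text fields such as numbers. It is the counterpart of match, which is meant for full-text search on text fields.

GET bank/_search
{
  "query": {
    "term": {
      "age": {
        "value": 20
      }
    }
  }
}

# Exact match on the whole field value: address.keyword is not analyzed,
# so only documents whose address is exactly "MILL" are returned.
GET bank/_search
{
  "query": {
    "match": {
      "address.keyword": "MILL"
    }
  }
}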
Aggregations
Elasticsearch is a search and analytics engine; this part covers the analytics side.
Analyze results with aggregations

Find the age distribution, the average age, and the average balance of everyone whose address contains mill. "size": 0 suppresses the hits so that only the aggregation results are returned.
GET bank/_search
{
  "query": {
    "match": {
      "address": "mill"
    }
  },
  "aggs": {
    "ageAgg": {
      "terms": {
        "field": "age",
        "size": 10
      }
    },
    "ageAvg": {
      "avg": {
        "field": "age"
      }
    },
    "balanceAvg": {
      "avg": {
        "field": "balance"
      }
    }
  },
  "size": 0
}

"aggregations": {
  "ageAgg": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      { "key": 38, "doc_count": 2 },
      { "key": 28, "doc_count": 1 },
      { "key": 32, "doc_count": 1 }
    ]
  }
}

# Aggregate by age and compute the average balance of the people in each age bucket
GET bank/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "ageAgg": {
      "terms": {
        "field": "age",
        "size": 10
      },
      "aggs": {
        "balanceAvg": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}

Response (excerpt):
"aggregations": {
  "ageAgg": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 463,
    "buckets": [
      {
        "key": 31,
        "doc_count": 61,
        "balanceAvg": {
          "value": 28312.918032786885
        }
      },
      {
        "key": 39,
        "doc_count": 60,
        "balanceAvg": {
          "value": 25269.583333333332
        }
      },
      ...

# Find the age distribution and, within each age bucket, the average balance of men, the average balance of women, and the overall average balance
GET bank/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "ageAgg": {
      "terms": {
        "field": "age",
        "size": 10
      },
      "aggs": {
        "genderAgg": {
          "terms": {
            "field": "gender.keyword",
            "size": 10
          },
          "aggs": {
            "balanceAgg": {
              "avg": {
                "field": "balance"
              }
            }
          }
        },
        "balanceAvgAll": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}

Response (excerpt). There are 61 people aged 31: 35 men with an average balance of 29565.628571428573 and 26 women with an average balance of 26626.576923076922.
"buckets": [
  {
    "key": 31,
    "doc_count": 61,
    "genderAgg": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "M",
          "doc_count": 35,
          "balanceAgg": {
            "value": 29565.628571428573
          }
        },
        {
          "key": "F",
          "doc_count": 26,
          "balanceAgg": {
            "value": 26626.576923076922
          }
        }
      ]
    },
    "balanceAvgAll": {
      "value": 28312.918032786885
    }
  },
  ...
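What a terms aggregation with a nested avg does can be reproduced in plain Python. This is a sketch of the bucketing logic only; `terms_with_avg` is a hypothetical helper, and the four records are the bank sample accounts whose address contains "Mill".

```python
from collections import defaultdict

# The four bank sample accounts whose address contains "Mill".
ACCOUNTS = [
    {"age": 28, "gender": "M", "balance": 19648},
    {"age": 38, "gender": "M", "balance": 45801},
    {"age": 38, "gender": "M", "balance": 9812},
    {"age": 32, "gender": "F", "balance": 25571},
]


def terms_with_avg(docs, field, metric_field):
    """Group docs by `field` (terms aggregation) and average
    `metric_field` inside every bucket (avg sub-aggregation)."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[field]].append(doc[metric_field])
    buckets = [
        {"key": key, "doc_count": len(values), "avg": sum(values) / len(values)}
        for key, values in groups.items()
    ]
    # Order by doc_count descending, then key ascending.
    return sorted(buckets, key=lambda b: (-b["doc_count"], b["key"]))


buckets = terms_with_avg(ACCOUNTS, "age", "balance")
overall_avg = sum(a["balance"] for a in ACCOUNTS) / len(ACCOUNTS)
print(buckets, overall_avg)
```

Nesting another `terms_with_avg` call per bucket, grouped by gender, would correspond to the genderAgg level of the query above.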
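Mappings
Mapping types were deprecated in 7.x and removed in 8.x. The reason is that fields with the same name in different types of one index are backed by the same underlying field, so they could not be mapped independently.

GET bank/_mapping

Data types in ES: https://www.elastic.co/guide/en/elasticsearch/reference/7.5/mapping-types.html
String: text and keyword
Numeric: long, integer, short, byte, double, float, half_float, scaled_float
Date: date
Date nanoseconds: date_nanos
Boolean: boolean
Binary: binary
Range: integer_range, float_range, long_range, double_range, date_range
There are also complex data types.

A word of warning about mappings: the syntax differs between versions, which is annoying, and even the documentation for the matching version does not always line up with what the server accepts. The examples below use the older syntax with mapping types.
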
PUT my_index
{
  "mappings": {
    "user": {
      "_all": {
        "enabled": false
      },
      "properties": {
        "title": {
          "type": "text"
        },
        "name": {
          "type": "text"
        },
        "age": {
          "type": "integer"
        }
      }
    }
  }
}

PUT twitter
{
  "mappings": {
    "user": {
      "properties": {
        "name": { "type": "text" },
        "user_name": { "type": "keyword" },
        "email": { "type": "keyword" }
      }
    },
    "tweet": {
      "properties": {
        "content": { "type": "text" },
        "user_name": { "type": "keyword" },
        "tweeted_at": { "type": "date" }
      }
    }
  }
}

PUT twitter/user/kimchy
{
  "name": "Shay Banon",
  "user_name": "kimchy",
  "email": "shay@kimchy.com"
}

PUT twitter/tweet/1
{
  "user_name": "kimchy",
  "tweeted_at": "2017-10-24T09:00:00Z",
  "content": "Types are going away"
}

GET twitter/tweet/_search
{
  "query": {
    "match": {
      "user_name": "kimchy"
    }
  }
}

PUT twitter
{
  "mappings": {
    "doc": {
      "properties": {
        "type": { "type": "keyword" },
        "name": { "type": "text" },
        "user_name": { "type": "keyword" },
        "email": { "type": "keyword" },
        "content": { "type": "text" },
        "tweeted_at": { "type": "date" }
      }
    }
  }
}
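Changing mappings and migrating data

# Create an index with an explicit mapping
PUT /my_index
{
  "mappings": {
    "properties": {
      "age": { "type": "integer" },
      "email": { "type": "keyword" },
      "name": { "type": "text" }
    }
  }
}

# Modify the mapping of the index by adding a new field
PUT /my_index/_mapping
{
  "properties": {
    "employee-id": { "type": "keyword", "index": false }
  }
}

GET /my_index/_mapping

# The mapping of an existing field cannot be changed; only new fields can be added, because data may already have been indexed under the old mapping.
# To change a mapping, create a new index with the desired mapping and reindex the old data into it (data migration).
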
# Create a new index with the corrected field mappings (fill them in under properties).
# Index names must be lowercase, so the new index is called newbank.
PUT /newbank
{
  "mappings": {
    "properties": {
    }
  }
}

# Migrate the data
POST _reindex
{
  "source": {
    "index": "bank",
    "type": "account"
  },
  "dest": {
    "index": "newbank"
  }
}
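4. ES analyzers and tokenizers
How ES splits text into terms:
A tokenizer receives a stream of characters, breaks it up into individual tokens (usually individual words), and outputs a stream of tokens.
For example, the whitespace tokenizer splits text whenever it sees a whitespace character. It turns the text "Quick brown fox" into [Quick, brown, fox].
The tokenizer is also responsible for recording the order or position of each term (used for phrase and word proximity queries) and the start and end character offsets of the original word that the term represents (used for highlighting search results).
ES ships with many built-in tokenizers, which can be used to build custom analyzers.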
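The bookkeeping a tokenizer does (token text, position, start and end offsets) can be sketched in a few lines of Python. This imitates the shape of the whitespace tokenizer's output only; the function name `whitespace_tokenize` is hypothetical and the code is not taken from Elasticsearch.

```python
import re


def whitespace_tokenize(text):
    """Split on whitespace and record, for each token, its position in the
    token stream and its character offsets in the original text."""
    tokens = []
    for position, found in enumerate(re.finditer(r"\S+", text)):
        tokens.append({
            "token": found.group(),
            "start_offset": found.start(),
            "end_offset": found.end(),
            "position": position,
        })
    return tokens


TOKENS = whitespace_tokenize("Quick brown fox")
for token in TOKENS:
    print(token)
```

The offsets make highlighting possible: `text[start_offset:end_offset]` always yields the original word.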
4.1 Installing the IK analyzer
IK analyzer for ES 7.4.2:
Link: https://pan.baidu.com/s/1DpWUZEFicNiYOJDCSDidVA
Extraction code: pshd

POST _analyze
{
  "analyzer": "whitespace",
  "text": "The quick brown fox."
}

Next, install the Chinese analyzer IK: download it from GitHub into the plugins directory.

Enter the container:
docker exec -it elasticsearch /bin/bash
pwd
ls
exit    # leave the container

Install wget first, using yum:
yum install wget

Allow root to log in with a password:
vi /etc/ssh/sshd_config
Change: PasswordAuthentication yes/no
Restart the service: service sshd restart

1. The IK analyzer version must match the ES version.
2. Put it into the plugins directory.
3. Restart the container.
4. Test the analyzer (ik_smart here; ik_max_word is the alternative):
POST _analyze
{
  "analyzer": "ik_smart",
  "text": "尚硅谷电商项目"
}

Response:
{
  "tokens": [
    {
      "token": "尚",
      "start_offset": 0,
      "end_offset": 1,
      "type": "CN_CHAR",
      "position": 0
    },
    {
      "token": "硅谷",
      "start_offset": 1,
      "end_offset": 3,
      "type": "CN_WORD",
      "position": 1
    },
    {
      "token": "电",
      "start_offset": 3,
      "end_offset": 4,
      "type": "CN_CHAR",
      "position": 2
    },
    {
      "token": "商",
      "start_offset": 4,
      "end_offset": 5,
      "type": "CN_CHAR",
      "position": 3
    },
    {
      "token": "项目",
      "start_offset": 5,
      "end_offset": 7,
      "type": "CN_WORD",
      "position": 4
    }
  ]
}

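Some words are not recognized by the analyzer, so a custom dictionary has to be created:
1. First configure the VM's network interface by adding:
GATEWAY=xxxxx.1
DNS1=114.114.114.114
DNS2=8.8.8.8

A side note on match: the response carries a max_score, the highest relevance score. Matching a number is an exact query; matching a string is a fuzzy (full-text) query.

About the custom dictionary: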
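4.2 Installing nginx
Install nginx and serve the dictionary file from it.

# Copy the default configuration out of a running nginx container
docker container cp nginx:/etc/nginx .
mv nginx conf      # rename the folder
mkdir nginx        # create the target folder (needed before the next step)
mv conf nginx/     # move conf into the nginx folder

docker stop nginx
docker rm nginx

# First attempt: the host paths for html and logs wrongly sit under conf/
docker run -p 80:80 --name nginx \
  -v /mydata/nginx/conf/html:/usr/share/nginx/html \
  -v /mydata/nginx/conf/logs:/var/log/nginx \
  -v /mydata/nginx/conf:/etc/nginx \
  -d nginx:1.10

# Correct command:
docker run -p 80:80 --name nginx \
  -v /mydata/nginx/html:/usr/share/nginx/html \
  -v /mydata/nginx/logs:/var/log/nginx \
  -v /mydata/nginx/conf:/etc/nginx \
  -d nginx:1.10

Create the custom dictionary file fenci.txt under nginx.

Configure the IK analyzer's dictionary URL as http://192.168.52.130/es/fenci.txt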
5. Talking to Elasticsearch from Java
5.1 pom.xml
<elasticsearch.version>7.4.2</elasticsearch.version>

<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>7.4.2</version>
</dependency>
5.2 Configuration class
package com.pshdhx.elasticsearch.config;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

/**
 * @author pshdhx
 * @date 2022-03-28 15:09
 */
@Configuration
public class ElasticSearchConfig {

    // Request options shared by all requests
    public static final RequestOptions COMMON_OPTIONS;

    static {
        RequestOptions.Builder builder = RequestOptions.DEFAULT.toBuilder();
        // Customize here if needed, e.g. builder.addHeader(...) or
        // builder.setHttpAsyncResponseConsumerFactory(...)
        COMMON_OPTIONS = builder.build();
    }

    @Bean
    public RestHighLevelClient esRestClient() {
        RestClientBuilder restClientBuilder = RestClient.builder(
                new HttpHost("82.xx.xx.xxxx", 9200, "http"));
        return new RestHighLevelClient(restClientBuilder);
    }
}
5.3 Test class
package com.pshdhx.elasticsearch;

import com.alibaba.fastjson.JSON;
import com.pshdhx.elasticsearch.config.ElasticSearchConfig;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.ToString;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.Aggregations;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
import org.elasticsearch.search.aggregations.metrics.Avg;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

import java.io.IOException;

@SpringBootTest
class GuilimallEsApplicationTests {

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    @Test
    void testSearchEs() throws IOException {
        // 1. Create the search request and specify the index
        SearchRequest searchRequest = new SearchRequest();
        searchRequest.indices("bank");

        // Specify the DSL search conditions
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        sourceBuilder.query(QueryBuilders.matchQuery("address", "mill"));

        // Aggregations: age distribution and average balance
        sourceBuilder.aggregation(AggregationBuilders.terms("ageAgg").field("age").size(10));
        sourceBuilder.aggregation(AggregationBuilders.avg("balanceAvg").field("balance"));

        System.out.println(sourceBuilder.toString());
        searchRequest.source(sourceBuilder);

        // 2. Execute the search
        SearchResponse response = restHighLevelClient.search(searchRequest, ElasticSearchConfig.COMMON_OPTIONS);
        System.out.println(response.toString());

        // 3. Convert every hit into an Account object
        for (SearchHit hit : response.getHits().getHits()) {
            String sourceAsString = hit.getSourceAsString();
            Account account = JSON.parseObject(sourceAsString, Account.class);
            System.out.println("Account:" + account.toString());
        }

        // 4. Read the aggregation results
        Aggregations aggregations = response.getAggregations();
        Terms ageAgg = aggregations.get("ageAgg");
        for (Terms.Bucket bucket : ageAgg.getBuckets()) {
            // Prints "age ===> document count"
            System.out.println("年龄" + bucket.getKeyAsString() + "===》" + bucket.getDocCount());
        }
        Avg balanceAvg = aggregations.get("balanceAvg");
        // Prints the average balance
        System.out.println("平均薪资:" + balanceAvg.getValue());
    }

    @Test
    void contextLoads() {
        System.out.println("111");
        System.out.println(restHighLevelClient);
    }

    @Test
    public void indexData() throws IOException {
        IndexRequest indexRequest = new IndexRequest("users");
        indexRequest.id("1");
        // Alternative: indexRequest.source("userName", "zhangsan", "age", 18, "gender", "男");
        User user = new User("pshdhx", 18, "man");
        String jsonString = JSON.toJSONString(user);
        indexRequest.source(jsonString, XContentType.JSON);

        // Execute the request and print the response
        IndexResponse index = restHighLevelClient.index(indexRequest, ElasticSearchConfig.COMMON_OPTIONS);
        System.out.println(index);
    }

    @Data
    @AllArgsConstructor
    static class User {
        private String userName;
        private Integer age;
        private String gender;
    }

    @Data
    @ToString
    static class Account {
        private int account_number;
        private int balance;
        private String firstname;
        private String lastname;
        private int age;
        private String gender;
        private String address;
        private String employer;
        private String email;
        private String city;
        private String state;
    }
}
Output
{"query":{"match":{"address":{"query":"mill","operator":"OR","prefix_length":0,"max_expansions":50,"fuzzy_transpositions":true,"lenient":false,"zero_terms_query":"NONE","auto_generate_synonyms_phrase_query":true,"boost":1.0}}},
 "aggregations":{"ageAgg":{"terms":{"field":"age","size":10,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"_count":"desc"},{"_key":"asc"}]}},
 "balanceAvg":{"avg":{"field":"balance"}}}}

{"took":1,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},
 "hits":{"total":{"value":4,"relation":"eq"},"max_score":5.4032025,"hits":[
  {"_index":"bank","_type":"_doc","_id":"970","_score":5.4032025,"_source":{"account_number":970,"balance":19648,"firstname":"Forbes","lastname":"Wallace","age":28,"gender":"M","address":"990 Mill Road","employer":"Pheast","email":"forbeswallace@pheast.com","city":"Lopezo","state":"AK"}},
  {"_index":"bank","_type":"_doc","_id":"136","_score":5.4032025,"_source":{"account_number":136,"balance":45801,"firstname":"Winnie","lastname":"Holland","age":38,"gender":"M","address":"198 Mill Lane","employer":"Neteria","email":"winnieholland@neteria.com","city":"Urie","state":"IL"}},
  {"_index":"bank","_type":"_doc","_id":"345","_score":5.4032025,"_source":{"account_number":345,"balance":9812,"firstname":"Parker","lastname":"Hines","age":38,"gender":"M","address":"715 Mill Avenue","employer":"Baluba","email":"parkerhines@baluba.com","city":"Blackgum","state":"KY"}},
  {"_index":"bank","_type":"_doc","_id":"472","_score":5.4032025,"_source":{"account_number":472,"balance":25571,"firstname":"Lee","lastname":"Long","age":32,"gender":"F","address":"288 Mill Street","employer":"Comverges","email":"leelong@comverges.com","city":"Movico","state":"MT"}}]},
 "aggregations":{"lterms#ageAgg":{"doc_count_error_upper_bound":0,"sum_other_doc_count":0,"buckets":[{"key":38,"doc_count":2},{"key":28,"doc_count":1},{"key":32,"doc_count":1}]},
 "avg#balanceAvg":{"value":25208.0}}}

Account:GuilimallEsApplicationTests.Account(account_number=970, balance=19648, firstname=Forbes, lastname=Wallace, age=28, gender=M, address=990 Mill Road, employer=Pheast, email=forbeswallace@pheast.com, city=Lopezo, state=AK)
Account:GuilimallEsApplicationTests.Account(account_number=136, balance=45801, firstname=Winnie, lastname=Holland, age=38, gender=M, address=198 Mill Lane, employer=Neteria, email=winnieholland@neteria.com, city=Urie, state=IL)
Account:GuilimallEsApplicationTests.Account(account_number=345, balance=9812, firstname=Parker, lastname=Hines, age=38, gender=M, address=715 Mill Avenue, employer=Baluba, email=parkerhines@baluba.com, city=Blackgum, state=KY)
Account:GuilimallEsApplicationTests.Account(account_number=472, balance=25571, firstname=Lee, lastname=Long, age=32, gender=F, address=288 Mill Street, employer=Comverges, email=leelong@comverges.com, city=Movico, state=MT)
年龄38===》2
年龄28===》1
年龄32===》1
平均薪资:25208.0

Reposted from: https://blog.csdn.net/pshdhx/article/details/125712448