🏡 Blog home: 派大星
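🎢 This article is an original work by 派大星
🚧 Series: 《ES小结》 (ES Summary)
🎈 This series records my journey learning ElasticSearch and the problems I solved along the way
Efficient Data Statistics with ElasticSearch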
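- Aggregation Queries
- ① What Is an Aggregation Query
- ② Testing Aggregation Queries with Kibana Commands
- Creating the Test Index
- Indexing the Test Data
- ③ Using Aggregations
- Grouping by a Field
- Maximum
- Minimum
- Sum
- Average
- ④ Testing Aggregation Queries with RestHighLevelClient
- Grouping by a Field
- Maximum
- Minimum
- ⑤ Sub-Aggregations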
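Aggregation Queries
① What Is an Aggregation Query
Besides search, ES provides aggregations for statistical analysis of the data it stores: they compute summary data over the documents that match a search query. Aggregation is an important feature of any database, and ES, being both a search engine and a data store, offers powerful aggregation and analysis capabilities as well. Based on the query conditions, an aggregation groups the matching documents into buckets and computes values over them, much like SQL's GROUP BY combined with aggregate functions such as COUNT, MAX, or AVG.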
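To make the GROUP BY analogy concrete, here is a plain-Java sketch (no Elasticsearch involved) of what `SELECT title, COUNT(*), AVG(price) FROM fruit GROUP BY title` computes over the eleven test documents used later in this article. A terms aggregation on `title` with an `avg` sub-aggregation on `price` yields the same per-bucket numbers. The class and method names are hypothetical, chosen only for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-Java illustration of SQL's GROUP BY (hypothetical names, not Elasticsearch code).
class GroupByAnalogy {
    static final class Fruit {
        final String title;
        final double price;
        Fruit(String title, double price) { this.title = title; this.price = price; }
    }

    // The eleven test documents (titles in English).
    static final List<Fruit> DATA = List.of(
            new Fruit("bread", 19.6), new Fruit("Want Want milk", 29.6),
            new Fruit("Japanese beans", 9.0), new Fruit("big spicy strips", 10.6),
            new Fruit("seaweed", 49.6), new Fruit("small biscuits", 9.6),
            new Fruit("small grapes", 59.6), new Fruit("small biscuits", 19.6),
            new Fruit("small biscuits", 59.6), new Fruit("small biscuits", 29.6),
            new Fruit("small biscuits", 39.6));

    // COUNT(*) ... GROUP BY title: title -> number of documents
    static Map<String, Long> countByTitle(List<Fruit> docs) {
        return docs.stream().collect(
                Collectors.groupingBy(f -> f.title, TreeMap::new, Collectors.counting()));
    }

    // AVG(price) ... GROUP BY title: title -> average price
    static Map<String, Double> avgPriceByTitle(List<Fruit> docs) {
        return docs.stream().collect(
                Collectors.groupingBy(f -> f.title, TreeMap::new,
                        Collectors.averagingDouble(f -> f.price)));
    }

    public static void main(String[] args) {
        Map<String, Long> counts = countByTitle(DATA);
        Map<String, Double> avgs = avgPriceByTitle(DATA);
        counts.forEach((title, n) ->
                System.out.println(title + " count=" + n + " avg=" + avgs.get(title)));
    }
}
```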
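One thing to note before diving into aggregations: text fields cannot be aggregated (by default ES rejects aggregations on them, because fielddata is disabled for text fields). The underlying reason is that a text value is analyzed, i.e. split into multiple terms. If a sentence is broken into several words and you then group by it, which of those words should the document be grouped under? There is no way to say. Fields you want to aggregate on should therefore be mapped as keyword (or a numeric or date type).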
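A small plain-Java sketch of that ambiguity. It uses a toy analyzer (lowercase plus whitespace split), which is an assumption standing in for ES's real analyzers: the text value becomes several terms, so there is no single group key, while a keyword value stays one term. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Toy illustration only: ES's real analyzers are more elaborate than this.
class TextVsKeyword {
    // Roughly what indexing a text field does: lowercase, then split into terms.
    static List<String> analyze(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\s+"));
    }

    // A keyword field is indexed as one untouched term.
    static List<String> keyword(String value) {
        return List.of(value);
    }

    public static void main(String[] args) {
        // Five terms: which one should this document be grouped under?
        System.out.println("text terms:   " + analyze("the small biscuits are small"));
        // One term: a well-defined group key.
        System.out.println("keyword term: " + keyword("small biscuits"));
    }
}
```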
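② Testing Aggregation Queries with Kibana Commands
Creating the Test Index
All three fields belong under "properties": title is mapped as keyword so it can be aggregated, price as double, and description as text.
```
PUT /fruit
{
  "mappings": {
    "properties": {
      "title": { "type": "keyword" },
      "price": { "type": "double" },
      "description": { "type": "text" }
    }
  }
}
```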
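Indexing the Test Data
Each action line and each document must sit on its own line in a bulk request.
```
PUT /fruit/_bulk
{"index":{}}
{"title":"bread","price":19.6,"description":"the small bread is cheap"}
{"index":{}}
{"title":"Want Want milk","price":29.6,"description":"Want Want milk tastes great"}
{"index":{}}
{"title":"Japanese beans","price":9.0,"description":"Japanese beans are cheap"}
{"index":{}}
{"title":"big spicy strips","price":10.6,"description":"big spicy strips are super tasty"}
{"index":{}}
{"title":"seaweed","price":49.6,"description":"the seaweed is so-so"}
{"index":{}}
{"title":"small biscuits","price":9.6,"description":"the small biscuits are small"}
{"index":{}}
{"title":"small grapes","price":59.6,"description":"the small grapes are delicious"}
{"index":{}}
{"title":"small biscuits","price":19.6,"description":"the small biscuits are small"}
{"index":{}}
{"title":"small biscuits","price":59.6,"description":"the small biscuits are small"}
{"index":{}}
{"title":"small biscuits","price":29.6,"description":"the small biscuits are small"}
{"index":{}}
{"title":"small biscuits","price":39.6,"description":"the small biscuits are small"}
```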
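The bulk body is newline-delimited JSON: an action line such as {"index":{}} followed by the document source, each on its own line, and the whole body must end with a newline. If you generate such a body in code instead of typing it into Kibana, a plain-Java sketch might look like this. The class and method names are hypothetical, and the hand-rolled escaping only handles quotes and backslashes, so a real JSON library is preferable in practice.

```java
import java.util.List;

// Builds an NDJSON bulk body by hand (hypothetical helper, not an ES client API).
class BulkBodyBuilder {
    static final class Fruit {
        final String title;
        final double price;
        final String description;
        Fruit(String title, double price, String description) {
            this.title = title;
            this.price = price;
            this.description = description;
        }
    }

    // Minimal escaping for quotes and backslashes only.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // One action line plus one source line per document; the body ends with a newline.
    static String toNdjson(List<Fruit> docs) {
        StringBuilder sb = new StringBuilder();
        for (Fruit f : docs) {
            sb.append("{\"index\":{}}\n");
            sb.append("{\"title\":").append(quote(f.title))
              .append(",\"price\":").append(f.price)   // Double.toString, locale-independent
              .append(",\"description\":").append(quote(f.description))
              .append("}\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(toNdjson(List.of(
                new Fruit("bread", 19.6, "the small bread is cheap"),
                new Fruit("Japanese beans", 9.0, "Japanese beans are cheap"))));
    }
}
```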
③ Using Aggregations
Grouping by a Field
A terms aggregation named price_group groups the documents by price:
```
GET /fruit/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "price_group": {
      "terms": {
        "field": "price"
      }
    }
  }
}
```
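For reference, a plain-Java sketch of the buckets this request should produce on the test data: count the documents per distinct price, order by count descending, and keep at most 10 buckets, mirroring the terms aggregation's default doc-count ordering and default `size` of 10. Ties are broken by ascending key here to make the output deterministic, which is an assumption of this sketch. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Local simulation of a terms aggregation on "price" (not the ES client).
class TermsOnPrice {
    static final double[] PRICES = {19.6, 29.6, 9.0, 10.6, 49.6, 9.6, 59.6, 19.6, 59.6, 29.6, 39.6};

    // key -> doc_count, ordered by doc_count desc, then key asc; at most `size` buckets
    static Map<Double, Long> terms(double[] values, int size) {
        Map<Double, Long> counts = new TreeMap<>();   // keys start in ascending order
        for (double v : values) {
            counts.merge(v, 1L, Long::sum);
        }
        List<Map.Entry<Double, Long>> entries = new ArrayList<>(counts.entrySet());
        // List.sort is stable, so equal counts keep their ascending key order
        entries.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));
        Map<Double, Long> buckets = new LinkedHashMap<>();
        for (Map.Entry<Double, Long> e : entries.subList(0, Math.min(size, entries.size()))) {
            buckets.put(e.getKey(), e.getValue());
        }
        return buckets;
    }

    public static void main(String[] args) {
        terms(PRICES, 10).forEach((key, count) -> System.out.println(key + " " + count));
    }
}
```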
Maximum
```
GET /fruit/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "max_price": {
      "max": {
        "field": "price"
      }
    }
  }
}
```
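Minimum
Setting "size": 0 tells ES not to return the matching documents, so the response contains only the aggregation result.
```
GET /fruit/_search
{
  "query": {
    "match_all": {}
  },
  "size": 0,
  "aggs": {
    "min_price": {
      "min": {
        "field": "price"
      }
    }
  }
}
```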
Sum
```
GET /fruit/_search
{
  "query": {
    "match_all": {}
  },
  "size": 0,
  "aggs": {
    "sum_price": {
      "sum": {
        "field": "price"
      }
    }
  }
}
```
Average
```
GET /fruit/_search
{
  "query": {
    "match_all": {}
  },
  "size": 0,
  "aggs": {
    "avg_price": {
      "avg": {
        "field": "price"
      }
    }
  }
}
```
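To sanity-check the four metric results, here is a plain-Java sketch that computes the same values over the test prices with `java.util.DoubleSummaryStatistics`. For the eleven documents this gives a maximum of 59.6, a minimum of 9.0, a sum of 336.0 (up to floating-point rounding) and an average of 336 / 11, about 30.55. ES also offers a `stats` aggregation that returns count, min, max, avg and sum in one request. The class name is hypothetical.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;

// Local check of max / min / sum / avg over the test prices (not ES client code).
class PriceMetrics {
    static final double[] PRICES = {19.6, 29.6, 9.0, 10.6, 49.6, 9.6, 59.6, 19.6, 59.6, 29.6, 39.6};

    static DoubleSummaryStatistics stats(double[] prices) {
        return Arrays.stream(prices).summaryStatistics();
    }

    public static void main(String[] args) {
        DoubleSummaryStatistics s = stats(PRICES);
        System.out.println("max=" + s.getMax() + " min=" + s.getMin()
                + " sum=" + s.getSum() + " avg=" + s.getAverage());
    }
}
```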
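④ Testing Aggregation Queries with RestHighLevelClient
Before implementing the operations above with the Java API, it helps to get to know the methods and helper classes involved.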
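Common aggregation builders (types as in the 6.x/7.x Java API used in the examples below; "agg_name" is the name you give the aggregation and "field_name" is the field it runs on):
- Count the values of a field: `ValueCountAggregationBuilder vcb = AggregationBuilders.count("agg_name").field("field_name");`
- Count the distinct values of a field (the result is approximate, so there may be a small error): `CardinalityAggregationBuilder cb = AggregationBuilders.cardinality("agg_name").field("field_name");`
- Filter aggregation: `FilterAggregationBuilder fab = AggregationBuilders.filter("agg_name", QueryBuilders.queryStringQuery("field_name:filter_value"));`
- Group by a field: `TermsAggregationBuilder tb = AggregationBuilders.terms("agg_name").field("field_name");`
- Maximum: `MaxAggregationBuilder mb = AggregationBuilders.max("agg_name").field("field_name");`
- Minimum: `MinAggregationBuilder mnb = AggregationBuilders.min("agg_name").field("field_name");`
- Average: `AvgAggregationBuilder ab = AggregationBuilders.avg("agg_name").field("field_name");`
- Group by date interval (an interval must also be configured): `DateHistogramAggregationBuilder dhb = AggregationBuilders.dateHistogram("agg_name").field("field_name");`
- Return the top documents inside each bucket: `TopHitsAggregationBuilder thb = AggregationBuilders.topHits("agg_name");`
- Nested aggregation: `NestedAggregationBuilder nb = AggregationBuilders.nested("agg_name", "nested_path");`
- Reverse nested aggregation: `ReverseNestedAggregationBuilder rnb = AggregationBuilders.reverseNested("agg_name").path("nested_path");`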
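To show what the first three builders compute, here is a plain-Java sketch over the test titles (not the ES client): value_count counts the non-missing values of a field, cardinality counts the distinct values, and a filter aggregation restricts which documents are counted. ES computes cardinality with an approximate algorithm, which is where the small error comes from; this sketch counts exactly. Class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Objects;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Local illustration of value_count, cardinality and filter (not ES client code).
class CountCardinalityFilter {
    // Titles of the eleven test documents.
    static final String[] TITLES = {"bread", "Want Want milk", "Japanese beans", "big spicy strips",
            "seaweed", "small biscuits", "small grapes", "small biscuits", "small biscuits",
            "small biscuits", "small biscuits"};

    // value_count: number of non-missing values of the field
    static long valueCount(String[] values) {
        return Arrays.stream(values).filter(Objects::nonNull).count();
    }

    // cardinality: number of distinct non-missing values (exact here, approximate in ES)
    static int cardinality(String[] values) {
        return Arrays.stream(values).filter(Objects::nonNull)
                .collect(Collectors.toSet()).size();
    }

    // filter: only values that pass the predicate are counted
    static long filterCount(String[] values, Predicate<String> filter) {
        return Arrays.stream(values).filter(Objects::nonNull).filter(filter).count();
    }

    public static void main(String[] args) {
        System.out.println("value_count=" + valueCount(TITLES));
        System.out.println("cardinality=" + cardinality(TITLES));
        System.out.println("filter=" + filterCount(TITLES, t -> t.equals("small biscuits")));
    }
}
```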
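Implementing each of the Kibana operations above with the Java API
Grouping by a Field
```java
import java.io.IOException;
import java.util.List;

import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.Aggregations;
import org.elasticsearch.search.aggregations.bucket.terms.ParsedDoubleTerms;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
import org.elasticsearch.search.builder.SearchSourceBuilder;

public class RestHighLevelClientForAggs {
    public static void main(String[] args) {
        // terms aggregation: group the documents by the value of a field
        SearchRequest request = new SearchRequest("fruit");
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        sourceBuilder
                .query(QueryBuilders.matchAllQuery())                                  // query condition
                .aggregation(AggregationBuilders.terms("price_group").field("price"))  // aggregation
                .size(0);                                                              // no hits, only the aggregation
        request.source(sourceBuilder);
        // Client.getClient() is the author's own helper (not shown) that builds the client;
        // try-with-resources closes the client when done
        try (RestHighLevelClient esClient = Client.getClient()) {
            SearchResponse response = esClient.search(request, RequestOptions.DEFAULT);
            // process the aggregation result: price is a double field, so the buckets are ParsedDoubleTerms
            Aggregations aggregations = response.getAggregations();
            ParsedDoubleTerms doubleTerms = aggregations.get("price_group");
            List<? extends Terms.Bucket> buckets = doubleTerms.getBuckets();
            for (Terms.Bucket bucket : buckets) {
                System.out.println(bucket.getKey() + " " + bucket.getDocCount());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```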
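Maximum
```java
import java.io.IOException;

import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.Aggregations;
import org.elasticsearch.search.aggregations.metrics.ParsedMax;   // 7.x package layout
import org.elasticsearch.search.builder.SearchSourceBuilder;

public class AggregationForMax {
    public static void main(String[] args) {
        SearchRequest request = new SearchRequest("fruit");
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        sourceBuilder
                .query(QueryBuilders.matchAllQuery())
                .aggregation(AggregationBuilders.max("max_price").field("price"))
                .size(0);
        request.source(sourceBuilder);
        // Client.getClient() is the author's own helper (not shown)
        try (RestHighLevelClient client = Client.getClient()) {
            SearchResponse searchResponse = client.search(request, RequestOptions.DEFAULT);
            Aggregations aggregations = searchResponse.getAggregations();
            // a metric aggregation returns a single value, so the result is a ParsedMax
            ParsedMax maxPrice = aggregations.get("max_price");
            System.out.println(maxPrice.getValueAsString());
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```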
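Note: when reading a result back with aggregations.get("aggregation name"), first check whether the aggregation returns key-value buckets. A terms aggregation, like the grouping example above, returns buckets (in Kibana you can see them as key/doc_count pairs), so the result should be assigned to a ParsedXxxTerms type: ParsedDoubleTerms for a double field such as price, ParsedStringTerms for a keyword field, ParsedLongTerms for a long field. Metric aggregations such as max, min, and avg return a single value, so the result is a ParsedXxx type: ParsedMax, ParsedMin, ParsedAvg.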
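Minimum
```java
import java.io.IOException;

import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.Aggregations;
import org.elasticsearch.search.aggregations.metrics.ParsedMin;   // 7.x package layout
import org.elasticsearch.search.builder.SearchSourceBuilder;

public class AggregationForMin {
    public static void main(String[] args) {
        SearchRequest searchRequest = new SearchRequest("fruit");
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        sourceBuilder
                .query(QueryBuilders.matchAllQuery())
                .aggregation(AggregationBuilders.min("min_price").field("price"))
                .size(0);
        searchRequest.source(sourceBuilder);
        // Client.getClient() is the author's own helper (not shown)
        try (RestHighLevelClient client = Client.getClient()) {
            SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
            Aggregations aggregations = searchResponse.getAggregations();
            ParsedMin minPrice = aggregations.get("min_price");
            System.out.println(minPrice.getValueAsString());
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```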
Sum, average, and the other statistics can be demonstrated the same way, with ES doing all of the data aggregation.
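⑤ Sub-Aggregations
Start from a requirement: first group the documents by title, then, within each group, sort the members' prices in descending order.
We first implement this with a command in Kibana, then translate the command into Java code.
Implementation with a Kibana command
The inner "aggs" block runs once for every title bucket:
```
GET /fruit/_search
{
  "query": {
    "match_all": {}
  },
  "size": 0,
  "aggs": {
    "title_group": {
      "terms": {
        "field": "title"
      },
      "aggs": {
        "sort_price": {
          "terms": {
            "field": "price",
            "order": {
              "_key": "desc"
            }
          }
        }
      }
    }
  }
}
```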
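The nesting can be mirrored in plain Java to see what the response should contain: an outer map keyed by title and, inside each title bucket, the distinct prices with their document counts ordered by key descending, like "order": { "_key": "desc" }. This is a local sketch over the test data, not ES client code; names are hypothetical, and the outer buckets are simply sorted by title here rather than by doc count.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Local mirror of terms(title) -> terms(price, _key desc) (hypothetical names).
class SubAggregationSketch {
    static final class Fruit {
        final String title;
        final double price;
        Fruit(String title, double price) { this.title = title; this.price = price; }
    }

    static final List<Fruit> DATA = List.of(
            new Fruit("bread", 19.6), new Fruit("Want Want milk", 29.6),
            new Fruit("Japanese beans", 9.0), new Fruit("big spicy strips", 10.6),
            new Fruit("seaweed", 49.6), new Fruit("small biscuits", 9.6),
            new Fruit("small grapes", 59.6), new Fruit("small biscuits", 19.6),
            new Fruit("small biscuits", 59.6), new Fruit("small biscuits", 29.6),
            new Fruit("small biscuits", 39.6));

    // title -> (price -> doc_count); each inner map is ordered by price descending
    static Map<String, Map<Double, Long>> titleThenPriceDesc(List<Fruit> docs) {
        Map<String, Map<Double, Long>> result = new TreeMap<>();
        for (Fruit f : docs) {
            result.computeIfAbsent(f.title,
                            t -> new TreeMap<Double, Long>(Collections.reverseOrder()))
                  .merge(f.price, 1L, Long::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        titleThenPriceDesc(DATA).forEach((title, prices) -> {
            System.out.println(title);
            prices.forEach((price, n) -> System.out.println("    " + price + "--" + n));
        });
    }
}
```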
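Translating the command into Java
```java
import java.io.IOException;

import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.Aggregations;
import org.elasticsearch.search.aggregations.BucketOrder;
import org.elasticsearch.search.aggregations.bucket.terms.ParsedDoubleTerms;
import org.elasticsearch.search.aggregations.bucket.terms.ParsedStringTerms;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
import org.elasticsearch.search.aggregations.bucket.terms.TermsAggregationBuilder;
import org.elasticsearch.search.builder.SearchSourceBuilder;

public class AggregationForSub {
    public static void main(String[] args) {
        SearchRequest searchRequest = new SearchRequest("fruit");
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        // outer aggregation: group by title
        TermsAggregationBuilder termsAggregationBuilder =
                AggregationBuilders.terms("title_group").field("title");
        // inner aggregation: prices per title, key descending (same as "_key": "desc")
        TermsAggregationBuilder subAggregationBuilder =
                AggregationBuilders.terms("sort_price").field("price").order(BucketOrder.key(false));
        // subAggregation attaches the inner aggregation to every title bucket
        termsAggregationBuilder.subAggregation(subAggregationBuilder);
        sourceBuilder
                .query(QueryBuilders.matchAllQuery())
                .aggregation(termsAggregationBuilder)
                .size(0);
        searchRequest.source(sourceBuilder);
        // Client.getClient() is the author's own helper (not shown)
        try (RestHighLevelClient client = Client.getClient()) {
            SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
            Aggregations aggregations = searchResponse.getAggregations();
            // title is a keyword field, so its buckets are ParsedStringTerms
            ParsedStringTerms titleGroup = aggregations.get("title_group");
            for (Terms.Bucket bucket : titleGroup.getBuckets()) {
                System.out.println(bucket.getKey() + "--" + bucket.getDocCount());
                // price is a double field, so the inner buckets are ParsedDoubleTerms
                ParsedDoubleTerms sortPrice = bucket.getAggregations().get("sort_price");
                for (Terms.Bucket priceBucket : sortPrice.getBuckets()) {
                    System.out.println("    " + priceBucket.getKey() + "--" + priceBucket.getDocCount());
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```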