飞道的博客 (Feidao's Blog)

Java 8 streams: using groupingBy to group and sum by multiple fields


Java 8's groupingBy collector groups the elements of a collection, much like GROUP BY in MySQL. Note that the result is a Map.

 

Grouping, counting, and sorting a collection by a single property


  
  import java.util.Arrays;
  import java.util.LinkedHashMap;
  import java.util.List;
  import java.util.Map;
  import java.util.function.Function;
  import java.util.stream.Collectors;

  List<String> items = Arrays.asList(
      "apple", "apple", "banana", "apple", "orange", "banana", "papaya");

  // Group: each distinct string maps to the list of its occurrences
  Map<String, List<String>> result1 = items.stream().collect(
      Collectors.groupingBy(Function.identity()));
  // {papaya=[papaya], orange=[orange], banana=[banana, banana], apple=[apple, apple, apple]}
  System.out.println(result1);

  // Group and count
  Map<String, Long> result2 = items.stream().collect(
      Collectors.groupingBy(Function.identity(), Collectors.counting()));
  // {papaya=1, orange=1, banana=2, apple=3}
  System.out.println(result2);

  // Group, count, and sort by count in descending order;
  // a LinkedHashMap keeps the sorted insertion order
  Map<String, Long> finalMap = new LinkedHashMap<>();
  result2.entrySet().stream()
      .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
      .forEachOrdered(e -> finalMap.put(e.getKey(), e.getValue()));
  // {apple=3, banana=2, papaya=1, orange=1}
  System.out.println(finalMap);
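
In the snippet above, entries with equal counts keep whatever order the intermediate HashMap happens to iterate in. As a hedged variation (the class name CountAndSort and the method countSorted are made up for this illustration), the sketch below breaks ties by key and collects straight into a LinkedHashMap with Collectors.toMap, so the result is the same on every run.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class CountAndSort {

    // Counts occurrences of each item, then orders by count (descending).
    // Ties are broken by key so the result does not depend on HashMap iteration order.
    public static Map<String, Long> countSorted(List<String> items) {
        Map<String, Long> counts = items.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        Map.Entry::getValue,
                        (a, b) -> a,            // merge function; never called because keys are unique
                        LinkedHashMap::new));   // preserves the sorted encounter order
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList(
                "apple", "apple", "banana", "apple", "orange", "banana", "papaya");
        // prints {apple=3, banana=2, orange=1, papaya=1}
        System.out.println(countSorted(items));
    }
}
```

The four-argument Collectors.toMap is used only to pick the map implementation; a plain toMap would give back a HashMap and lose the ordering.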

 

Grouping a collection by multiple properties

1. Concatenate several properties into one composite key


  
  // Needs java.util.ArrayList, java.util.List, java.util.Map and java.util.stream.Collectors.
  // User is a simple class with name, address and age fields.
  public static void main(String[] args) {
    User user1 = new User("zhangsan", "beijing", 10);
    User user2 = new User("zhangsan", "beijing", 20);
    User user3 = new User("lisi", "shanghai", 30);
    List<User> list = new ArrayList<>();
    list.add(user1);
    list.add(user2);
    list.add(user3);
    Map<String, List<User>> collect = list.stream().collect(Collectors.groupingBy(e -> fetchGroupKey(e)));
    // {zhangsan#beijing=[User{age=10, name='zhangsan', address='beijing'}, User{age=20, name='zhangsan', address='beijing'}],
    //  lisi#shanghai=[User{age=30, name='lisi', address='shanghai'}]}
    System.out.println(collect);
  }

  // Builds the composite grouping key "name#address"
  private static String fetchGroupKey(User user) {
    return user.getName() + "#" + user.getAddress();
  }
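
The composite-key example above only groups. To also sum a field per group, a downstream collector such as Collectors.summingInt can be passed as the second argument to groupingBy. The following is a minimal sketch under stated assumptions: the User class here is a stand-in (the post never shows its definition), and summing the age field is chosen purely for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class CompositeKeySum {

    // Minimal stand-in for the post's User class (assumed shape).
    public static class User {
        private final String name;
        private final String address;
        private final int age;

        public User(String name, String address, int age) {
            this.name = name;
            this.address = address;
            this.age = age;
        }

        public String getName() { return name; }
        public String getAddress() { return address; }
        public int getAge() { return age; }
    }

    // Same composite key as in the post: "name#address".
    public static String fetchGroupKey(User user) {
        return user.getName() + "#" + user.getAddress();
    }

    // Groups by the composite key and sums the ages inside each group.
    public static Map<String, Integer> sumAgeByNameAndAddress(List<User> users) {
        return users.stream().collect(
                Collectors.groupingBy(CompositeKeySum::fetchGroupKey,
                        Collectors.summingInt(User::getAge)));
    }

    public static void main(String[] args) {
        List<User> users = Arrays.asList(
                new User("zhangsan", "beijing", 10),
                new User("zhangsan", "beijing", 20),
                new User("lisi", "shanghai", 30));
        Map<String, Integer> totals = sumAgeByNameAndAddress(users);
        System.out.println(totals.get("zhangsan#beijing")); // prints 30
        System.out.println(totals.get("lisi#shanghai"));    // prints 30
    }
}
```

A string key is simple, but it only works if the separator ("#" here) cannot occur inside the fields themselves; otherwise two different field combinations could produce the same key.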

2. Nest groupingBy calls


  
  User user1 = new User("zhangsan", "beijing", 10);
  User user2 = new User("zhangsan", "beijing", 20);
  User user3 = new User("lisi", "shanghai", 30);
  List<User> list = new ArrayList<>();
  list.add(user1);
  list.add(user2);
  list.add(user3);
  // Outer map is keyed by address, inner map by name
  Map<String, Map<String, List<User>>> collect
      = list.stream().collect(
          Collectors.groupingBy(
              User::getAddress, Collectors.groupingBy(User::getName)
          )
      );
  System.out.println(collect);
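
The nested form can carry a summing collector at the innermost level as well. The sketch below is an illustration only: the User class is again an assumed stand-in, and the ages are summed simply because it is the one numeric field available. It also shows how the two-level result can be walked to get flat rows back.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class NestedGroupingSum {

    // Minimal stand-in for the post's User class (assumed shape).
    public static class User {
        private final String name;
        private final String address;
        private final int age;

        public User(String name, String address, int age) {
            this.name = name;
            this.address = address;
            this.age = age;
        }

        public String getName() { return name; }
        public String getAddress() { return address; }
        public int getAge() { return age; }
    }

    // address -> name -> total age
    public static Map<String, Map<String, Integer>> sumAgeByAddressThenName(List<User> users) {
        return users.stream().collect(
                Collectors.groupingBy(User::getAddress,
                        Collectors.groupingBy(User::getName,
                                Collectors.summingInt(User::getAge))));
    }

    public static void main(String[] args) {
        List<User> users = Arrays.asList(
                new User("zhangsan", "beijing", 10),
                new User("zhangsan", "beijing", 20),
                new User("lisi", "shanghai", 30));

        Map<String, Map<String, Integer>> totals = sumAgeByAddressThenName(users);

        // Walk both levels to turn the nested map back into flat rows.
        totals.forEach((address, byName) ->
                byName.forEach((name, total) ->
                        System.out.println(address + " / " + name + " -> " + total)));
    }
}
```

Each extra grouping field adds one more level of Map nesting, which stays readable for two fields but gets unwieldy quickly.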

3. Use Arrays.asList as the key

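I have a list of domain objects representing web access records. The list can grow to thousands of objects.
I have neither the resources nor the need to store them in the database in raw form, so I want to pre-compute aggregates and store only the aggregated data.
I need to aggregate the total bytes transferred in 5-minute windows, as in the following SQL query: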


  
  select
      round(request_timestamp, '5') as window, --round timestamp to the nearest 5 minute
      cdn,
      isp,
      http_result_code,
      transaction_time,
      sum(bytes_transferred)
  from web_records
  group by
      round(request_timestamp, '5'),
      cdn,
      isp,
      http_result_code,
      transaction_time


In Java 8, my first attempt looks like this. I know this solution is similar to "Group by multiple field names in java 8".


  
  Map<Date, Map<String, Map<String, Map<String, Map<String, Integer>>>>> aggregatedData =
      webRecords
          .stream()
          .collect(Collectors.groupingBy(WebRecord::getFiveMinuteWindow,
              Collectors.groupingBy(WebRecord::getCdn,
                  Collectors.groupingBy(WebRecord::getIsp,
                      Collectors.groupingBy(WebRecord::getResultCode,
                          Collectors.groupingBy(WebRecord::getTxnTime,
                              // Sum the request bytes within each innermost group
                              Collectors.reducing(0,
                                  WebRecord::getReqBytes,
                                  Integer::sum)))))));


This works, but it is ugly, and all those nested maps are a nightmare! To "flatten" or "unroll" the map into rows, I have to do this:


  
  for (Date window : aggregatedData.keySet()) {
    for (String cdn : aggregatedData.get(window).keySet()) {
      for (String isp : aggregatedData.get(window).get(cdn).keySet()) {
        for (String resultCode : aggregatedData.get(window).get(cdn).get(isp).keySet()) {
          for (String txnTime : aggregatedData.get(window).get(cdn).get(isp).get(resultCode).keySet()) {
            // Look up the leaf value by following the same key path as the loops
            Integer bytesTransferred = aggregatedData.get(window).get(cdn).get(isp).get(resultCode).get(txnTime);
            AggregatedRow row = new AggregatedRow(window, cdn, distId...


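As you can see, this is quite messy and hard to maintain.
Does anyone know a better way? Any help would be greatly appreciated.
I would like to know whether there is a better way to unroll the nested maps, or whether there is a library that lets you group a collection like this.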

 

Best answer

You should create a custom key for the map. The simplest way is to use Arrays.asList:


  
  Function<WebRecord, List<Object>> keyExtractor = wr ->
      Arrays.<Object>asList(wr.getFiveMinuteWindow(), wr.getCdn(), wr.getIsp(),
          wr.getResultCode(), wr.getTxnTime());
  Map<List<Object>, Integer> aggregatedData = webRecords.stream().collect(
      Collectors.groupingBy(keyExtractor, Collectors.summingInt(WebRecord::getReqBytes)));


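In this case the key is a list of 5 elements in a fixed order. It is not very object-oriented, but it is simple. Alternatively, you can define your own type to represent the custom key and give it appropriate hashCode/equals implementations.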
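
The custom key type mentioned above could look roughly like the sketch below. Everything beyond the getter names taken from the question is an assumption: the WebRecord stand-in, its field types (Date, String, int), and the name AggregationKey are all hypothetical and only serve to make the example compile on its own.

```java
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

public class CustomKeyGrouping {

    // Hypothetical stand-in for the question's WebRecord; field types are assumed.
    public static class WebRecord {
        private final Date fiveMinuteWindow;
        private final String cdn, isp, resultCode, txnTime;
        private final int reqBytes;

        public WebRecord(Date fiveMinuteWindow, String cdn, String isp,
                         String resultCode, String txnTime, int reqBytes) {
            this.fiveMinuteWindow = fiveMinuteWindow;
            this.cdn = cdn;
            this.isp = isp;
            this.resultCode = resultCode;
            this.txnTime = txnTime;
            this.reqBytes = reqBytes;
        }

        public Date getFiveMinuteWindow() { return fiveMinuteWindow; }
        public String getCdn() { return cdn; }
        public String getIsp() { return isp; }
        public String getResultCode() { return resultCode; }
        public String getTxnTime() { return txnTime; }
        public int getReqBytes() { return reqBytes; }
    }

    // Hypothetical key type: two keys are equal when all five grouping fields are equal.
    public static final class AggregationKey {
        private final Date window;
        private final String cdn, isp, resultCode, txnTime;

        public AggregationKey(WebRecord wr) {
            this.window = wr.getFiveMinuteWindow();
            this.cdn = wr.getCdn();
            this.isp = wr.getIsp();
            this.resultCode = wr.getResultCode();
            this.txnTime = wr.getTxnTime();
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof AggregationKey)) return false;
            AggregationKey k = (AggregationKey) o;
            return Objects.equals(window, k.window)
                    && Objects.equals(cdn, k.cdn)
                    && Objects.equals(isp, k.isp)
                    && Objects.equals(resultCode, k.resultCode)
                    && Objects.equals(txnTime, k.txnTime);
        }

        @Override
        public int hashCode() {
            return Objects.hash(window, cdn, isp, resultCode, txnTime);
        }
    }

    // One flat map: key object -> total bytes, with no nesting to unroll afterwards.
    public static Map<AggregationKey, Integer> aggregate(List<WebRecord> records) {
        return records.stream().collect(
                Collectors.groupingBy(AggregationKey::new,
                        Collectors.summingInt(WebRecord::getReqBytes)));
    }

    public static void main(String[] args) {
        Date window = new Date(0L);
        List<WebRecord> records = Arrays.asList(
                new WebRecord(window, "cdn1", "isp1", "200", "fast", 100),
                new WebRecord(window, "cdn1", "isp1", "200", "fast", 200),
                new WebRecord(window, "cdn2", "isp1", "404", "slow", 50));
        Map<AggregationKey, Integer> totals = aggregate(records);
        System.out.println(totals.size());                                  // prints 2
        System.out.println(totals.get(new AggregationKey(records.get(0)))); // prints 300
    }
}
```

Compared with the Arrays.asList key, a dedicated type gives each field a name and a static type, at the cost of writing equals and hashCode by hand. Both must cover exactly the same fields, otherwise equal keys may land in different groups.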


 


Reposted from: https://blog.csdn.net/fly910905/article/details/104005444