Collectors are the aggregation engine of the Stream API. In backend code, they are used for:

  • grouping records
  • computing totals and counts
  • converting lists to maps
  • building API response structures

groupingBy and Downstream Collectors

Group orders by category:

Map<String, List<Order>> byCategory = orders.stream()
        .collect(Collectors.groupingBy(Order::getCategory));

Revenue by category:

Map<String, BigDecimal> revenueByCategory = orders.stream()
        .filter(o -> o.getStatus() == OrderStatus.COMPLETED)
        .collect(Collectors.groupingBy(
                Order::getCategory,
                Collectors.reducing(BigDecimal.ZERO, Order::getAmount, BigDecimal::add)
        ));

For double-based amounts:

Map<String, Double> revenueByCategory = orders.stream()
        .filter(o -> o.getStatus() == OrderStatus.COMPLETED)
        .collect(Collectors.groupingBy(
                Order::getCategory,
                Collectors.summingDouble(Order::getAmountDouble)
        ));
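
Summing is only one possible downstream collector. The sketch below shows three other built-ins (counting, mapping, averagingDouble) on the same grouping key. The Order record, its customer field, and the sample data are assumptions made for illustration, and record accessors such as category() stand in for the getters used above (records need Java 16+).

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class DownstreamDemo {
    // Hypothetical stand-in for the article's Order type.
    record Order(String category, String customer, double amount) {}

    // Number of orders per category.
    static Map<String, Long> orderCountByCategory(List<Order> orders) {
        return orders.stream()
                .collect(Collectors.groupingBy(Order::category, Collectors.counting()));
    }

    // Distinct customers per category: map each order to its customer, then collect to a Set.
    static Map<String, Set<String>> customersByCategory(List<Order> orders) {
        return orders.stream()
                .collect(Collectors.groupingBy(
                        Order::category,
                        Collectors.mapping(Order::customer, Collectors.toSet())));
    }

    // Average order amount per category.
    static Map<String, Double> averageAmountByCategory(List<Order> orders) {
        return orders.stream()
                .collect(Collectors.groupingBy(
                        Order::category,
                        Collectors.averagingDouble(Order::amount)));
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
                new Order("books", "ann", 10.0),
                new Order("books", "bob", 20.0),
                new Order("toys", "ann", 5.0));
        System.out.println(orderCountByCategory(orders).get("books"));    // 2
        System.out.println(averageAmountByCategory(orders).get("books")); // 15.0
        System.out.println(customersByCategory(orders).get("toys"));      // [ann]
    }
}
```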

partitioningBy

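partitioningBy splits a stream by a predicate into exactly two buckets, keyed true and false. Both keys are always present in the result, even when one bucket is empty.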

Map<Boolean, List<Order>> fraudBuckets = orders.stream()
        .collect(Collectors.partitioningBy(Order::isFraudulent));

Great for valid/invalid, active/inactive, paid/unpaid style use cases.
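
Like groupingBy, partitioningBy accepts a downstream collector. A minimal sketch that counts each bucket instead of listing it, using a hypothetical Order record (Java 16+) and made-up sample data:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class PartitionDemo {
    // Hypothetical stand-in for the article's Order type.
    record Order(String id, boolean fraudulent) {}

    // Count fraudulent vs. non-fraudulent orders in one pass.
    static Map<Boolean, Long> fraudCounts(List<Order> orders) {
        return orders.stream()
                .collect(Collectors.partitioningBy(Order::fraudulent, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(new Order("A-1", false), new Order("A-2", false));
        Map<Boolean, Long> counts = fraudCounts(orders);
        System.out.println(counts.get(true));  // 0 (the key exists although nothing matched)
        System.out.println(counts.get(false)); // 2
    }
}
```

Because both keys are always there, callers can read counts.get(true) without a null check.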


toMap: Handle Duplicate Keys Explicitly

A common production bug is forgetting to handle duplicate keys.

Bad (throws IllegalStateException on duplicate key):

Map<String, User> byEmail = users.stream()
        .collect(Collectors.toMap(User::getEmail, Function.identity()));

Good:

Map<String, User> byEmail = users.stream()
        .collect(Collectors.toMap(
                User::getEmail,
                Function.identity(),
                (existing, incoming) -> existing
        ));

Always define a merge strategy when keys can collide.
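
"Keep the first" is only one possible strategy. The sketch below keeps the newest record instead and uses the four-argument toMap overload to get a LinkedHashMap with first-seen key order. The User record and its version field are assumptions for illustration (Java 16+ for record).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class ToMapMergeDemo {
    // Hypothetical User; the version field is an assumption used to pick a winner.
    record User(String email, String name, int version) {}

    static Map<String, User> latestByEmail(List<User> users) {
        return users.stream()
                .collect(Collectors.toMap(
                        User::email,
                        Function.identity(),
                        // On a collision keep whichever record has the higher version.
                        (existing, incoming) ->
                                incoming.version() > existing.version() ? incoming : existing,
                        // LinkedHashMap keeps keys in first-seen order.
                        LinkedHashMap::new));
    }

    public static void main(String[] args) {
        List<User> users = List.of(
                new User("ann@example.com", "Ann", 1),
                new User("bob@example.com", "Bob", 1),
                new User("ann@example.com", "Ann B.", 2));
        Map<String, User> byEmail = latestByEmail(users);
        System.out.println(byEmail.size());                        // 2
        System.out.println(byEmail.get("ann@example.com").name()); // Ann B.
    }
}
```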


Multi-Level Grouping

Revenue by city -> category:

Map<String, Map<String, Double>> revenue = orders.stream()
        .filter(o -> o.getStatus() == OrderStatus.COMPLETED)
        .collect(Collectors.groupingBy(
                Order::getCity,
                Collectors.groupingBy(
                        Order::getCategory,
                        Collectors.summingDouble(Order::getAmountDouble)
                )
        ));

This is where collectors are markedly more readable than the equivalent nested loops.
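
For comparison, here is a rough sketch of the same two-level aggregation written once with collectors and once with a manual loop. The Order record and sample data are hypothetical (Java 16+ for record); the amounts are chosen to be exactly representable as doubles so that both versions produce identical sums.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class MultiLevelDemo {
    // Hypothetical stand-in for the article's Order type.
    record Order(String city, String category, double amount) {}

    static Map<String, Map<String, Double>> withCollectors(List<Order> orders) {
        return orders.stream()
                .collect(Collectors.groupingBy(
                        Order::city,
                        Collectors.groupingBy(
                                Order::category,
                                Collectors.summingDouble(Order::amount))));
    }

    static Map<String, Map<String, Double>> withLoop(List<Order> orders) {
        Map<String, Map<String, Double>> result = new HashMap<>();
        for (Order o : orders) {
            // Create the inner map on first sight of a city, then add to the category total.
            result.computeIfAbsent(o.city(), city -> new HashMap<>())
                  .merge(o.category(), o.amount(), Double::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
                new Order("Berlin", "books", 10.0),
                new Order("Berlin", "books", 2.5),
                new Order("Berlin", "toys", 4.0),
                new Order("Paris", "books", 7.0));
        System.out.println(withCollectors(orders).get("Berlin").get("books")); // 12.5
        System.out.println(withCollectors(orders).equals(withLoop(orders)));   // true
    }
}
```

The loop version is short here, but every extra grouping level adds another computeIfAbsent, while the collector version only nests one more groupingBy.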


Real API Example: Dashboard Summary DTO

public class SalesSummary {
    private final Map<String, Double> revenueByCategory;
    private final long completedCount;
    private final long fraudCount;

    public SalesSummary(Map<String, Double> revenueByCategory, long completedCount, long fraudCount) {
        this.revenueByCategory = revenueByCategory;
        this.completedCount = completedCount;
        this.fraudCount = fraudCount;
    }
}

Map<String, Double> revenueByCategory = orders.stream()
        .filter(o -> o.getStatus() == OrderStatus.COMPLETED)
        .collect(Collectors.groupingBy(Order::getCategory, Collectors.summingDouble(Order::getAmountDouble)));

long completedCount = orders.stream().filter(o -> o.getStatus() == OrderStatus.COMPLETED).count();
long fraudCount = orders.stream().filter(Order::isFraudulent).count();

SalesSummary dto = new SalesSummary(revenueByCategory, completedCount, fraudCount);
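
The snippet above streams the list three times. If that ever matters, one possible single-pass variant combines Collectors.filtering (Java 9+) with Collectors.teeing (Java 12+). This is a sketch, not the article's implementation: the Order and SalesSummary records below are simplified hypothetical versions, and the long[] holder for the two counts is just one way to carry two values through teeing.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class SinglePassSummaryDemo {
    enum OrderStatus { COMPLETED, PENDING }

    // Hypothetical stand-ins for the article's Order and SalesSummary types.
    record Order(String category, double amount, OrderStatus status, boolean fraudulent) {}
    record SalesSummary(Map<String, Double> revenueByCategory, long completedCount, long fraudCount) {}

    static SalesSummary summarize(List<Order> orders) {
        Predicate<Order> completed = o -> o.status() == OrderStatus.COMPLETED;

        // Revenue per category, restricted to completed orders.
        Collector<Order, ?, Map<String, Double>> byCategory =
                Collectors.groupingBy(Order::category, Collectors.summingDouble(Order::amount));
        Collector<Order, ?, Map<String, Double>> revenue =
                Collectors.filtering(completed, byCategory);

        // Two independent counts, paired up in a small array.
        Collector<Order, ?, Long> completedCount =
                Collectors.filtering(completed, Collectors.counting());
        Collector<Order, ?, Long> fraudCount =
                Collectors.filtering(Order::fraudulent, Collectors.counting());
        Collector<Order, ?, long[]> counts =
                Collectors.teeing(completedCount, fraudCount, (c, f) -> new long[] {c, f});

        // Feed every order to both collectors and build the DTO from their results.
        return orders.stream().collect(
                Collectors.teeing(revenue, counts, (rev, c) -> new SalesSummary(rev, c[0], c[1])));
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
                new Order("books", 10.0, OrderStatus.COMPLETED, false),
                new Order("books", 20.0, OrderStatus.COMPLETED, true),
                new Order("toys", 5.0, OrderStatus.PENDING, false));
        SalesSummary summary = summarize(orders);
        System.out.println(summary.completedCount()); // 2
        System.out.println(summary.fraudCount());     // 1
    }
}
```

The three-pass version is easier to read; the single-pass form is mainly worth it when the source can only be traversed once or the list is large.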

Custom Collector Example (Top N)

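public static Collector<Order, ?, List<Order>> topNByAmount(int n) {
    return Collector.of(
            // Supplier: a min-heap by amount, so the head is the smallest order kept so far.
            () -> new PriorityQueue<Order>(Comparator.comparingDouble(Order::getAmountDouble)),
            // Accumulator: add the order, then evict the smallest once more than n are held.
            (pq, order) -> {
                pq.offer(order);
                if (pq.size() > n) pq.poll();
            },
            // Combiner (used by parallel streams): merge the right heap into the left, keeping n.
            (left, right) -> {
                right.forEach(o -> {
                    left.offer(o);
                    if (left.size() > n) left.poll();
                });
                return left;
            },
            // Finisher: copy the heap into a list sorted from largest to smallest amount.
            pq -> {
                List<Order> result = new ArrayList<>(pq);
                result.sort(Comparator.comparingDouble(Order::getAmountDouble).reversed());
                return result;
            }
    );
}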

Use custom collectors only when built-ins cannot express your result shape clearly.
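
For a sense of the trade-off, top N can also be written with built-ins only, by sorting the whole stream and taking the first n elements. This sketch sorts every element, so the heap-based collector above may be preferable for large inputs with a small n. The Order record is a hypothetical stand-in (Java 16+).

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class TopNDemo {
    // Hypothetical stand-in for the article's Order type.
    record Order(String id, double amount) {}

    // Built-in alternative: full sort (largest first), then keep the first n.
    static List<Order> topNByAmount(List<Order> orders, int n) {
        return orders.stream()
                .sorted(Comparator.comparingDouble(Order::amount).reversed())
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
                new Order("A", 5.0), new Order("B", 50.0), new Order("C", 20.0));
        List<Order> top2 = topNByAmount(orders, 2);
        System.out.println(top2.get(0).id()); // B
        System.out.println(top2.get(1).id()); // C
    }
}
```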


collectingAndThen for Final DTO Shaping

collectingAndThen wraps another collector and applies a finishing function to its result, which is useful when you want post-processing after collection.

Map<String, List<Order>> immutableByCategory = orders.stream()
        .collect(Collectors.collectingAndThen(
                Collectors.groupingBy(Order::getCategory),
                Collections::unmodifiableMap
        ));

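This helps enforce immutability on aggregation results passed to other layers. The wrapper is shallow, though: the map itself cannot be modified, but the List values inside it are not wrapped.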
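
Collections.unmodifiableMap only guards the map itself. If the lists should be locked as well, one option is to make the downstream collector unmodifiable too. A minimal sketch, assuming Java 10+ for Collectors.toUnmodifiableList and a hypothetical Order record (Java 16+):

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ImmutableGroupingDemo {
    // Hypothetical stand-in for the article's Order type.
    record Order(String id, String category) {}

    static Map<String, List<Order>> groupImmutable(List<Order> orders) {
        return orders.stream()
                .collect(Collectors.collectingAndThen(
                        // Inner level: each bucket is collected into an unmodifiable list.
                        Collectors.groupingBy(Order::category, Collectors.toUnmodifiableList()),
                        // Outer level: the map is wrapped so keys cannot be added or removed.
                        Collections::unmodifiableMap));
    }

    public static void main(String[] args) {
        Map<String, List<Order>> grouped =
                groupImmutable(List.of(new Order("A-1", "books")));
        try {
            grouped.get("books").add(new Order("A-2", "books"));
        } catch (UnsupportedOperationException e) {
            System.out.println("bucket is read-only");
        }
    }
}
```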


Null and Key Hygiene

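Collectors assume your key/value logic is safe: groupingBy throws NullPointerException when the classifier returns null, and toMap throws it for null values. Before grouping/toMap in production: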

  • normalize keys (trim, toLowerCase) where needed
  • filter out null keys/values explicitly
  • define merge behavior for duplicates

Example:

Map<String, User> byEmail = users.stream()
        .filter(u -> u.getEmail() != null)
        .collect(Collectors.toMap(
                u -> u.getEmail().trim().toLowerCase(),
                Function.identity(),
                (a, b) -> a
        ));
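
The same hygiene applies to groupingBy. The sketch below counts orders per normalized category and, for contrast, shows the unfiltered pipeline failing on a null key. The Order record and sample data are hypothetical; String.isBlank needs Java 11+ and record needs Java 16+.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

class KeyHygieneDemo {
    // Hypothetical stand-in for the article's Order type; category may be null or untidy.
    record Order(String id, String category) {}

    static Map<String, Long> countByCategory(List<Order> orders) {
        return orders.stream()
                // Drop records that cannot produce a usable key.
                .filter(o -> o.category() != null && !o.category().isBlank())
                .collect(Collectors.groupingBy(
                        // Normalize so " Books" and "books" land in the same bucket.
                        o -> o.category().trim().toLowerCase(Locale.ROOT),
                        Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Order> orders = Arrays.asList(
                new Order("A-1", " Books"),
                new Order("A-2", "books"),
                new Order("A-3", null));
        System.out.println(countByCategory(orders)); // {books=2}
        try {
            orders.stream().collect(Collectors.groupingBy(Order::category));
        } catch (NullPointerException e) {
            System.out.println("groupingBy rejected the null key");
        }
    }
}
```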

Testing Collector Logic

For non-trivial collector pipelines, test:

  1. empty input
  2. duplicate keys
  3. null/invalid records
  4. deterministic totals/counts on known fixture data

Collector bugs usually come from edge cases in the data being aggregated, not from syntax.
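
As a rough illustration of the first three checks, here is a dependency-free sketch. In a real project these would normally be unit tests in your test framework; the check helper, the User record, and the indexByEmail method are hypothetical names introduced for this example (Java 16+ for record).

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class CollectorEdgeCaseChecks {
    // Hypothetical User type.
    record User(String email, String name) {}

    // Pipeline under test: index users by normalized email, first record wins.
    static Map<String, User> indexByEmail(List<User> users) {
        return users.stream()
                .filter(u -> u.email() != null)
                .collect(Collectors.toMap(
                        u -> u.email().trim().toLowerCase(Locale.ROOT),
                        Function.identity(),
                        (first, second) -> first));
    }

    static void check(boolean condition, String message) {
        if (!condition) throw new AssertionError(message);
    }

    static void runChecks() {
        // 1. Empty input yields an empty map, not an exception.
        List<User> none = List.of();
        check(indexByEmail(none).isEmpty(), "empty input");

        // 2. Duplicate keys (after normalization) keep the first record.
        List<User> dupes = List.of(
                new User(" Ann@Example.com ", "first"),
                new User("ann@example.com", "second"));
        check(indexByEmail(dupes).size() == 1, "duplicates collapse to one entry");
        check(indexByEmail(dupes).get("ann@example.com").name().equals("first"), "first wins");

        // 3. Records with a null email are skipped instead of throwing.
        List<User> withNull = List.of(new User(null, "ghost"), new User("bob@example.com", "Bob"));
        check(indexByEmail(withNull).size() == 1, "null email skipped");
    }

    public static void main(String[] args) {
        runChecks();
        System.out.println("all checks passed");
    }
}
```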


Performance and Readability Rules

  • use built-in collectors first
  • avoid very deep nested collector trees in one expression
  • extract complex downstream collectors to helper methods
  • for money, prefer BigDecimal
  • benchmark before parallel collection
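
Two of these rules, extracting downstream collectors and preferring BigDecimal for money, can be sketched together. The helper names totalAmount and revenueByCategory and the Order record are hypothetical (Java 16+ for record).

```java
import java.math.BigDecimal;
import java.util.List;
import java.util.Map;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class CollectorHelpersDemo {
    // Hypothetical stand-in for the article's Order type, with a BigDecimal amount.
    record Order(String city, String category, BigDecimal amount) {}

    // Reusable downstream collector: exact sum of order amounts.
    static Collector<Order, ?, BigDecimal> totalAmount() {
        return Collectors.reducing(BigDecimal.ZERO, Order::amount, BigDecimal::add);
    }

    // Reusable downstream collector: revenue per category.
    static Collector<Order, ?, Map<String, BigDecimal>> revenueByCategory() {
        return Collectors.groupingBy(Order::category, totalAmount());
    }

    // The top-level pipeline stays one level deep and reads like a sentence.
    static Map<String, Map<String, BigDecimal>> revenueByCityAndCategory(List<Order> orders) {
        return orders.stream()
                .collect(Collectors.groupingBy(Order::city, revenueByCategory()));
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
                new Order("Berlin", "books", new BigDecimal("12.50")),
                new Order("Berlin", "books", new BigDecimal("7.25")),
                new Order("Paris", "toys", new BigDecimal("3.00")));
        System.out.println(revenueByCityAndCategory(orders).get("Berlin").get("books")); // 19.75
    }
}
```

Each helper can also be unit tested on its own, which is harder when the whole tree lives in one expression.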
