Collectors are the aggregation engine of the Stream API. In backend code, they are used for:
- grouping records
- computing totals and counts
- converting lists to maps
- building API response structures
groupingBy and Downstream Collectors
Group orders by category:
Map<String, List<Order>> byCategory = orders.stream()
.collect(Collectors.groupingBy(Order::getCategory));
Revenue by category:
Map<String, BigDecimal> revenueByCategory = orders.stream()
.filter(o -> o.getStatus() == OrderStatus.COMPLETED)
.collect(Collectors.groupingBy(
Order::getCategory,
Collectors.reducing(BigDecimal.ZERO, Order::getAmount, BigDecimal::add)
));
For double-based amounts:
Map<String, Double> revenueByCategory = orders.stream()
.filter(o -> o.getStatus() == OrderStatus.COMPLETED)
.collect(Collectors.groupingBy(
Order::getCategory,
Collectors.summingDouble(Order::getAmountDouble)
));
partitioningBy
partitioningBy creates exactly two buckets, keyed true and false; both keys are always present in the result, even when one bucket is empty.
Map<Boolean, List<Order>> fraudBuckets = orders.stream()
.collect(Collectors.partitioningBy(Order::isFraudulent));
Great for valid/invalid, active/inactive, paid/unpaid style use cases.
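Like groupingBy, partitioningBy accepts a downstream collector to summarize each bucket instead of materializing the full lists. A minimal runnable sketch, assuming a simplified hypothetical Order record with just an id and a fraud flag:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PartitionDemo {
    // Hypothetical minimal Order; real entities will carry more fields.
    public record Order(String id, boolean fraudulent) {
        public boolean isFraudulent() { return fraudulent; }
    }

    public static Map<Boolean, Long> fraudCounts(List<Order> orders) {
        // Both keys (true and false) appear even when a bucket is empty.
        return orders.stream()
                .collect(Collectors.partitioningBy(
                        Order::isFraudulent,
                        Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
                new Order("a", false),
                new Order("b", true),
                new Order("c", false));
        System.out.println(fraudCounts(orders)); // {false=2, true=1}
    }
}
```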
toMap: Handle Duplicate Keys Explicitly
A common production bug is forgetting duplicate-key handling.
Bad (throws IllegalStateException on duplicate key):
Map<String, User> byEmail = users.stream()
.collect(Collectors.toMap(User::getEmail, Function.identity()));
Good:
Map<String, User> byEmail = users.stream()
.collect(Collectors.toMap(
User::getEmail,
Function.identity(),
(existing, incoming) -> existing
));
Always define a merge strategy when keys can collide.
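toMap also takes a value mapper, which avoids holding whole entities when only one field is needed. A minimal sketch, using a hypothetical User record:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ToMapDemo {
    // Hypothetical minimal User for illustration.
    public record User(String email, String name) {}

    public static Map<String, String> nameByEmail(List<User> users) {
        return users.stream()
                .collect(Collectors.toMap(
                        User::email,             // key mapper
                        User::name,              // value mapper: store only the name
                        (first, dup) -> first)); // keep first on duplicate email
    }

    public static void main(String[] args) {
        var users = List.of(
                new User("a@x.com", "Ann"),
                new User("a@x.com", "Anna"), // duplicate key: "Ann" wins
                new User("b@x.com", "Bob"));
        System.out.println(nameByEmail(users));
    }
}
```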
Multi-Level Grouping
Revenue by city -> category:
Map<String, Map<String, Double>> revenue = orders.stream()
.filter(o -> o.getStatus() == OrderStatus.COMPLETED)
.collect(Collectors.groupingBy(
Order::getCity,
Collectors.groupingBy(
Order::getCategory,
Collectors.summingDouble(Order::getAmountDouble)
)
));
This is where collectors clearly beat hand-written nested loops on readability.
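When the nested result feeds a report or UI, the map-factory overload of groupingBy makes key order deterministic. A runnable sketch, assuming a simplified Order record with plain double amounts:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class NestedGroupingDemo {
    // Hypothetical minimal Order for illustration.
    public record Order(String city, String category, double amount) {}

    public static Map<String, Map<String, Double>> revenueByCityThenCategory(List<Order> orders) {
        return orders.stream()
                .collect(Collectors.groupingBy(
                        Order::city,
                        TreeMap::new, // sorted outer keys
                        Collectors.groupingBy(
                                Order::category,
                                TreeMap::new, // sorted inner keys
                                Collectors.summingDouble(Order::amount))));
    }

    public static void main(String[] args) {
        var orders = List.of(
                new Order("Berlin", "books", 10.0),
                new Order("Berlin", "books", 5.0),
                new Order("Amsterdam", "games", 20.0));
        System.out.println(revenueByCityThenCategory(orders));
        // {Amsterdam={games=20.0}, Berlin={books=15.0}}
    }
}
```

Without the TreeMap suppliers the iteration order of the resulting HashMaps is unspecified, which makes snapshot-style tests flaky.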
Real API Example: Dashboard Summary DTO
public class SalesSummary {
private final Map<String, Double> revenueByCategory;
private final long completedCount;
private final long fraudCount;
public SalesSummary(Map<String, Double> revenueByCategory, long completedCount, long fraudCount) {
this.revenueByCategory = revenueByCategory;
this.completedCount = completedCount;
this.fraudCount = fraudCount;
}
}
Map<String, Double> revenueByCategory = orders.stream()
.filter(o -> o.getStatus() == OrderStatus.COMPLETED)
.collect(Collectors.groupingBy(Order::getCategory, Collectors.summingDouble(Order::getAmountDouble)));
long completedCount = orders.stream().filter(o -> o.getStatus() == OrderStatus.COMPLETED).count();
long fraudCount = orders.stream().filter(Order::isFraudulent).count();
SalesSummary dto = new SalesSummary(revenueByCategory, completedCount, fraudCount);
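The three pipelines above each traverse orders separately, which is usually fine. On Java 12+, Collectors.teeing can fold two aggregations into one pass; a hedged sketch, with simplified hypothetical Order and Counts records standing in for the DTO fields:

```java
import java.util.List;
import java.util.stream.Collectors;

public class TeeingDemo {
    // Hypothetical minimal types for illustration.
    public record Order(String category, boolean completed, boolean fraudulent, double amount) {}
    public record Counts(long completed, long fraud) {}

    public static Counts counts(List<Order> orders) {
        // teeing (Java 12+) feeds every element to both downstream
        // collectors, then merges the two results.
        return orders.stream().collect(Collectors.teeing(
                Collectors.filtering(Order::completed, Collectors.counting()),
                Collectors.filtering(Order::fraudulent, Collectors.counting()),
                Counts::new));
    }

    public static void main(String[] args) {
        var orders = List.of(
                new Order("books", true, false, 10),
                new Order("games", false, true, 5),
                new Order("books", true, true, 7));
        System.out.println(counts(orders)); // Counts[completed=2, fraud=2]
    }
}
```

Only reach for this if profiling shows the extra passes matter; three readable pipelines are often the better trade.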
Custom Collector Example (Top N)
public static Collector<Order, ?, List<Order>> topNByAmount(int n) {
return Collector.of(
() -> new PriorityQueue<Order>(Comparator.comparingDouble(Order::getAmountDouble)),
(pq, order) -> {
pq.offer(order);
if (pq.size() > n) pq.poll();
},
(left, right) -> {
right.forEach(o -> {
left.offer(o);
if (left.size() > n) left.poll();
});
return left;
},
pq -> {
List<Order> result = new ArrayList<>(pq);
result.sort(Comparator.comparingDouble(Order::getAmountDouble).reversed());
return result;
}
);
}
Use custom collectors only when built-ins cannot express your result shape clearly.
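A self-contained usage sketch of the top-N collector, with the same Collector.of shape inlined over a minimal hypothetical Order record:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.stream.Collector;

public class TopNDemo {
    // Hypothetical minimal Order for illustration.
    public record Order(String id, double amount) {}

    public static Collector<Order, ?, List<Order>> topNByAmount(int n) {
        return Collector.of(
                // min-heap on amount, capped at n elements
                () -> new PriorityQueue<Order>(Comparator.comparingDouble(Order::amount)),
                (pq, o) -> { pq.offer(o); if (pq.size() > n) pq.poll(); },
                (left, right) -> {
                    right.forEach(o -> { left.offer(o); if (left.size() > n) left.poll(); });
                    return left;
                },
                pq -> {
                    List<Order> out = new ArrayList<>(pq);
                    out.sort(Comparator.comparingDouble(Order::amount).reversed());
                    return out;
                });
    }

    public static void main(String[] args) {
        var orders = List.of(
                new Order("a", 10), new Order("b", 50),
                new Order("c", 30), new Order("d", 20));
        List<Order> top2 = orders.stream().collect(topNByAmount(2));
        System.out.println(top2); // [Order[id=b, amount=50.0], Order[id=c, amount=30.0]]
    }
}
```

Note that ties on amount are broken arbitrarily by the heap; add a secondary comparator if deterministic tie-breaking matters.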
collectingAndThen for Final DTO Shaping
collectingAndThen is useful when you want post-processing after collection.
Map<String, List<Order>> immutableByCategory = orders.stream()
.collect(Collectors.collectingAndThen(
Collectors.groupingBy(Order::getCategory),
Collections::unmodifiableMap
));
This helps enforce immutability on aggregation results passed to other layers.
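collectingAndThen also works as a downstream collector, so both the groups and the outer map can be frozen. A sketch assuming a simplified Order record (List.copyOf and Map.copyOf are Java 10+):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class CollectingAndThenDemo {
    // Hypothetical minimal Order for illustration.
    public record Order(String category, double amount) {}

    public static Map<String, List<Order>> frozenGroups(List<Order> orders) {
        return orders.stream().collect(Collectors.collectingAndThen(
                Collectors.groupingBy(
                        Order::category,
                        // freeze each group's list
                        Collectors.collectingAndThen(Collectors.toList(), List::copyOf)),
                // then freeze the outer map
                Map::copyOf));
    }

    public static void main(String[] args) {
        var groups = frozenGroups(List.of(new Order("books", 10.0), new Order("games", 5.0)));
        System.out.println(groups.get("books")); // [Order[category=books, amount=10.0]]
        // groups.get("books").add(...) would throw UnsupportedOperationException
    }
}
```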
Null and Key Hygiene
Collectors assume your key/value logic is safe. Before grouping/toMap in production:
- normalize keys (trim, toLowerCase) where needed
- filter out null keys/values explicitly
- define merge behavior for duplicates
Example:
Map<String, User> byEmail = users.stream()
.filter(u -> u.getEmail() != null)
.collect(Collectors.toMap(
u -> u.getEmail().trim().toLowerCase(),
Function.identity(),
(a, b) -> a
));
Testing Collector Logic
For non-trivial collector pipelines, test:
- empty input
- duplicate keys
- null/invalid records
- deterministic totals/counts on known fixture data
Collector bugs are often aggregation-edge bugs, not syntax bugs.
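The checklist above can be exercised with plain assertions, no test framework required. A sketch for the empty-input and duplicate-key cases, using a trivial first-letter grouping key:

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class CollectorEdgeCases {
    // "first wins" merge: duplicate keys must not throw.
    public static Map<String, String> byKey(List<String> values) {
        return values.stream().collect(Collectors.toMap(
                v -> v.substring(0, 1), Function.identity(), (a, b) -> a));
    }

    public static void main(String[] args) {
        // empty input -> empty map, not null
        if (!byKey(List.of()).isEmpty()) throw new AssertionError("empty input");
        // duplicate keys -> first value kept
        Map<String, String> m = byKey(List.of("apple", "avocado", "banana"));
        if (!"apple".equals(m.get("a"))) throw new AssertionError("merge");
        if (m.size() != 2) throw new AssertionError("size");
        System.out.println("collector edge cases passed");
    }
}
```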
Performance and Readability Rules
- use built-in collectors first
- avoid very deep nested collector trees in one expression
- extract complex downstream collectors to helper methods
- for money, prefer BigDecimal
- benchmark before parallel collection