CyclicBarrier is the coordination primitive for repeated rendezvous.
It answers a different question than CountDownLatch.
Not:
- when are these startup tasks done once
But:
- when has every worker reached the same synchronization point for this round
That makes it useful for iterative parallel workflows where progress should happen in lockstep.
Problem Statement
Imagine a parallel simulation, batch transformation job, or multi-stage computation where several worker threads repeat the same pattern:
- do one round of work
- wait for every other worker
- continue to the next round together
If one thread runs ahead while another is still finishing the previous step, the algorithm may become incorrect.
This is not a mutual exclusion problem. It is a rendezvous problem.
Naive Version
A common wrong instinct is to hand-roll the barrier with a shared counter and sleeps.
class BrokenRoundCoordinator {
    private volatile int finishedThisRound;

    void waitForOthers(int partyCount) throws InterruptedException {
        finishedThisRound++;
        while (finishedThisRound < partyCount) {
            Thread.sleep(10);
        }
    }
}
This is broken because:
- the increment is not atomic
- there is no reset per round
- it relies on polling
- it is fragile under failure and interruption
Once coordination becomes repeated and structured, the ad hoc approach collapses quickly.
Mental Model
CyclicBarrier manages a group of parties that repeatedly meet at a barrier point.
For each cycle:
- each party calls await()
- each caller blocks until the last party arrives
- once all parties arrive, the barrier trips
- all waiting parties are released
- the barrier resets automatically for the next round
That automatic reset is what makes it cyclic.
The parties are usually symmetric:
- same team size every round
- same rendezvous point every round
If those assumptions stop being true, the fit becomes weaker.
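A minimal sketch of that cycle, assuming a hypothetical class BarrierReuseSketch with a helper runRounds (neither name comes from the original example). The same barrier instance is reused for every round, and a counter in the barrier action records how often it trips.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration class; the names are assumptions, not part of the original post.
public class BarrierReuseSketch {

    /** Sends `parties` threads through `rounds` rendezvous and returns how often the barrier tripped. */
    static int runRounds(int parties, int rounds) throws InterruptedException {
        AtomicInteger trips = new AtomicInteger();
        // The barrier action runs once each time the last party arrives.
        CyclicBarrier barrier = new CyclicBarrier(parties, trips::incrementAndGet);

        Thread[] threads = new Thread[parties];
        for (int i = 0; i < parties; i++) {
            threads[i] = new Thread(() -> {
                try {
                    for (int round = 0; round < rounds; round++) {
                        // Blocks until every party has arrived; the barrier then resets itself.
                        barrier.await();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } catch (BrokenBarrierException e) {
                    // Another party failed; give up on the remaining rounds.
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return trips.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Barrier tripped " + runRounds(3, 4) + " times"); // Barrier tripped 4 times
    }
}
```

No code recreates or resets the barrier between rounds. The trip count equals the number of rounds because every cycle completes with the same three parties.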
Core API
Important methods:
- new CyclicBarrier(parties): create a reusable barrier
- new CyclicBarrier(parties, barrierAction): add an action run when the barrier trips
- await(): wait for the current cycle to complete
- await(timeout, unit): bounded waiting
- reset(): manually reset a broken or waiting barrier
- isBroken(): detect barrier failure state
- getNumberWaiting(): inspect how many parties are currently waiting
Two behavioral details matter a lot:
- one failing or timing-out party can break the barrier for everyone
- the barrier action runs once per cycle, before parties are released
That means a barrier is a coordination contract shared by the whole group.
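The ordering of the barrier action can be observed with a small sketch. BarrierActionOrderSketch and recordOneCycle are hypothetical names chosen for illustration. Each party records the arrival index returned by await(), where 0 means it was the last to arrive.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CyclicBarrier;

// Hypothetical illustration class; names are assumptions for this sketch only.
public class BarrierActionOrderSketch {

    /** Runs one barrier cycle and returns the recorded events in the order they happened. */
    static List<String> recordOneCycle(int parties) throws InterruptedException {
        List<String> events = new CopyOnWriteArrayList<>();
        CyclicBarrier barrier = new CyclicBarrier(parties, () -> events.add("action"));

        Thread[] threads = new Thread[parties];
        for (int i = 0; i < parties; i++) {
            threads[i] = new Thread(() -> {
                try {
                    // await() returns the arrival index: parties - 1 for the first arrival, 0 for the last.
                    int arrivalIndex = barrier.await();
                    events.add("released:" + arrivalIndex);
                } catch (Exception e) {
                    events.add("failed:" + e.getClass().getSimpleName());
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return events;
    }

    public static void main(String[] args) throws InterruptedException {
        // The first entry is always "action"; the order of the "released" entries varies between runs.
        System.out.println(recordOneCycle(3));
    }
}
```

The "action" entry always comes first because the last arriving thread runs the action before any party is released.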
Runnable Example
The following example models three workers processing two rounds of data cleanup.
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CyclicBarrierDemo {
    public static void main(String[] args) throws Exception {
        // Three parties; the action runs once per round, before the workers are released.
        CyclicBarrier barrier = new CyclicBarrier(
                3,
                () -> System.out.println("All workers reached the barrier, advancing phase"));
        ExecutorService executor = Executors.newFixedThreadPool(3);
        for (int workerId = 1; workerId <= 3; workerId++) {
            final int id = workerId;
            executor.submit(() -> runWorker(id, barrier));
        }
        executor.shutdown();
        executor.awaitTermination(10, TimeUnit.SECONDS);
    }

    static void runWorker(int workerId, CyclicBarrier barrier) {
        try {
            for (int round = 1; round <= 2; round++) {
                // Simulated work; each worker takes a different amount of time.
                TimeUnit.MILLISECONDS.sleep(100L * workerId);
                System.out.println("Worker " + workerId + " finished round " + round);
                barrier.await(); // wait here until all three workers finish this round
            }
        } catch (Exception e) {
            // Note: submit() captures this exception in the returned Future.
            throw new RuntimeException("Worker " + workerId + " failed", e);
        }
    }
}
The important part is not the sleep. It is the repeated structure:
- work
- rendezvous
- work
- rendezvous
That is the natural home of CyclicBarrier.
Production-Style Scenario
Good fits include:
- simulation steps
- iterative graph or matrix processing
- staged bulk transformations
- multi-threaded test harnesses where rounds must align
Example:
- several workers process partitioned data
- after each round, everyone must stop so the next global rule can begin
In that workflow, allowing one worker to race ahead is not faster. It is incorrect.
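A sketch of that partitioned workflow, under stated assumptions: the class PartitionedRoundsSketch, the helper runRounds, and the trivial "multiply by the round number" work are all invented for illustration. Each worker writes only its own slot, and the barrier action merges the slots before the next round starts.

```java
import java.util.Arrays;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

// Hypothetical illustration class; the workload is a stand-in for real per-round processing.
public class PartitionedRoundsSketch {

    /** Returns one combined total per round. */
    static long[] runRounds(int[][] partitions, int rounds) throws InterruptedException {
        int workers = partitions.length;
        long[] partial = new long[workers]; // one slot per worker, written only by that worker
        long[] totals = new long[rounds];
        int[] finishedRounds = {0};         // touched only inside the barrier action

        // The last worker to arrive runs this merge before anyone starts the next round.
        CyclicBarrier barrier = new CyclicBarrier(workers, () -> {
            long sum = 0;
            for (long p : partial) {
                sum += p;
            }
            totals[finishedRounds[0]++] = sum;
        });

        Thread[] threads = new Thread[workers];
        for (int w = 0; w < workers; w++) {
            final int worker = w;
            threads[w] = new Thread(() -> {
                try {
                    for (int round = 1; round <= rounds; round++) {
                        long local = 0;
                        for (int value : partitions[worker]) {
                            local += (long) value * round; // stand-in for real work
                        }
                        partial[worker] = local;
                        barrier.await(); // nobody starts the next round until the merge has run
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } catch (BrokenBarrierException e) {
                    // Another worker failed; stop instead of computing on inconsistent state.
                }
            });
            threads[w].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return totals;
    }

    public static void main(String[] args) throws InterruptedException {
        long[] totals = runRounds(new int[][] {{1, 2}, {3, 4}, {5, 6}}, 2);
        System.out.println(Arrays.toString(totals)); // [21, 42]
    }
}
```

If a worker raced ahead and overwrote its slot before the merge ran, the round total would mix values from two rounds. The barrier is what rules that out.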
Failure Model
This is where many developers underestimate CyclicBarrier.
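If one thread:
- is interrupted while waiting
- times out
- throws before arriving
the group cannot complete the round. Interruption and timeout break the barrier immediately, and every other waiting thread receives a BrokenBarrierException. A thread that throws before calling await() does not break the barrier by itself; the others keep waiting until one of them times out, is interrupted, or the barrier is reset.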
This is intentional. The barrier represents a group agreement. If one party cannot participate in the round, the group-level coordination is compromised.
That is why barrier-based systems need a clear plan for:
- timeouts
- cancellation
- recovery
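The sketch below shows one way a timeout propagates through the group. BrokenBarrierSketch, demonstrate, and outcomeOf are hypothetical names, and the 200 ms timeout is an arbitrary choice. The barrier expects three parties but only two arrive, so the bounded waiter times out and the unbounded waiter is released with a BrokenBarrierException.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration class; names and timeout value are assumptions.
public class BrokenBarrierSketch {

    /** Runs the wait and reports "released" or the simple name of the exception thrown. */
    static String outcomeOf(Callable<Integer> wait) {
        try {
            wait.call();
            return "released";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "InterruptedException";
        } catch (Exception e) {
            return e.getClass().getSimpleName();
        }
    }

    /** Returns {patient outcome, impatient outcome, isBroken before reset, isBroken after reset}. */
    static String[] demonstrate() throws InterruptedException {
        CyclicBarrier barrier = new CyclicBarrier(3); // only two parties will ever arrive
        AtomicReference<String> patient = new AtomicReference<>("not run");
        AtomicReference<String> impatient = new AtomicReference<>("not run");

        // Waits without a bound; it is only released because the other party breaks the barrier.
        Thread patientWorker = new Thread(() -> patient.set(outcomeOf(() -> barrier.await())));
        // Waits with a bound; its timeout breaks the barrier for the whole group.
        Thread impatientWorker = new Thread(
                () -> impatient.set(outcomeOf(() -> barrier.await(200, TimeUnit.MILLISECONDS))));

        patientWorker.start();
        impatientWorker.start();
        patientWorker.join();
        impatientWorker.join();

        boolean brokenBeforeReset = barrier.isBroken();
        barrier.reset(); // recovery step: makes the barrier usable for a fresh round
        boolean brokenAfterReset = barrier.isBroken();

        return new String[] {
            patient.get(), impatient.get(),
            String.valueOf(brokenBeforeReset), String.valueOf(brokenAfterReset)
        };
    }

    public static void main(String[] args) throws InterruptedException {
        // Prints [BrokenBarrierException, TimeoutException, true, false]
        System.out.println(java.util.Arrays.toString(demonstrate()));
    }
}
```

Calling reset() clears the broken state but does not coordinate the workers. A real recovery plan still has to decide which threads re-enter the next round and when.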
Common Mistakes
Choosing the wrong party count
If the barrier expects four parties and only three threads ever call await(), the round never completes.
Forgetting bounded waits in fragile environments
An indefinite await() may be fine in controlled compute loops, but service code usually needs timeout-aware failure paths.
Doing too much in the barrier action
The barrier action runs while release of the group is still pending.
If it blocks on slow I/O or long computation, it turns the barrier into a hidden bottleneck.
Using it when parties are not stable
If the number of participants changes over time, Phaser is often a better fit.
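For comparison, here is a minimal sketch of a party leaving mid-run with Phaser. PhaserDynamicPartiesSketch and its helper run are hypothetical names. One worker takes part in a single round and then deregisters, and the remaining workers keep advancing without hanging.

```java
import java.util.concurrent.Phaser;
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical illustration class; worker counts and round counts are arbitrary.
public class PhaserDynamicPartiesSketch {

    static final Phaser PHASER = new Phaser(3); // three parties registered up front

    /** Returns how many rounds each worker completed. */
    static int[] run() throws InterruptedException {
        int[] plannedRounds = {1, 3, 3}; // worker 0 leaves after the first round
        AtomicIntegerArray completed = new AtomicIntegerArray(3);

        Thread[] workers = new Thread[3];
        for (int i = 0; i < 3; i++) {
            final int id = i;
            workers[i] = new Thread(() -> {
                for (int round = 0; round < plannedRounds[id]; round++) {
                    PHASER.arriveAndAwaitAdvance(); // rendezvous with the currently registered parties
                    completed.incrementAndGet(id);
                }
                // Leaving reduces the party count, so later phases do not wait for this worker.
                PHASER.arriveAndDeregister();
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        return new int[] {completed.get(0), completed.get(1), completed.get(2)};
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(java.util.Arrays.toString(run())); // [1, 3, 3]
        System.out.println("terminated: " + PHASER.isTerminated()); // terminated: true
    }
}
```

With a three-party CyclicBarrier, the two remaining workers would wait forever in the second round. Here the phaser terminates on its own once the last party deregisters.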
Testing and Debugging Notes
When a barrier-based flow hangs, investigate these questions first:
- which worker never reached await()
- did one worker throw before arriving
- did a timeout break the barrier
- is the barrier configured with the wrong party count
Useful observability:
- log round numbers
- log before and after await()
- inspect isBroken()
- inspect getNumberWaiting() in test scenarios
Barrier bugs are easier to reason about when logs show the round number explicitly.
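One way to get that visibility is a small wrapper around await(). The names LoggedAwaitSketch, loggedAwait, and runDemo are hypothetical, and the 5 second timeout and in-memory log list are assumptions made for this sketch. A real service would more likely write to its logging framework.

```java
import java.util.List;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration class; the log format and timeout are assumptions.
public class LoggedAwaitSketch {

    /** Waits at the barrier and records the round number before and after the wait. */
    static void loggedAwait(CyclicBarrier barrier, String worker, int round, List<String> log)
            throws InterruptedException, BrokenBarrierException, TimeoutException {
        log.add(worker + " round=" + round + " arriving, alreadyWaiting=" + barrier.getNumberWaiting());
        try {
            barrier.await(5, TimeUnit.SECONDS); // bounded, so a hang becomes a visible failure
            log.add(worker + " round=" + round + " released");
        } catch (BrokenBarrierException | TimeoutException e) {
            log.add(worker + " round=" + round + " failed with " + e.getClass().getSimpleName()
                    + ", isBroken=" + barrier.isBroken());
            throw e;
        }
    }

    /** Two workers, two rounds; returns the collected log lines. */
    static List<String> runDemo() throws InterruptedException {
        List<String> log = new CopyOnWriteArrayList<>();
        CyclicBarrier barrier = new CyclicBarrier(2);
        Thread[] threads = new Thread[2];
        for (int i = 0; i < 2; i++) {
            String name = "worker-" + i;
            threads[i] = new Thread(() -> {
                try {
                    for (int round = 1; round <= 2; round++) {
                        loggedAwait(barrier, name, round, log);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } catch (BrokenBarrierException | TimeoutException e) {
                    // Already recorded inside loggedAwait.
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return log;
    }

    public static void main(String[] args) throws InterruptedException {
        runDemo().forEach(System.out::println);
    }
}
```

With output like this, a worker that logs "arriving" for a round but never "released" points directly at the rendezvous that stalled.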
Performance and Trade-Offs
CyclicBarrier is not about maximizing raw throughput.
It is about preserving a repeated synchronization structure.
That means its trade-offs are often algorithmic:
- correctness through lockstep progression
- reduced concurrency across phase boundaries
- sensitivity to stragglers
If one worker is much slower than the others, the entire group pays the cost every round.
That is not necessarily a bug. It may simply reflect that the algorithm truly requires phase alignment.
Decision Guide
Use CyclicBarrier when:
- a fixed group of threads repeats the same rounds
- every participant must reach the rendezvous before the next round begins
- automatic reset after each round is desired
Do not use it when:
- the coordination is one-shot only
- parties are highly dynamic
- the workflow is better modeled as futures or queue-driven pipelines
Choose:
- CountDownLatch for one-time gates
- Phaser for dynamic multi-phase participation
- CompletableFuture for result-based async composition
Key Takeaways
- CyclicBarrier is for repeated rendezvous among a fixed number of parties.
- It resets automatically after each successful round.
- One failing or timing-out party can break the barrier for everyone.
- It is strongest for phase-aligned parallel workflows, not for one-shot startup gates.
Next post: Barrier Action Patterns with CyclicBarrier