CyclicBarrier in Java Deep Dive

CyclicBarrier is the coordination primitive for repeated rendezvous.

It answers a different question than CountDownLatch.

Not:

when are these startup tasks done once

But:

when has every worker reached the same synchronization point for this round

That makes it useful for iterative parallel workflows where progress should happen in lockstep.

Problem Statement

Imagine a parallel simulation, batch transformation job, or multi-stage computation where several worker threads repeat the same pattern:

do one round of work
wait for every other worker
continue to the next round together

If one thread runs ahead while another is still finishing the previous step, the algorithm may become incorrect.

This is not a mutual exclusion problem. It is a rendezvous problem.

Naive Version

A common wrong instinct is to hand-roll the barrier with a shared counter and sleeps.

class BrokenRoundCoordinator {
    private volatile int finishedThisRound;

    void waitForOthers(int partyCount) throws InterruptedException {
        finishedThisRound++;
        while (finishedThisRound < partyCount) {
            Thread.sleep(10);
        }
    }
}

This is broken because:

the increment is not atomic
there is no reset per round
it relies on polling
it is fragile under failure and interruption

Once coordination becomes repeated and structured, the ad hoc approach collapses quickly.

Mental Model

CyclicBarrier manages a group of parties that repeatedly meet at a barrier point.

For each cycle:

each party calls await()
each caller blocks until the last party arrives
once all parties arrive, the barrier trips
all waiting parties are released
the barrier resets automatically for the next round

That automatic reset is what makes it cyclic.

The parties are usually symmetric:

same team size every round
same rendezvous point every round

If those assumptions stop being true, the fit becomes weaker.

Core API

Important methods:

new CyclicBarrier(parties): create a reusable barrier
new CyclicBarrier(parties, barrierAction): add an action run when the barrier trips
await(): wait for the current cycle to complete
await(timeout, unit): bounded waiting
reset(): manually reset a broken or waiting barrier
isBroken(): detect barrier failure state
getNumberWaiting(): inspect how many parties are currently waiting

Two behavioral details matter a lot:

one failing or timing-out party can break the barrier for everyone
the barrier action runs once per cycle, before parties are released

That means a barrier is a coordination contract shared by the whole group.

Runnable Example

The following example models three workers processing two rounds of data cleanup.

import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CyclicBarrierDemo {

    public static void main(String[] args) throws Exception {
        CyclicBarrier barrier = new CyclicBarrier(
                3,
                () -> System.out.println("All workers reached the barrier, advancing phase"));

        ExecutorService executor = Executors.newFixedThreadPool(3);

        for (int workerId = 1; workerId <= 3; workerId++) {
            final int id = workerId;
            executor.submit(() -> runWorker(id, barrier));
        }

        executor.shutdown();
        executor.awaitTermination(10, TimeUnit.SECONDS);
    }

    static void runWorker(int workerId, CyclicBarrier barrier) {
        try {
            for (int round = 1; round <= 2; round++) {
                TimeUnit.MILLISECONDS.sleep(100L * workerId);
                System.out.println("Worker " + workerId + " finished round " + round);
                barrier.await();
            }
        } catch (Exception e) {
            throw new RuntimeException("Worker " + workerId + " failed", e);
        }
    }
}

The important part is not the sleep. It is the repeated structure:

work
rendezvous
work
rendezvous

That is the natural home of CyclicBarrier.

Production-Style Scenario

Good fits include:

simulation steps
iterative graph or matrix processing
staged bulk transformations
multi-threaded test harnesses where rounds must align

Example:

several workers process partitioned data
after each round, everyone must stop so the next global rule can begin

In that workflow, allowing one worker to race ahead is not faster. It is incorrect.

Failure Model

This is where many developers underestimate CyclicBarrier.

If one thread:

is interrupted
times out
throws before arriving

the barrier may become broken. That affects every other waiting thread.

This is intentional. The barrier represents a group agreement. If one party cannot participate in the round, the group-level coordination is compromised.

That is why barrier-based systems need a clear plan for:

timeouts
cancellation
recovery

Common Mistakes

Choosing the wrong party count

If the barrier expects four parties and only three threads ever call await(), the round never completes.

Forgetting bounded waits in fragile environments

An indefinite await() may be fine in controlled compute loops, but service code usually needs timeout-aware failure paths.

Doing too much in the barrier action

The barrier action runs while release of the group is still pending.

If it blocks on slow I/O or long computation, it turns the barrier into a hidden bottleneck.

Using it when parties are not stable

If the number of participants changes over time, Phaser is often a better fit.

Testing and Debugging Notes

When a barrier-based flow hangs, investigate these questions first:

which worker never reached await()
did one worker throw before arriving
did a timeout break the barrier
is the barrier configured with the wrong party count

Useful observability:

log round numbers
log before and after await()
inspect isBroken()
inspect getNumberWaiting() in test scenarios

Barrier bugs are easier to reason about when logs show the round number explicitly.

Performance and Trade-Offs

CyclicBarrier is not about maximizing raw throughput.

It is about preserving a repeated synchronization structure.

That means its trade-offs are often algorithmic:

correctness through lockstep progression
reduced concurrency across phase boundaries
sensitivity to stragglers

If one worker is much slower than the others, the entire group pays the cost every round.

That is not necessarily a bug. It may simply reflect that the algorithm truly requires phase alignment.

Decision Guide

Use CyclicBarrier when:

a fixed group of threads repeats the same rounds
every participant must reach the rendezvous before the next round begins
automatic reset after each round is desired

Do not use it when:

the coordination is one-shot only
parties are highly dynamic
the workflow is better modeled as futures or queue-driven pipelines

Choose:

CountDownLatch for one-time gates
Phaser for dynamic multi-phase participation
CompletableFuture for result-based async composition

Key Takeaways

CyclicBarrier is for repeated rendezvous among a fixed number of parties.
It resets automatically after each successful round.
One failing or timing-out party can break the barrier for everyone.
It is strongest for phase-aligned parallel workflows, not for one-shot startup gates.

Next post: Barrier Action Patterns with CyclicBarrier

Share on

X Facebook LinkedIn Bluesky