CyclicBarrier is the coordination primitive for repeated rendezvous.

It answers a different question than CountDownLatch.

Not:

  • when are these startup tasks done once

But:

  • when has every worker reached the same synchronization point for this round

That makes it useful for iterative parallel workflows where progress should happen in lockstep.


Problem Statement

Imagine a parallel simulation, batch transformation job, or multi-stage computation where several worker threads repeat the same pattern:

  1. do one round of work
  2. wait for every other worker
  3. continue to the next round together

If one thread runs ahead while another is still finishing the previous step, the algorithm may become incorrect.

This is not a mutual exclusion problem. It is a rendezvous problem.


Naive Version

A common wrong instinct is to hand-roll the barrier with a shared counter and sleeps.

class BrokenRoundCoordinator {
    private volatile int finishedThisRound;

    void waitForOthers(int partyCount) throws InterruptedException {
        finishedThisRound++;
        while (finishedThisRound < partyCount) {
            Thread.sleep(10);
        }
    }
}

This is broken because:

  • the increment is not atomic
  • there is no reset per round
  • it relies on polling
  • it is fragile under failure and interruption

Once coordination becomes repeated and structured, the ad hoc approach collapses quickly.


Mental Model

CyclicBarrier manages a group of parties that repeatedly meet at a barrier point.

For each cycle:

  1. each party calls await()
  2. each caller blocks until the last party arrives
  3. once all parties arrive, the barrier trips
  4. all waiting parties are released
  5. the barrier resets automatically for the next round

That automatic reset is what makes it cyclic.

The parties are usually symmetric:

  • same team size every round
  • same rendezvous point every round

If those assumptions stop being true, the fit becomes weaker.


Core API

Important methods:

  • new CyclicBarrier(parties): create a reusable barrier
  • new CyclicBarrier(parties, barrierAction): add an action run when the barrier trips
  • await(): wait for the current cycle to complete
  • await(timeout, unit): bounded waiting
  • reset(): manually reset a broken or waiting barrier
  • isBroken(): detect barrier failure state
  • getNumberWaiting(): inspect how many parties are currently waiting

Two behavioral details matter a lot:

  1. one failing or timing-out party can break the barrier for everyone
  2. the barrier action runs once per cycle, before parties are released

That means a barrier is a coordination contract shared by the whole group.


Runnable Example

The following example models three workers processing two rounds of data cleanup.

import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CyclicBarrierDemo {

    public static void main(String[] args) throws Exception {
        CyclicBarrier barrier = new CyclicBarrier(
                3,
                () -> System.out.println("All workers reached the barrier, advancing phase"));

        ExecutorService executor = Executors.newFixedThreadPool(3);

        for (int workerId = 1; workerId <= 3; workerId++) {
            final int id = workerId;
            executor.submit(() -> runWorker(id, barrier));
        }

        executor.shutdown();
        executor.awaitTermination(10, TimeUnit.SECONDS);
    }

    static void runWorker(int workerId, CyclicBarrier barrier) {
        try {
            for (int round = 1; round <= 2; round++) {
                TimeUnit.MILLISECONDS.sleep(100L * workerId);
                System.out.println("Worker " + workerId + " finished round " + round);
                barrier.await();
            }
        } catch (Exception e) {
            throw new RuntimeException("Worker " + workerId + " failed", e);
        }
    }
}

The important part is not the sleep. It is the repeated structure:

  • work
  • rendezvous
  • work
  • rendezvous

That is the natural home of CyclicBarrier.


Production-Style Scenario

Good fits include:

  • simulation steps
  • iterative graph or matrix processing
  • staged bulk transformations
  • multi-threaded test harnesses where rounds must align

Example:

  • several workers process partitioned data
  • after each round, everyone must stop so the next global rule can begin

In that workflow, allowing one worker to race ahead is not faster. It is incorrect.


Failure Model

This is where many developers underestimate CyclicBarrier.

If one thread:

  • is interrupted
  • times out
  • throws before arriving

the barrier may become broken. That affects every other waiting thread.

This is intentional. The barrier represents a group agreement. If one party cannot participate in the round, the group-level coordination is compromised.

That is why barrier-based systems need a clear plan for:

  • timeouts
  • cancellation
  • recovery

Common Mistakes

Choosing the wrong party count

If the barrier expects four parties and only three threads ever call await(), the round never completes.

Forgetting bounded waits in fragile environments

An indefinite await() may be fine in controlled compute loops, but service code usually needs timeout-aware failure paths.

Doing too much in the barrier action

The barrier action runs while release of the group is still pending.

If it blocks on slow I/O or long computation, it turns the barrier into a hidden bottleneck.

Using it when parties are not stable

If the number of participants changes over time, Phaser is often a better fit.


Testing and Debugging Notes

When a barrier-based flow hangs, investigate these questions first:

  1. which worker never reached await()
  2. did one worker throw before arriving
  3. did a timeout break the barrier
  4. is the barrier configured with the wrong party count

Useful observability:

  • log round numbers
  • log before and after await()
  • inspect isBroken()
  • inspect getNumberWaiting() in test scenarios

Barrier bugs are easier to reason about when logs show the round number explicitly.


Performance and Trade-Offs

CyclicBarrier is not about maximizing raw throughput.

It is about preserving a repeated synchronization structure.

That means its trade-offs are often algorithmic:

  • correctness through lockstep progression
  • reduced concurrency across phase boundaries
  • sensitivity to stragglers

If one worker is much slower than the others, the entire group pays the cost every round.

That is not necessarily a bug. It may simply reflect that the algorithm truly requires phase alignment.


Decision Guide

Use CyclicBarrier when:

  • a fixed group of threads repeats the same rounds
  • every participant must reach the rendezvous before the next round begins
  • automatic reset after each round is desired

Do not use it when:

  • the coordination is one-shot only
  • parties are highly dynamic
  • the workflow is better modeled as futures or queue-driven pipelines

Choose:

  • CountDownLatch for one-time gates
  • Phaser for dynamic multi-phase participation
  • CompletableFuture for result-based async composition

Key Takeaways

  • CyclicBarrier is for repeated rendezvous among a fixed number of parties.
  • It resets automatically after each successful round.
  • One failing or timing-out party can break the barrier for everyone.
  • It is strongest for phase-aligned parallel workflows, not for one-shot startup gates.

Next post: Barrier Action Patterns with CyclicBarrier

Comments