Semaphores are naturally good at concurrency limiting.

They are only conditionally good at rate limiting.

Those are not the same thing:

  • concurrency limiting controls how many operations run at once
  • rate limiting controls how many operations may begin over time

Developers often blur these ideas because both are forms of admission control. But the operational behavior is different.


Problem Statement

Imagine a service calling a flaky third-party dependency.

You may want to guarantee all of the following:

  • no more than 10 calls in flight at once
  • no more than 100 calls start per second
  • overload gets rejected quickly instead of creating huge internal queues

One semaphore can solve the first requirement directly. It does not fully solve the second by itself.

That distinction is the whole point of this post.


Concurrency Limiting with Semaphore

This is the natural fit.

A semaphore with 10 permits means:

  • at most 10 calls can be active at once
  • the 11th caller must wait or fail

That directly controls:

  • peak downstream fan-out
  • connection or thread pressure
  • memory growth from too many active tasks

It is often the simplest and most practical safeguard for a slow dependency.


Runnable Example

import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class SemaphoreConcurrencyLimiterDemo {

    public static void main(String[] args) throws Exception {
        LimitedPartnerClient client = new LimitedPartnerClient(3);

        for (int i = 1; i <= 5; i++) {
            final int requestId = i;
            new Thread(() -> {
                try {
                    String result = client.callPartner("request-" + requestId);
                    System.out.println(result);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "caller-" + i).start();
        }
    }

    static final class LimitedPartnerClient {
        private final Semaphore permits;

        LimitedPartnerClient(int maxConcurrentCalls) {
            this.permits = new Semaphore(maxConcurrentCalls);
        }

        String callPartner(String requestId) throws InterruptedException {
            if (!permits.tryAcquire(200, TimeUnit.MILLISECONDS)) {
                return "rejected-" + requestId + "-over-capacity";
            }

            try {
                TimeUnit.MILLISECONDS.sleep(500);
                return "ok-" + requestId;
            } finally {
                permits.release();
            }
        }
    }
}

This code is a true concurrency limiter:

  • it bounds active work
  • it can reject quickly under pressure

That alone often produces a major stability improvement.


Why This Is Not Full Rate Limiting

Suppose each operation finishes quickly.

With a 10-permit semaphore, you might process:

  • 10 calls now
  • 10 more a few milliseconds later
  • many more within the same second

So a semaphore does not inherently enforce:

  • X requests per second

It enforces:

  • X requests at the same time

This can indirectly lower rate, but it is not the same control surface.


Can a Semaphore Approximate Rate Limiting

Yes, but only in a limited way.

You can treat permits like tokens refilled periodically. For example:

  • start each second with 100 permits
  • each request acquires one
  • a scheduler replenishes the permits for the next second

That is a coarse token-bucket-like pattern.

But once you need:

  • smooth per-time-unit behavior
  • burst handling rules
  • distributed coordination
  • weighted costs

you usually want a dedicated rate-limiting design rather than a plain semaphore alone.


Production Guidance

Use a semaphore directly when the real risk is:

  • too many active expensive operations
  • downstream collapse from too much fan-out
  • resource exhaustion from concurrent in-flight work

Use a dedicated rate-control design when the real contract is:

  • API quota per second or minute
  • burst policy
  • tenant-specific request budgets
  • globally coordinated rate enforcement

Many systems need both:

  • concurrency limit for local protection
  • rate limit for external contract protection

Those are complementary, not competing, controls.


Common Mistakes

Calling a concurrency limit a rate limit

This leads to false confidence and confused capacity planning.

Waiting forever for permits

If overload is possible, bounded tryAcquire is often safer than unbounded waiting.

Otherwise you may move the queue from the downstream dependency into your own process.

Ignoring queue growth around the semaphore

A semaphore can protect a resource while still allowing large external request queues to build elsewhere.

You still need a whole-system overload strategy.


Decision Guide

Use Semaphore for concurrency limiting when:

  • the danger is too many active in-flight operations
  • you want to cap local work and protect shared capacity

Use a rate-limiting design when:

  • the danger is too many operation starts per time window
  • you must honor an external quota or traffic budget

Use both when:

  • you need local stability and external contract control at the same time

Key Takeaways

  • Semaphores are a natural tool for concurrency limiting, not for precise time-based rate limiting.
  • Concurrency limiting protects active capacity; rate limiting controls operation starts over time.
  • A semaphore can approximate simple token-style throttling, but dedicated rate-control designs are better for real quotas and burst policies.
  • In production systems, the best answer is often both a concurrency limit and a rate limit.

Next post: Exchanger in Java and When It Is Useful

Comments