Build a Rate Limiter in Go — Part 3: The Token Bucket Algorithm
In Part 1 we built a sliding window rate limiter. In Part 2 we scaled it with map sharding. Both posts used the same underlying algorithm — count requests within a time window, reject when over the limit.
In Part 3 we take a different approach entirely: the token bucket.
Series overview:
- Part 1 — Sliding window rate limiter
- Part 2 — Scaling with map sharding
- Part 3 — Token bucket algorithm (you are here)
- Part 4 — Redis-backed distributed rate limiting
- Part 5 — Benchmarks: sliding window vs token bucket vs Redis
The Mental Model
Imagine a bucket that holds tokens. The bucket has a maximum capacity — say, 10 tokens. Tokens are added to the bucket at a fixed rate — say, 1 token per second. When a request comes in, it consumes one token. If the bucket is empty, the request is rejected.
Two properties fall out of this naturally:
Burst handling — if no requests have come in for a while, the bucket fills up. A user can then make several requests in quick succession, up to the bucket’s capacity. The sliding window doesn’t allow this — it counts strictly within the window regardless of prior activity.
Sustained rate — over time, the average request rate can’t exceed the refill rate. The burst headroom gets consumed and the user is back to one request per refill interval.
This makes the token bucket a better fit for APIs where occasional bursts are legitimate — think a user opening your app after several hours of inactivity — but you still want to protect against sustained abuse.
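To make the two properties concrete, here is a small standalone simulation. It is not the post's implementation; the names and numbers (capacity 10, 1 token/second) are illustrative, and it steps time in whole seconds rather than using real clocks:

```go
package main

import "fmt"

// simulate runs a bucket through a 12-request burst, then 60 seconds of
// sustained hammering, and reports how many requests succeed in each phase.
func simulate() (burst, sustained int) {
	const capacity, refillRate = 10.0, 1.0
	tokens := capacity // an idle user's bucket starts full

	// Burst: 12 back-to-back requests. Only the first 10 find a token.
	for i := 0; i < 12; i++ {
		if tokens >= 1 {
			tokens--
			burst++
		}
	}

	// Sustained rate: for the next 60 seconds the user keeps requesting,
	// but each second refills only one token, so only 60 more succeed.
	for sec := 0; sec < 60; sec++ {
		tokens = min(capacity, tokens+refillRate) // one second of refill
		for tokens >= 1 {
			tokens--
			sustained++
		}
	}
	return burst, sustained
}

func main() {
	b, s := simulate()
	fmt.Println("burst allowed:", b)             // 10
	fmt.Println("allowed over the next 60s:", s) // 60
}
```

The burst ceiling and the sustained rate are independent knobs, which is exactly what the sliding window cannot give you.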
Sliding Window vs Token Bucket
Before the implementation, it’s worth understanding when to reach for each:
| | Sliding Window | Token Bucket |
|---|---|---|
| Burst handling | Strict — no burst allowed | Flexible — bursts up to capacity |
| Memory per user | Slice of timestamps | Two fields (token count + last refill time) |
| Implementation complexity | Moderate | Simple |
| Best for | Strict per-window limits | APIs with legitimate burst patterns |
The token bucket is also cheaper in memory — instead of storing a slice of timestamps per user, we only need two values: the current token count and the last refill timestamp.
The Implementation
The Struct
```go
type bucket struct {
	tokens     float64
	lastRefill time.Time
}

type RateLimiter struct {
	mu         sync.Mutex
	buckets    map[string]*bucket
	capacity   float64
	refillRate float64 // tokens per second
	done       chan struct{}
}
```
A few design decisions worth noting:
- `tokens` is a `float64` rather than an integer, which lets us handle fractional token accrual accurately over time without rounding errors
- `refillRate` is expressed in tokens per second, making the math straightforward regardless of the refill interval
- `capacity` is the maximum number of tokens a bucket can hold: this is the burst ceiling
Creating a New Rate Limiter
```go
func NewRateLimiter(capacity float64, refillRate float64) *RateLimiter {
	rl := &RateLimiter{
		buckets:    make(map[string]*bucket),
		capacity:   capacity,
		refillRate: refillRate,
		done:       make(chan struct{}),
	}
	go rl.cleanup()
	return rl
}
```
Example: NewRateLimiter(10, 1) gives each user a bucket that holds 10 tokens and refills at 1 token per second. A user can burst up to 10 requests immediately, then is limited to 1 request per second thereafter.
Refilling a Bucket
Instead of a background ticker that refills all buckets, we refill lazily — only when a request comes in for that user. This is more efficient and eliminates the need for a separate refill goroutine.
```go
func (rl *RateLimiter) refill(b *bucket) {
	now := time.Now()
	elapsed := now.Sub(b.lastRefill).Seconds()
	b.tokens = min(rl.capacity, b.tokens+elapsed*rl.refillRate)
	b.lastRefill = now
}
```
We calculate how much time has passed since the last refill, multiply by the refill rate to get the number of new tokens, and cap at capacity. Simple and cheap.
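As a quick sanity check on that arithmetic, here is a standalone sketch with illustrative numbers (not the post's `refill` method, which works on a `bucket` and real timestamps):

```go
package main

import "fmt"

// refill applies the same formula as the post: new tokens = elapsed seconds
// times the refill rate, capped at capacity. The builtin min requires Go 1.21+.
func refill(tokens, elapsedSeconds, refillRate, capacity float64) float64 {
	return min(capacity, tokens+elapsedSeconds*refillRate)
}

func main() {
	// 2.5 seconds at 2 tokens/second adds 5 tokens.
	fmt.Println(refill(1, 2.5, 2, 10)) // 1 + 5 = 6
	// Starting from 8 tokens, the same refill would overshoot, so it caps at 10.
	fmt.Println(refill(8, 2.5, 2, 10)) // min(10, 13) = 10
}
```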
Checking If a Request Is Allowed
```go
func (rl *RateLimiter) IsAllowed(userID string) bool {
	rl.mu.Lock()
	defer rl.mu.Unlock()
	b, ok := rl.buckets[userID]
	if !ok {
		b = &bucket{
			tokens:     rl.capacity,
			lastRefill: time.Now(),
		}
		rl.buckets[userID] = b
	}
	rl.refill(b)
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}
```
New users start with a full bucket — their first interaction gets the full burst allowance. On every request we refill based on elapsed time, then check if there’s at least one token available.
The Cleanup Goroutine
Same problem as Part 1 — buckets for inactive users accumulate in memory. We still need a cleanup pass:
```go
func (rl *RateLimiter) cleanup() {
	ticker := time.NewTicker(time.Minute)
	defer ticker.Stop()
	for {
		select {
		case <-rl.done:
			return
		case <-ticker.C:
			rl.mu.Lock()
			for userID, b := range rl.buckets {
				rl.refill(b)
				if b.tokens >= rl.capacity {
					delete(rl.buckets, userID)
				}
			}
			rl.mu.Unlock()
		}
	}
}
```
The cleanup heuristic here is different from Part 1 — instead of checking for expired timestamps, we check if the bucket is full. A full bucket means the user hasn’t made a request in long enough for all tokens to refill — they’re effectively inactive. Safe to evict.
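To put a number on "long enough": a bucket drained to zero takes capacity/refillRate seconds to refill completely, plus up to one minute of ticker delay before the sweep notices. A back-of-the-envelope sketch (illustrative values):

```go
package main

import "fmt"

// secondsUntilFull is the worst-case idle time before a user's bucket is
// full again and therefore eligible for eviction on the next sweep.
func secondsUntilFull(capacity, refillRate float64) float64 {
	return capacity / refillRate
}

func main() {
	fmt.Println(secondsUntilFull(10, 1))    // 10s idle, then evictable on the next sweep
	fmt.Println(secondsUntilFull(100, 0.5)) // a large, slow bucket lingers for 200s
}
```

This also means a very large capacity with a slow refill rate keeps inactive buckets in memory longer, which is worth considering when tuning.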
Graceful Shutdown
```go
func (rl *RateLimiter) Stop() {
	close(rl.done)
}
```
Putting It Together
```go
// 10 token capacity, refills at 2 tokens per second
rl := NewRateLimiter(10, 2)
defer rl.Stop()

if rl.IsAllowed("user-123") {
	// handle request
} else {
	// return 429 Too Many Requests
}
```
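In a real HTTP server, that check typically lives in middleware. Here is a hedged sketch: the key function, handler names, and `RemoteAddr` keying are illustrative choices, and the limiter is a trimmed copy of the post's code (no cleanup goroutine) so the example is self-contained:

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
	"time"
)

// Trimmed copy of the post's limiter, without the cleanup goroutine.
type bucket struct {
	tokens     float64
	lastRefill time.Time
}

type RateLimiter struct {
	mu         sync.Mutex
	buckets    map[string]*bucket
	capacity   float64
	refillRate float64
}

func NewRateLimiter(capacity, refillRate float64) *RateLimiter {
	return &RateLimiter{buckets: make(map[string]*bucket), capacity: capacity, refillRate: refillRate}
}

func (rl *RateLimiter) IsAllowed(userID string) bool {
	rl.mu.Lock()
	defer rl.mu.Unlock()
	b, ok := rl.buckets[userID]
	if !ok {
		b = &bucket{tokens: rl.capacity, lastRefill: time.Now()}
		rl.buckets[userID] = b
	}
	now := time.Now()
	b.tokens = min(rl.capacity, b.tokens+now.Sub(b.lastRefill).Seconds()*rl.refillRate)
	b.lastRefill = now
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

// limit wraps a handler and returns 429 when the caller's bucket is empty.
// The key function decides what counts as one "user" (API key, IP, etc.).
func limit(rl *RateLimiter, key func(*http.Request) string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !rl.IsAllowed(key(r)) {
			http.Error(w, "Too Many Requests", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	rl := NewRateLimiter(2, 1) // tiny capacity so the demo trips quickly
	h := limit(rl, func(r *http.Request) string { return "demo-user" },
		http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			fmt.Fprint(w, "ok")
		}))
	srv := httptest.NewServer(h)
	defer srv.Close()

	for i := 0; i < 3; i++ {
		resp, err := http.Get(srv.URL)
		if err != nil {
			panic(err)
		}
		fmt.Println(resp.StatusCode) // 200, 200, then 429
		resp.Body.Close()
	}
}
```

Keying by `r.RemoteAddr` is the simplest real-world choice, but it misbehaves behind proxies; an API key or authenticated user ID is usually better.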
When to Use Token Bucket vs Sliding Window
Reach for the sliding window when:
- You need strict per-window enforcement — no exceptions for burst
- Your API contract is expressed as “N requests per minute/hour”
- Fairness between users is the primary concern
Reach for the token bucket when:
- Bursts are a legitimate usage pattern for your API
- You want lower memory overhead per user
- Your rate limit is more naturally expressed as a sustained rate with burst headroom (e.g. webhooks, streaming endpoints)
Neither is universally better. The right choice depends on the semantics you want to enforce.
Honest Limitations
The same fundamental limitation from Parts 1 and 2 applies here — this is still in-memory and single-process. Multiple instances behind a load balancer means independent buckets per instance, and users can exceed the intended limit by spreading requests across instances.
The cleanup goroutine also still holds a global lock during its sweep. If you need the token bucket at high scale, the same sharding approach from Part 2 applies — each shard gets its own map of buckets and its own mutex.
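For reference, a minimal sketch of what that sharding could look like, assuming an FNV-1a hash over user IDs; the names and shard count are illustrative, not lifted from Part 2:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

type bucket struct {
	tokens float64
}

// Each shard owns its own map and mutex, so goroutines limiting different
// users rarely contend, and cleanup can sweep one shard at a time.
type shard struct {
	mu      sync.Mutex
	buckets map[string]*bucket
}

type ShardedLimiter struct {
	shards []*shard
}

func NewShardedLimiter(n int) *ShardedLimiter {
	s := make([]*shard, n)
	for i := range s {
		s[i] = &shard{buckets: make(map[string]*bucket)}
	}
	return &ShardedLimiter{shards: s}
}

// shardFor hashes the user ID so the same user always lands on the same shard.
func (sl *ShardedLimiter) shardFor(userID string) *shard {
	h := fnv.New32a()
	h.Write([]byte(userID))
	return sl.shards[h.Sum32()%uint32(len(sl.shards))]
}

func main() {
	sl := NewShardedLimiter(8)
	fmt.Println(sl.shardFor("user-123") == sl.shardFor("user-123")) // true
}
```

`IsAllowed` then locks only the target shard's mutex, and the cleanup sweep can release each shard's lock before moving to the next.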
Both problems get solved properly in Part 4 with Redis.
Full Implementation
```go
package ratelimiter

import (
	"sync"
	"time"
)

type bucket struct {
	tokens     float64
	lastRefill time.Time
}

type RateLimiter struct {
	mu         sync.Mutex
	buckets    map[string]*bucket
	capacity   float64
	refillRate float64 // tokens per second
	done       chan struct{}
}

func NewRateLimiter(capacity float64, refillRate float64) *RateLimiter {
	rl := &RateLimiter{
		buckets:    make(map[string]*bucket),
		capacity:   capacity,
		refillRate: refillRate,
		done:       make(chan struct{}),
	}
	go rl.cleanup()
	return rl
}

func (rl *RateLimiter) refill(b *bucket) {
	now := time.Now()
	elapsed := now.Sub(b.lastRefill).Seconds()
	b.tokens = min(rl.capacity, b.tokens+elapsed*rl.refillRate) // builtin min requires Go 1.21+
	b.lastRefill = now
}

func (rl *RateLimiter) IsAllowed(userID string) bool {
	rl.mu.Lock()
	defer rl.mu.Unlock()
	b, ok := rl.buckets[userID]
	if !ok {
		b = &bucket{
			tokens:     rl.capacity,
			lastRefill: time.Now(),
		}
		rl.buckets[userID] = b
	}
	rl.refill(b)
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

func (rl *RateLimiter) cleanup() {
	ticker := time.NewTicker(time.Minute)
	defer ticker.Stop()
	for {
		select {
		case <-rl.done:
			return
		case <-ticker.C:
			rl.mu.Lock()
			for userID, b := range rl.buckets {
				rl.refill(b)
				if b.tokens >= rl.capacity {
					delete(rl.buckets, userID)
				}
			}
			rl.mu.Unlock()
		}
	}
}

func (rl *RateLimiter) Stop() {
	close(rl.done)
}
```
What’s Next
In Part 4 we move beyond single-instance limitations entirely. We’ll build a Redis-backed rate limiter that works correctly across multiple instances — the last piece needed for a production-grade distributed system.
Follow along to catch the rest of the series.