Build a Rate Limiter in Go — Part 2: Scaling with Map Sharding
In Part 1 we built a sliding window rate limiter that is correct, concurrent, and memory-safe. For a single server with a modest number of users, it works well.
But there is a problem hiding in the cleanup goroutine. Let’s expose it.
Series overview:
- Part 1 — Sliding window rate limiter
- Part 2 — Scaling with map sharding (you are here)
- Part 3 — Token bucket algorithm
- Part 4 — Redis-backed distributed rate limiting
- Part 5 — Benchmarks: sliding window vs token bucket vs Redis
The Problem: Global Lock Contention
Here is the cleanup loop from Part 1:
```go
func (rl *RateLimiter) cleanup() {
    ticker := time.NewTicker(rl.window)
    defer ticker.Stop()
    for {
        select {
        case <-rl.done:
            return
        case <-ticker.C:
            rl.mu.Lock()
            for userID := range rl.requests {
                valid := rl.validRequests(rl.requests[userID])
                if len(valid) == 0 {
                    delete(rl.requests, userID)
                    continue
                }
                rl.requests[userID] = valid
            }
            rl.mu.Unlock()
        }
    }
}
```
See the issue? rl.mu.Lock() acquires a single global lock for the entire duration of the sweep. While cleanup is running, every call to IsAllowed blocks — real user requests are queued, waiting for cleanup to finish.
With a small map this is negligible. But with a million users, iterating over that map while holding the lock could take tens of milliseconds. At scale, that is a real latency spike on every cleanup tick.
The Fix: Map Sharding
The idea is straightforward — instead of one big map with one lock, we use N smaller maps each with their own independent lock.
When a request comes in for a given userID, we hash the ID to determine which shard it belongs to, then only lock that shard. Cleanup iterates shards independently, locking one at a time. No single lock is held for long, and IsAllowed calls on different shards never block each other.
```
Before sharding:        After sharding:

[ map: all users ]      [ shard 0: ~1/4 of users ][ mutex 0 ]
[ single mutex   ]      [ shard 1: ~1/4 of users ][ mutex 1 ]
                        [ shard 2: ~1/4 of users ][ mutex 2 ]
                        [ shard 3: ~1/4 of users ][ mutex 3 ]
```
The Implementation
The Shard
Each shard is its own self-contained unit — a map and a mutex:
```go
type shard struct {
    mu       sync.Mutex
    requests map[string][]time.Time
}
```
The Sharded Rate Limiter
```go
const numShards = 256

type RateLimiter struct {
    shards [numShards]shard
    limit  int
    window time.Duration
    done   chan struct{}
}

func NewRateLimiter(limit int, window time.Duration) *RateLimiter {
    rl := &RateLimiter{
        limit:  limit,
        window: window,
        done:   make(chan struct{}),
    }
    for i := range rl.shards {
        rl.shards[i].requests = make(map[string][]time.Time)
    }
    go rl.cleanup()
    return rl
}
```
256 shards is a reasonable default — it means each shard holds roughly 1/256th of your users. You can tune this up or down depending on your concurrency profile.
Picking a Shard
We need a consistent way to map a userID to a shard index. FNV is a fast, non-cryptographic hash that works well here:
```go
func shardIndex(userID string) int {
    h := fnv.New32a()
    h.Write([]byte(userID))
    // Reduce before converting: int(h.Sum32()) can overflow to a
    // negative value on 32-bit platforms, and a negative index panics.
    return int(h.Sum32() % numShards)
}
```
Same userID always maps to the same shard — deterministic and cheap.
IsAllowed
```go
func (rl *RateLimiter) IsAllowed(userID string) bool {
    idx := shardIndex(userID)
    s := &rl.shards[idx]
    s.mu.Lock()
    defer s.mu.Unlock()

    valid := rl.validRequests(s.requests[userID])
    if len(valid) >= rl.limit {
        s.requests[userID] = valid
        return false
    }
    s.requests[userID] = append(valid, time.Now())
    return true
}
```
We lock only the shard that owns this userID. Every other shard — and every request hitting a different shard — is completely unaffected.
The validRequests Helper
Unchanged from Part 1 — still takes a slice, still has no lock dependency:
```go
func (rl *RateLimiter) validRequests(requests []time.Time) []time.Time {
    now := time.Now()
    threshold := now.Add(-rl.window)
    var valid []time.Time
    for _, t := range requests {
        if t.After(threshold) {
            valid = append(valid, t)
        }
    }
    return valid
}
```
Cleanup
Now cleanup iterates shards one at a time, locking each briefly and releasing before moving to the next:
```go
func (rl *RateLimiter) cleanup() {
    ticker := time.NewTicker(rl.window)
    defer ticker.Stop()
    for {
        select {
        case <-rl.done:
            return
        case <-ticker.C:
            for i := range rl.shards {
                s := &rl.shards[i]
                s.mu.Lock()
                for userID := range s.requests {
                    valid := rl.validRequests(s.requests[userID])
                    if len(valid) == 0 {
                        delete(s.requests, userID)
                        continue
                    }
                    s.requests[userID] = valid
                }
                s.mu.Unlock()
            }
        }
    }
}
```
The key difference — each s.mu.Lock() is held only for the duration of one shard’s sweep, not the entire map. With 256 shards, the maximum lock hold time per shard is roughly 1/256th of what it was before. IsAllowed calls on other shards are never blocked at all.
Graceful Shutdown
```go
func (rl *RateLimiter) Stop() {
    close(rl.done)
}
```
Same as Part 1 — nothing changes here.
Putting It Together
The public API is identical to Part 1 — drop-in replacement:
```go
rl := NewRateLimiter(10, time.Minute)
defer rl.Stop()

if rl.IsAllowed("user-123") {
    // handle request
} else {
    // return 429 Too Many Requests
}
```
What Changed and Why It Matters
| | Part 1 | Part 2 |
|---|---|---|
| Lock scope | Global — entire map | Per shard — 1/256th of map |
| Cleanup blocks IsAllowed | Yes, always | Only for same shard |
| Concurrent IsAllowed calls | Fully serialized | Parallel across shards |
| Memory overhead | Minimal | 256x map + mutex overhead (negligible) |
In practice, under high concurrency, the sharded version will show significantly lower p99 latency on IsAllowed — especially during cleanup ticks.
Choosing the Number of Shards
256 is a sensible default but not sacred. A few rules of thumb:
- More shards — lower contention, slightly higher memory overhead, better for high concurrency
- Fewer shards — lower memory overhead, higher contention under load
- Power of 2 — lets the compiler turn the modulo into a bitwise AND (a micro-optimization at best; the FNV hash dominates the cost)
For most services, 256 shards is more than enough. If you’re running benchmarks and seeing contention, bump it to 512 or 1024.
Honest Limitations
Sharding solves the lock contention problem for a single instance. But one fundamental limitation from Part 1 remains — this is still in-memory and single-process.
Run two instances of your service behind a load balancer and you’re back to independent counters per instance. User A could hit instance 1 ten times and instance 2 ten times, effectively doubling their allowed request rate.
For a single server — or a service where you pin users to instances — this is production-ready. For true horizontal scaling, you need a shared external store.
That’s what Part 4 is for.
Full Implementation
```go
package ratelimiter

import (
    "hash/fnv"
    "sync"
    "time"
)

const numShards = 256

type shard struct {
    mu       sync.Mutex
    requests map[string][]time.Time
}

type RateLimiter struct {
    shards [numShards]shard
    limit  int
    window time.Duration
    done   chan struct{}
}

func NewRateLimiter(limit int, window time.Duration) *RateLimiter {
    rl := &RateLimiter{
        limit:  limit,
        window: window,
        done:   make(chan struct{}),
    }
    for i := range rl.shards {
        rl.shards[i].requests = make(map[string][]time.Time)
    }
    go rl.cleanup()
    return rl
}

func shardIndex(userID string) int {
    h := fnv.New32a()
    h.Write([]byte(userID))
    // Reduce before converting so the index can never go negative
    // on 32-bit platforms.
    return int(h.Sum32() % numShards)
}

func (rl *RateLimiter) validRequests(requests []time.Time) []time.Time {
    now := time.Now()
    threshold := now.Add(-rl.window)
    var valid []time.Time
    for _, t := range requests {
        if t.After(threshold) {
            valid = append(valid, t)
        }
    }
    return valid
}

func (rl *RateLimiter) IsAllowed(userID string) bool {
    idx := shardIndex(userID)
    s := &rl.shards[idx]
    s.mu.Lock()
    defer s.mu.Unlock()

    valid := rl.validRequests(s.requests[userID])
    if len(valid) >= rl.limit {
        s.requests[userID] = valid
        return false
    }
    s.requests[userID] = append(valid, time.Now())
    return true
}

func (rl *RateLimiter) cleanup() {
    ticker := time.NewTicker(rl.window)
    defer ticker.Stop()
    for {
        select {
        case <-rl.done:
            return
        case <-ticker.C:
            for i := range rl.shards {
                s := &rl.shards[i]
                s.mu.Lock()
                for userID := range s.requests {
                    valid := rl.validRequests(s.requests[userID])
                    if len(valid) == 0 {
                        delete(s.requests, userID)
                        continue
                    }
                    s.requests[userID] = valid
                }
                s.mu.Unlock()
            }
        }
    }
}

func (rl *RateLimiter) Stop() {
    close(rl.done)
}
```
What’s Next
In Part 3 we step back from the implementation details and look at a different algorithm entirely — the token bucket. We'll compare its trade-offs against the sliding window, with the full head-to-head benchmarks coming in Part 5, so you can make an informed choice for your use case.
Follow along to catch the rest of the series.