Build a Rate Limiter in Go — Part 2: Scaling with Map Sharding
In Part 1 we built a sliding window rate limiter that is correct, concurrent, and memory-safe. For a single server with a modest number of users, it works well.
But there is a problem hiding in the cleanup goroutine. Let’s expose it.
Series overview:
- Part 1 — Sliding window rate limiter
- Part 2 — Scaling with map sharding (you are here)
- Part 3 — Token bucket algorithm
- Part 4 — Redis-backed distributed rate limiting
- Part 5 — Benchmarks: sliding window vs token bucket vs Redis
The Problem: Global Lock Contention
Here is the cleanup loop from Part 1:
```go
func (rl *RateLimiter) cleanup() {
    ticker := time.NewTicker(rl.window)
    defer ticker.Stop()
    for {
        select {
        case <-rl.done:
            return
        case <-ticker.C:
            rl.mu.Lock()
            for userID := range rl.requests {
                valid := rl.validRequests(rl.requests[userID])
                if len(valid) == 0 {
                    delete(rl.requests, userID)
                    continue
                }
                rl.requests[userID] = valid
            }
            rl.mu.Unlock()
        }
    }
}
```
See the issue? rl.mu.Lock() acquires a single global lock for the entire duration of the sweep. While cleanup is running, every call to IsAllowed blocks — real user requests are queued, waiting for cleanup to finish.
With a small map this is negligible. But with a million users, iterating over that map while holding the lock could take tens of milliseconds. At scale, that is a real latency spike on every cleanup tick.
The Fix: Map Sharding
The idea is straightforward — instead of one big map with one lock, we use N smaller maps each with their own independent lock.
When a request comes in for a given userID, we hash the ID to determine which shard it belongs to, then only lock that shard. Cleanup iterates shards independently, locking one at a time. No single lock is held for long, and IsAllowed calls on different shards never block each other.
```
Before sharding:        After sharding:

[ map: all users ]      [ shard 0: ~1/4 of users ][ mutex 0 ]
[ single mutex   ]      [ shard 1: ~1/4 of users ][ mutex 1 ]
                        [ shard 2: ~1/4 of users ][ mutex 2 ]
                        [ shard 3: ~1/4 of users ][ mutex 3 ]
```
The Implementation
The Shard
Each shard is its own self-contained unit — a map and a mutex:
```go
type shard struct {
    mu       sync.Mutex
    requests map[string][]time.Time
}
```
The Sharded Rate Limiter
```go
const numShards = 256

type RateLimiter struct {
    shards [numShards]shard
    limit  int
    window time.Duration
    done   chan struct{}
}

func NewRateLimiter(limit int, window time.Duration) *RateLimiter {
    rl := &RateLimiter{
        limit:  limit,
        window: window,
        done:   make(chan struct{}),
    }
    for i := range rl.shards {
        rl.shards[i].requests = make(map[string][]time.Time)
    }
    go rl.cleanup()
    return rl
}
```
256 shards is a reasonable default — it means each shard holds roughly 1/256th of your users. You can tune this up or down depending on your concurrency profile.
Picking a Shard
We need a consistent way to map a userID to a shard index. FNV is a fast, non-cryptographic hash that works well here:
```go
func shardIndex(userID string) int {
    h := fnv.New32a()
    h.Write([]byte(userID))
    // Reduce before converting: int(h.Sum32()) can overflow to a
    // negative value on 32-bit platforms, and a negative index panics.
    return int(h.Sum32() % numShards)
}
```
Same userID always maps to the same shard — deterministic and cheap.
IsAllowed
```go
func (rl *RateLimiter) IsAllowed(userID string) bool {
    idx := shardIndex(userID)
    s := &rl.shards[idx]
    s.mu.Lock()
    defer s.mu.Unlock()

    valid := rl.validRequests(s.requests[userID])
    if len(valid) >= rl.limit {
        s.requests[userID] = valid
        return false
    }
    s.requests[userID] = append(valid, time.Now())
    return true
}
```
We lock only the shard that owns this userID. Every other shard — and every request hitting a different shard — is completely unaffected.
The validRequests Helper
Unchanged from Part 1 — still takes a slice, still has no lock dependency:
```go
func (rl *RateLimiter) validRequests(requests []time.Time) []time.Time {
    now := time.Now()
    threshold := now.Add(-rl.window)
    var valid []time.Time
    for _, t := range requests {
        if t.After(threshold) {
            valid = append(valid, t)
        }
    }
    return valid
}
```
Cleanup
Now cleanup iterates shards one at a time, locking each briefly and releasing before moving to the next:
```go
func (rl *RateLimiter) cleanup() {
    ticker := time.NewTicker(rl.window)
    defer ticker.Stop()
    for {
        select {
        case <-rl.done:
            return
        case <-ticker.C:
            for i := range rl.shards {
                s := &rl.shards[i]
                s.mu.Lock()
                for userID := range s.requests {
                    valid := rl.validRequests(s.requests[userID])
                    if len(valid) == 0 {
                        delete(s.requests, userID)
                        continue
                    }
                    s.requests[userID] = valid
                }
                s.mu.Unlock()
            }
        }
    }
}
```
The key difference — each s.mu.Lock() is held only for the duration of one shard’s sweep, not the entire map. With 256 shards, the maximum lock hold time per shard is roughly 1/256th of what it was before. IsAllowed calls on other shards are never blocked at all.
Graceful Shutdown
```go
func (rl *RateLimiter) Stop() {
    close(rl.done)
}
```
Same as Part 1 — nothing changes here.
Putting It Together
The public API is identical to Part 1 — drop-in replacement:
```go
rl := NewRateLimiter(10, time.Minute)
defer rl.Stop()

if rl.IsAllowed("user-123") {
    // handle request
} else {
    // return 429 Too Many Requests
}
```
What Changed and Why It Matters
| | Part 1 | Part 2 |
|---|---|---|
| Lock scope | Global — entire map | Per shard — 1/256th of map |
| Cleanup blocks IsAllowed | Yes, always | Only for same shard |
| Concurrent IsAllowed calls | Fully serialized | Parallel across shards |
| Memory overhead | Minimal | 256x map + mutex overhead (negligible) |
In practice, under high concurrency, the sharded version will show significantly lower p99 latency on IsAllowed — especially during cleanup ticks.
Choosing the Number of Shards
256 is a sensible default but not sacred. A few rules of thumb:
- More shards — lower contention, slightly higher memory overhead, better for high concurrency
- Fewer shards — lower memory overhead, higher contention under load
- Power of 2 — lets the compiler turn the modulo into a bitwise AND (a micro-optimization at best; the FNV hash dominates the cost)
For most services, 256 shards is more than enough. If you’re running benchmarks and seeing contention, bump it to 512 or 1024.
Honest Limitations
Sharding solves the lock contention problem for a single instance. But one fundamental limitation from Part 1 remains — this is still in-memory and single-process.
Run two instances of your service behind a load balancer and you’re back to independent counters per instance. User A could hit instance 1 ten times and instance 2 ten times, effectively doubling their allowed request rate.
For a single server — or a service where you pin users to instances — this is production-ready. For true horizontal scaling, you need a shared external store.
That’s what Part 4 is for.
Full Implementation
```go
package ratelimiter

import (
    "hash/fnv"
    "sync"
    "time"
)

const numShards = 256

type shard struct {
    mu       sync.Mutex
    requests map[string][]time.Time
}

type RateLimiter struct {
    shards [numShards]shard
    limit  int
    window time.Duration
    done   chan struct{}
}

func NewRateLimiter(limit int, window time.Duration) *RateLimiter {
    rl := &RateLimiter{
        limit:  limit,
        window: window,
        done:   make(chan struct{}),
    }
    for i := range rl.shards {
        rl.shards[i].requests = make(map[string][]time.Time)
    }
    go rl.cleanup()
    return rl
}

func shardIndex(userID string) int {
    h := fnv.New32a()
    h.Write([]byte(userID))
    // Reduce before converting so the index can never go negative
    // on 32-bit platforms.
    return int(h.Sum32() % numShards)
}

func (rl *RateLimiter) validRequests(requests []time.Time) []time.Time {
    now := time.Now()
    threshold := now.Add(-rl.window)
    var valid []time.Time
    for _, t := range requests {
        if t.After(threshold) {
            valid = append(valid, t)
        }
    }
    return valid
}

func (rl *RateLimiter) IsAllowed(userID string) bool {
    idx := shardIndex(userID)
    s := &rl.shards[idx]
    s.mu.Lock()
    defer s.mu.Unlock()

    valid := rl.validRequests(s.requests[userID])
    if len(valid) >= rl.limit {
        s.requests[userID] = valid
        return false
    }
    s.requests[userID] = append(valid, time.Now())
    return true
}

func (rl *RateLimiter) cleanup() {
    ticker := time.NewTicker(rl.window)
    defer ticker.Stop()
    for {
        select {
        case <-rl.done:
            return
        case <-ticker.C:
            for i := range rl.shards {
                s := &rl.shards[i]
                s.mu.Lock()
                for userID := range s.requests {
                    valid := rl.validRequests(s.requests[userID])
                    if len(valid) == 0 {
                        delete(s.requests, userID)
                        continue
                    }
                    s.requests[userID] = valid
                }
                s.mu.Unlock()
            }
        }
    }
}

func (rl *RateLimiter) Stop() {
    close(rl.done)
}
```
What’s Next
In Part 3 we step back from the implementation details and look at a different algorithm entirely — the token bucket. We'll compare its trade-offs against the sliding window, with the full head-to-head benchmarks coming in Part 5, so you can make an informed choice for your use case.
Follow along to catch the rest of the series.