Build a Rate Limiter in Go — Part 1: The Sliding Window
Every API needs a rate limiter. Without one, a single misbehaving client — or a well-meaning one hammering your endpoint — can take down your service for everyone else.
Most tutorials give you the concept. This series gives you a real implementation, explains the tradeoffs honestly, and builds toward something you can actually use in production.
Series overview:
- Part 1 — Sliding window rate limiter (you are here)
- Part 2 — Scaling with map sharding
- Part 3 — Token bucket algorithm
- Part 4 — Redis-backed distributed rate limiting
- Part 5 — Benchmarks: sliding window vs token bucket vs Redis
What is a Rate Limiter?
A rate limiter controls how many requests a client can make within a given time period. For example: allow a user to make at most 100 requests per minute. If they exceed that, reject the request — usually with a 429 Too Many Requests response.
There are several algorithms for doing this. In Part 1, we’re using the sliding window approach.
The Sliding Window Algorithm
Imagine a window of time — say, 60 seconds — that slides forward as time passes. At any given moment, we look back 60 seconds and count how many requests came in. If the count is under our limit, we allow the request. If not, we reject it.
This is more accurate than a fixed window (which can allow double the limit at window boundaries) and simpler to reason about than token bucket (which we’ll cover in Part 3).
The Implementation
The Struct
```go
type RateLimiter struct {
	mu       sync.Mutex
	requests map[string][]time.Time
	limit    int
	window   time.Duration
	done     chan struct{}
}
```
- `requests` maps a user ID to a slice of timestamps — every request they’ve made within the current window
- `limit` is the maximum number of requests allowed per window
- `window` is the duration of the sliding window
- `done` is a channel for graceful shutdown of the background cleanup goroutine
- `mu` protects concurrent access to the map
Creating a New Rate Limiter
```go
func NewRateLimiter(limit int, window time.Duration) *RateLimiter {
	rl := &RateLimiter{
		requests: make(map[string][]time.Time),
		limit:    limit,
		window:   window,
		done:     make(chan struct{}),
	}
	go rl.cleanup()
	return rl
}
```
Simple enough — initialize the struct and kick off the cleanup goroutine immediately.
Checking If a Request Is Allowed
```go
func (rl *RateLimiter) IsAllowed(userID string) bool {
	rl.mu.Lock()
	defer rl.mu.Unlock()

	valid := rl.validRequests(rl.requests[userID])
	if len(valid) >= rl.limit {
		rl.requests[userID] = valid
		return false
	}
	rl.requests[userID] = append(valid, time.Now())
	return true
}
```
On every request we:
- Grab the lock
- Filter out timestamps that have expired (outside the window) — that’s the sliding part
- If we’re at or over the limit, reject and return false
- Otherwise, append the current timestamp and allow the request
The validRequests Helper
```go
func (rl *RateLimiter) validRequests(requests []time.Time) []time.Time {
	now := time.Now()
	threshold := now.Add(-rl.window)
	var valid []time.Time
	for _, t := range requests {
		if t.After(threshold) {
			valid = append(valid, t)
		}
	}
	return valid
}
```
This takes a slice of timestamps and returns only those that fall within the current window. Notice it takes the slice as a parameter rather than accessing rl.requests directly — this keeps it safe to call from both IsAllowed and cleanup without worrying about locking semantics inside the helper itself.
The Cleanup Goroutine
Without cleanup, the requests map grows forever. Even users who stopped making requests long ago will sit in memory indefinitely. The cleanup goroutine fixes that.
```go
func (rl *RateLimiter) cleanup() {
	ticker := time.NewTicker(rl.window)
	defer ticker.Stop()
	for {
		select {
		case <-rl.done:
			return
		case <-ticker.C:
			rl.mu.Lock()
			for userID := range rl.requests {
				valid := rl.validRequests(rl.requests[userID])
				if len(valid) == 0 {
					delete(rl.requests, userID)
					continue
				}
				rl.requests[userID] = valid
			}
			rl.mu.Unlock()
		}
	}
}
```
A few things worth noting:
- The ticker fires every `rl.window` — there’s no point cleaning up more frequently than the window itself
- If a user’s valid request slice is empty after pruning, we delete their key entirely — otherwise the map fills up with empty slices
- `defer ticker.Stop()` releases the ticker’s resources once the goroutine returns
- The `done` channel gives us a clean exit path for the goroutine when the rate limiter is shut down
Graceful Shutdown
```go
func (rl *RateLimiter) Stop() {
	close(rl.done)
}
```
Always call Stop() when you’re done with the rate limiter — in tests especially. Without it, the cleanup goroutine runs forever.
Putting It Together
```go
rl := NewRateLimiter(10, time.Minute) // 10 requests per minute
defer rl.Stop()

if rl.IsAllowed("user-123") {
	// handle request
} else {
	// return 429 Too Many Requests
}
```
What This Gets Right
- Correct — the sliding window is accurate, no double-limit boundary issues
- Concurrent — mutex protects all shared state
- Memory-bounded — cleanup goroutine keeps the map from growing forever
- Clean lifecycle — goroutine exits properly via the done channel
For a single-server deployment this is genuinely production-worthy.
Honest Limitations
In-memory only — this rate limiter lives in your process’s memory. If your server restarts, all state is lost. If you run multiple instances behind a load balancer, each instance has its own independent counter — a user could make 10 requests to instance A and 10 to instance B, bypassing the limit entirely. We’ll fix this in Part 4 with Redis.
Lock contention at scale — the cleanup goroutine holds a single global lock while it sweeps the entire map. With a small number of users this is fine. But with millions of users, that sweep could take long enough to measurably block real incoming requests. In Part 2 we’ll fix this with map sharding.
Full Implementation
```go
package ratelimiter

import (
	"sync"
	"time"
)

type RateLimiter struct {
	mu       sync.Mutex
	requests map[string][]time.Time
	limit    int
	window   time.Duration
	done     chan struct{}
}

func NewRateLimiter(limit int, window time.Duration) *RateLimiter {
	rl := &RateLimiter{
		requests: make(map[string][]time.Time),
		limit:    limit,
		window:   window,
		done:     make(chan struct{}),
	}
	go rl.cleanup()
	return rl
}

func (rl *RateLimiter) validRequests(requests []time.Time) []time.Time {
	now := time.Now()
	threshold := now.Add(-rl.window)
	var valid []time.Time
	for _, t := range requests {
		if t.After(threshold) {
			valid = append(valid, t)
		}
	}
	return valid
}

func (rl *RateLimiter) cleanup() {
	ticker := time.NewTicker(rl.window)
	defer ticker.Stop()
	for {
		select {
		case <-rl.done:
			return
		case <-ticker.C:
			rl.mu.Lock()
			for userID := range rl.requests {
				valid := rl.validRequests(rl.requests[userID])
				if len(valid) == 0 {
					delete(rl.requests, userID)
					continue
				}
				rl.requests[userID] = valid
			}
			rl.mu.Unlock()
		}
	}
}

func (rl *RateLimiter) IsAllowed(userID string) bool {
	rl.mu.Lock()
	defer rl.mu.Unlock()

	valid := rl.validRequests(rl.requests[userID])
	if len(valid) >= rl.limit {
		rl.requests[userID] = valid
		return false
	}
	rl.requests[userID] = append(valid, time.Now())
	return true
}

func (rl *RateLimiter) Stop() {
	close(rl.done)
}
```
What’s Next
In Part 2 we tackle the lock contention problem head on — splitting the map into shards, each with its own mutex, so cleanup and request handling can run concurrently across shards.
Follow along to catch the rest of the series.