Build a Rate Limiter in Go — Part 1: The Sliding Window
Every API needs a rate limiter. Without one, a single misbehaving client — or a well-meaning one hammering your endpoint — can take down your service for everyone else.
Most tutorials give you the concept. This series gives you a real implementation, explains the tradeoffs honestly, and builds toward something you can actually use in production.
Series overview:
- Part 1 — Sliding window rate limiter (you are here)
- Part 2 — Scaling with map sharding
- Part 3 — Token bucket algorithm
- Part 4 — Redis-backed distributed rate limiting
- Part 5 — Benchmarks: sliding window vs token bucket vs Redis
What is a Rate Limiter?
A rate limiter controls how many requests a client can make within a given time period. For example: allow a user to make at most 100 requests per minute. If they exceed that, reject the request — usually with a 429 Too Many Requests response.
There are several algorithms for doing this. In Part 1, we’re using the sliding window approach.
The Sliding Window Algorithm
Imagine a window of time — say, 60 seconds — that slides forward as time passes. At any given moment, we look back 60 seconds and count how many requests came in. If the count is under our limit, we allow the request. If not, we reject it.
This is more accurate than a fixed window (which can allow double the limit at window boundaries) and simpler to reason about than token bucket (which we’ll cover in Part 3).
The Implementation
The Struct
```go
type RateLimiter struct {
	mu       sync.Mutex
	requests map[string][]time.Time
	limit    int
	window   time.Duration
	done     chan struct{}
}
```
- `requests` maps a user ID to a slice of timestamps — every request they’ve made within the current window
- `limit` is the maximum number of requests allowed per window
- `window` is the duration of the sliding window
- `done` is a channel for graceful shutdown of the background cleanup goroutine
- `mu` protects concurrent access to the map
Creating a New Rate Limiter
```go
func NewRateLimiter(limit int, window time.Duration) *RateLimiter {
	rl := &RateLimiter{
		requests: make(map[string][]time.Time),
		limit:    limit,
		window:   window,
		done:     make(chan struct{}),
	}
	go rl.cleanup()
	return rl
}
```
Simple enough — initialize the struct and kick off the cleanup goroutine immediately.
Checking If a Request Is Allowed
```go
func (rl *RateLimiter) IsAllowed(userID string) bool {
	rl.mu.Lock()
	defer rl.mu.Unlock()

	valid := rl.validRequests(rl.requests[userID])
	if len(valid) >= rl.limit {
		rl.requests[userID] = valid
		return false
	}
	rl.requests[userID] = append(valid, time.Now())
	return true
}
```
On every request we:
- Grab the lock
- Filter out timestamps that have expired (outside the window) — that’s the sliding part
- If we’re at or over the limit, reject and return false
- Otherwise, append the current timestamp and allow the request
The validRequests Helper
```go
func (rl *RateLimiter) validRequests(requests []time.Time) []time.Time {
	now := time.Now()
	threshold := now.Add(-rl.window)
	var valid []time.Time
	for _, t := range requests {
		if t.After(threshold) {
			valid = append(valid, t)
		}
	}
	return valid
}
```
This takes a slice of timestamps and returns only those that fall within the current window. Notice it takes the slice as a parameter rather than accessing rl.requests directly — this keeps it safe to call from both IsAllowed and cleanup without worrying about locking semantics inside the helper itself.
The Cleanup Goroutine
Without cleanup, the requests map grows forever. Even users who stopped making requests long ago will sit in memory indefinitely. The cleanup goroutine fixes that.
```go
func (rl *RateLimiter) cleanup() {
	ticker := time.NewTicker(rl.window)
	defer ticker.Stop()
	for {
		select {
		case <-rl.done:
			return
		case <-ticker.C:
			rl.mu.Lock()
			for userID := range rl.requests {
				valid := rl.validRequests(rl.requests[userID])
				if len(valid) == 0 {
					delete(rl.requests, userID)
					continue
				}
				rl.requests[userID] = valid
			}
			rl.mu.Unlock()
		}
	}
}
```
A few things worth noting:
- The ticker fires every `rl.window` — there’s no point cleaning up more frequently than the window itself
- If a user’s valid request slice is empty after pruning, we delete their key entirely — otherwise the map fills up with empty slices
- `defer ticker.Stop()` releases the ticker’s resources once the goroutine returns
- The `done` channel gives us a clean exit path for the goroutine when the rate limiter is shut down
Graceful Shutdown
```go
func (rl *RateLimiter) Stop() {
	close(rl.done)
}
```
Always call Stop() when you’re done with the rate limiter — in tests especially. Without it, the cleanup goroutine runs forever.
Putting It Together
```go
rl := NewRateLimiter(10, time.Minute) // 10 requests per minute
defer rl.Stop()

if rl.IsAllowed("user-123") {
	// handle request
} else {
	// return 429 Too Many Requests
}
```
What This Gets Right
- Correct — the sliding window is accurate, no double-limit boundary issues
- Concurrent — mutex protects all shared state
- Memory-bounded — cleanup goroutine keeps the map from growing forever
- Clean lifecycle — goroutine exits properly via the done channel
For a single-server deployment this is genuinely production-worthy.
Honest Limitations
In-memory only — this rate limiter lives in your process’s memory. If your server restarts, all state is lost. If you run multiple instances behind a load balancer, each instance has its own independent counter — a user could make 10 requests to instance A and 10 to instance B, bypassing the limit entirely. We’ll fix this in Part 4 with Redis.
Lock contention at scale — the cleanup goroutine holds a single global lock while it sweeps the entire map. With a small number of users this is fine. But with millions of users, that sweep could take long enough to measurably block real incoming requests. In Part 2 we’ll fix this with map sharding.
Full Implementation
```go
package ratelimiter

import (
	"sync"
	"time"
)

type RateLimiter struct {
	mu       sync.Mutex
	requests map[string][]time.Time
	limit    int
	window   time.Duration
	done     chan struct{}
}

func NewRateLimiter(limit int, window time.Duration) *RateLimiter {
	rl := &RateLimiter{
		requests: make(map[string][]time.Time),
		limit:    limit,
		window:   window,
		done:     make(chan struct{}),
	}
	go rl.cleanup()
	return rl
}

func (rl *RateLimiter) validRequests(requests []time.Time) []time.Time {
	now := time.Now()
	threshold := now.Add(-rl.window)
	var valid []time.Time
	for _, t := range requests {
		if t.After(threshold) {
			valid = append(valid, t)
		}
	}
	return valid
}

func (rl *RateLimiter) cleanup() {
	ticker := time.NewTicker(rl.window)
	defer ticker.Stop()
	for {
		select {
		case <-rl.done:
			return
		case <-ticker.C:
			rl.mu.Lock()
			for userID := range rl.requests {
				valid := rl.validRequests(rl.requests[userID])
				if len(valid) == 0 {
					delete(rl.requests, userID)
					continue
				}
				rl.requests[userID] = valid
			}
			rl.mu.Unlock()
		}
	}
}

func (rl *RateLimiter) IsAllowed(userID string) bool {
	rl.mu.Lock()
	defer rl.mu.Unlock()

	valid := rl.validRequests(rl.requests[userID])
	if len(valid) >= rl.limit {
		rl.requests[userID] = valid
		return false
	}
	rl.requests[userID] = append(valid, time.Now())
	return true
}

func (rl *RateLimiter) Stop() {
	close(rl.done)
}
```
What’s Next
In Part 2 we tackle the lock contention problem head on — splitting the map into shards, each with its own mutex, so cleanup and request handling can run concurrently across shards.
Follow along to catch the rest of the series.