
How to Build a Production-Ready Token Bucket Rate Limiter in Go with Redis and HTTP Middleware

Learn to build a production-ready rate limiter in Go using the token bucket algorithm, with Redis, middleware integration, and comprehensive testing strategies.


Have you ever watched a service buckle under unexpected traffic? I have, and it’s not pretty. That moment when your API starts slowing down or, worse, crashes because too many requests hit it at once—it’s a wake-up call. That’s why I spent time figuring out how to build a solid rate limiter in Go. It’s not just about stopping abuse; it’s about ensuring fairness, controlling costs, and keeping your systems reliable. If you’re building anything that others depend on, this is a skill you need. Let’s get into how to make it production-ready.

Why start with the token bucket algorithm? Think of it like a literal bucket. It holds tokens, say up to 10, representing requests you can handle immediately. New tokens drip in steadily, say 5 per second, refilling the bucket. When a request comes, it takes a token. If tokens are available, the request goes through. If not, it waits or gets rejected. This approach is brilliant because it allows short bursts of traffic—mimicking real-world usage—while smoothing out the flow over time. Isn’t it fascinating how a simple concept can prevent so many headaches?

I wanted something that works fast and handles multiple users at the same time. Go’s concurrency features are perfect for this. Let’s look at the heart of the implementation. We need a struct to manage the bucket state safely.

import (
    "sync"
    "time"
)

// TokenBucket is a thread-safe token bucket.
type TokenBucket struct {
    capacity   float64    // maximum number of tokens (burst size)
    tokens     float64    // tokens currently available
    refillRate float64    // tokens added per second
    lastRefill time.Time  // when tokens were last topped up
    mu         sync.Mutex // guards the fields above across goroutines
}

Here, capacity is the max tokens for bursts, tokens is the current count, and refillRate sets how quickly we add tokens. The mu mutex is key—it prevents race conditions when many goroutines try to access the bucket simultaneously. Why is that important? Without it, you could end up with inconsistent token counts, leading to too many or too few requests being allowed.

To create a new bucket, we initialize it with a full set of tokens. Then, for each request, we check if tokens are available. But how do we refill based on time passed? We calculate the time since the last check and add tokens proportionally. Here’s a simplified method:

// Allow reports whether one request may proceed, spending a token if so.
func (tb *TokenBucket) Allow() bool {
    tb.mu.Lock()
    defer tb.mu.Unlock()

    // Refill in proportion to the time elapsed since the last check.
    now := time.Now()
    elapsed := now.Sub(tb.lastRefill).Seconds()
    tb.tokens += elapsed * tb.refillRate
    if tb.tokens > tb.capacity {
        tb.tokens = tb.capacity // never exceed the burst capacity
    }
    tb.lastRefill = now

    // Spend a token if one is available; otherwise reject.
    if tb.tokens >= 1 {
        tb.tokens--
        return true
    }
    return false
}

This code locks the mutex, updates tokens, and then decides if a request can proceed. It’s straightforward but powerful. Notice how we cap the tokens at capacity to avoid overflow. This ensures bursts are controlled.
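Earlier I said a new bucket starts full. A constructor for that can be as small as this; the name NewTokenBucket is my own, and I'll reuse it in the later examples:

// NewTokenBucket returns a bucket that starts full, so a client gets its
// burst allowance immediately.
func NewTokenBucket(capacity, refillRate float64) *TokenBucket {
    return &TokenBucket{
        capacity:   capacity,
        tokens:     capacity,
        refillRate: refillRate,
        lastRefill: time.Now(),
    }
}

With the numbers from the earlier example, NewTokenBucket(10, 5) gives a burst of 10 requests and a steady 5 new tokens per second after that.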

Now, what about integrating this into a web service? That’s where middleware comes in. In Go, you can wrap your HTTP handlers to apply rate limiting per user or IP. Imagine you have an API endpoint and you want to limit each client to 100 requests per minute. Here’s a basic example using the standard library; to keep it short, this version guards the whole endpoint with one shared bucket, and we’ll key buckets per client right after:

// RateLimitMiddleware returns 429 once the shared bucket runs out of tokens.
func RateLimitMiddleware(next http.Handler, limiter *TokenBucket) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        if limiter.Allow() {
            next.ServeHTTP(w, r)
        } else {
            http.Error(w, "Too many requests", http.StatusTooManyRequests)
        }
    })
}
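
To get the per-client behaviour described above, one straightforward option (a sketch, not the only way) is to key one bucket per client IP in a map guarded by a mutex. The PerClientRateLimit name and the newBucket factory are my own; net and sync come from the standard library, and NewTokenBucket is the constructor sketched earlier:

import (
    "net"
    "net/http"
    "sync"
)

// PerClientRateLimit keeps a separate token bucket for every client IP.
func PerClientRateLimit(next http.Handler, newBucket func() *TokenBucket) http.Handler {
    var mu sync.Mutex
    buckets := make(map[string]*TokenBucket)

    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        clientIP, _, err := net.SplitHostPort(r.RemoteAddr)
        if err != nil {
            clientIP = r.RemoteAddr // fall back to the raw address
        }

        // Find or lazily create this client's bucket.
        mu.Lock()
        limiter, ok := buckets[clientIP]
        if !ok {
            limiter = newBucket() // e.g. NewTokenBucket(100, 100.0/60.0) for ~100 req/min
            buckets[clientIP] = limiter
        }
        mu.Unlock()

        if limiter.Allow() {
            next.ServeHTTP(w, r)
        } else {
            http.Error(w, "Too many requests", http.StatusTooManyRequests)
        }
    })
}

In a long-running service you would also want to evict idle entries from that map, otherwise it grows with every unique client you ever see.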

In both versions, the middleware checks each request against a token bucket: if a token is available the request proceeds, and if not the client gets a 429 status code. But here’s a thought: what if your service runs on multiple servers? A single in-memory limiter won’t work across instances.

That’s when you need distributed rate limiting. Redis is a popular choice because it’s fast and shared. Instead of storing tokens in local memory, we keep them in Redis. This way, all server instances see the same count. The implementation changes slightly—we use Redis commands to manage token counts atomically. For example, you might use a Lua script to ensure operations are atomic and fast.
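
The post so far hasn't committed to a specific client library, but as one possible shape, here is a rough sketch using github.com/redis/go-redis/v9 with a Lua script; the key layout, hash field names, and the one-hour expiry are illustrative choices of mine, not a canonical implementation:

import (
    "context"
    "time"

    "github.com/redis/go-redis/v9"
)

// allowScript refills and consumes a token in a single atomic step inside Redis,
// so every application instance sees the same bucket state.
var allowScript = redis.NewScript(`
local key      = KEYS[1]
local capacity = tonumber(ARGV[1])
local rate     = tonumber(ARGV[2])
local now      = tonumber(ARGV[3])

local state  = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(state[1])
local last   = tonumber(state[2])
if tokens == nil then
  tokens = capacity
  last = now
end

tokens = math.min(capacity, tokens + math.max(0, now - last) * rate)

local allowed = 0
if tokens >= 1 then
  tokens = tokens - 1
  allowed = 1
end

redis.call('HSET', key, 'tokens', tokens, 'last_refill', now)
redis.call('EXPIRE', key, 3600)
return allowed
`)

// AllowDistributed asks Redis whether the bucket identified by key has a token to spare.
func AllowDistributed(ctx context.Context, rdb *redis.Client, key string, capacity, refillRate float64) (bool, error) {
    now := float64(time.Now().UnixMilli()) / 1000.0
    allowed, err := allowScript.Run(ctx, rdb, []string{key}, capacity, refillRate, now).Int()
    if err != nil {
        return false, err
    }
    return allowed == 1, nil
}

Because the script reads, refills, and decrements in one atomic call, two instances can never both spend the last token. One subtlety: the timestamp comes from the calling server, so keep application clocks in sync or the refill math will drift slightly between instances.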

But let’s not forget testing. How do you know your rate limiter works correctly under load? I write benchmarks to simulate high traffic. Go’s testing package is excellent for this. You can spawn many goroutines that hit the limiter and verify it behaves as expected. It’s crucial to catch edge cases, like what happens when requests come in at the exact same time.
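
Here’s a minimal sketch of both, assuming the NewTokenBucket constructor from earlier: a concurrency test that pins the refill rate to zero so the outcome is deterministic, and a parallel benchmark that measures Allow under contention:

import (
    "sync"
    "sync/atomic"
    "testing"
)

// With a zero refill rate, exactly `capacity` requests should ever be allowed,
// no matter how many goroutines race on the bucket.
func TestTokenBucketConcurrent(t *testing.T) {
    tb := NewTokenBucket(10, 0)
    var allowed int64
    var wg sync.WaitGroup
    for i := 0; i < 100; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            if tb.Allow() {
                atomic.AddInt64(&allowed, 1)
            }
        }()
    }
    wg.Wait()
    if allowed != 10 {
        t.Fatalf("allowed %d requests, want exactly 10", allowed)
    }
}

// BenchmarkAllow measures Allow throughput under concurrent load.
func BenchmarkAllow(b *testing.B) {
    tb := NewTokenBucket(1000, 1000)
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            tb.Allow()
        }
    })
}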

In production, I add monitoring. I expose metrics like the number of allowed vs. denied requests, which helps in tuning the rate limits. Tools like Prometheus can scrape these metrics, giving you insights into traffic patterns. Have you considered how you’d alert on sudden spikes? Setting up alarms for high denial rates can warn you of potential attacks or misconfigurations.
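
As a sketch of what that can look like with Prometheus’s client_golang library (the metric name, label, and helper functions are my own choices):

import (
    "net/http"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// rateLimitDecisions counts requests seen by the limiter, labeled by outcome.
var rateLimitDecisions = prometheus.NewCounterVec(
    prometheus.CounterOpts{
        Name: "rate_limiter_decisions_total",
        Help: "Requests handled by the rate limiter, by outcome.",
    },
    []string{"outcome"},
)

func init() {
    prometheus.MustRegister(rateLimitDecisions)
}

// recordDecision is called from the middleware after each Allow check.
func recordDecision(allowed bool) {
    outcome := "denied"
    if allowed {
        outcome = "allowed"
    }
    rateLimitDecisions.WithLabelValues(outcome).Inc()
}

// exposeMetrics registers the scrape endpoint next to your API routes.
func exposeMetrics(mux *http.ServeMux) {
    mux.Handle("/metrics", promhttp.Handler())
}

An alert on the ratio of denied to total decisions then tells you when a spike or a misconfigured limit is pushing clients into 429s.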

Building this taught me a lot about balance. Set limits too tight, and you frustrate users; too loose, and you risk your system. The token bucket algorithm offers a flexible middle ground. With Go’s simplicity and performance, you can implement it efficiently. I encourage you to try it out, tweak the parameters, and see how it fits your needs.

If you found this helpful, please share it with others who might benefit. Leave a comment with your experiences or questions—I’d love to hear how you’ve tackled rate limiting in your projects. Let’s keep building resilient systems together.



