I’ve been working with microservices for years, and recently faced a major challenge: building systems that handle high throughput while staying resilient under pressure. That’s when I turned to event-driven architectures. They’ve transformed how I design distributed systems, and today I’ll show you how to build production-ready microservices using NATS, Go, and Kubernetes. This combination delivers speed, reliability, and scalability that traditional REST-based systems struggle to match.
Why choose NATS? It’s lightning-fast and supports persistent streams with JetStream. Combined with Go’s efficiency and Kubernetes’ orchestration, we create systems that scale dynamically. I’ll walk through a real order processing system I built - you’ll see concrete examples and patterns you can apply immediately.
First, our architecture needs clear event definitions. Here’s how I model domain events in Go:
package events

import (
    "time"

    "github.com/google/uuid"
)

// Event is the envelope every service publishes and consumes.
type Event struct {
    ID          string      `json:"id"`
    Type        string      `json:"type"` // e.g., "order.created"
    AggregateID string      `json:"aggregate_id"`
    Data        interface{} `json:"data"`
    Timestamp   time.Time   `json:"timestamp"`
}

// NewEvent stamps a unique ID and a UTC timestamp onto a domain event.
func NewEvent(eventType, aggregateID string, data interface{}) *Event {
    return &Event{
        ID:          uuid.New().String(),
        Type:        eventType,
        AggregateID: aggregateID,
        Data:        data,
        Timestamp:   time.Now().UTC(),
    }
}
How do we ensure these events persist across service restarts? That’s where JetStream shines. I deploy NATS on Kubernetes using a StatefulSet for durability:
# nats-cluster.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nats
spec:
  serviceName: nats
  replicas: 3
  selector:
    matchLabels:
      app: nats
  template:
    metadata:
      labels:
        app: nats
    spec:
      containers:
        - name: nats
          image: nats:2.9-alpine
          args: ["--jetstream", "--store_dir=/data"]
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
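The StatefulSet gives JetStream stable storage; on the application side, each service declares the stream it publishes to. Here's a minimal sketch with the nats.go client; the ORDERS stream name and its subjects are my conventions for this example, not anything NATS mandates:

import "github.com/nats-io/nats.go"

// connectJetStream dials NATS and ensures the ORDERS stream exists
// with file-backed storage, so events land on the persistent volume.
func connectJetStream(url string) (nats.JetStreamContext, error) {
    nc, err := nats.Connect(url, nats.Name("orders-service"))
    if err != nil {
        return nil, err
    }
    js, err := nc.JetStream()
    if err != nil {
        return nil, err
    }
    // Creating a stream that already exists with an identical config succeeds.
    _, err = js.AddStream(&nats.StreamConfig{
        Name:     "ORDERS",
        Subjects: []string{"order.>", "inventory.>"},
        Storage:  nats.FileStorage,
    })
    if err != nil {
        return nil, err
    }
    return js, nil
}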
Together, the file-backed stream and the persistent volume claims prevent data loss during pod restarts. When implementing the event store, I focus on idempotency. Ever wonder how services avoid duplicate processing? Here's the key pattern:
// In the event handler
if existsInProcessedCache(event.ID) {
    return // skip duplicate delivery
}
process(event)
storeInCache(event.ID) // mark as processed only after success
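The existsInProcessedCache and storeInCache helpers need a backing store. Here's a minimal in-memory sketch that folds both into one atomic check-and-set; it's illustrative only, since real deduplication state should live in Redis or the event store so it survives restarts and can expire old IDs:

import "sync"

// ProcessedCache remembers event IDs we've already handled.
// In-memory only: a sketch, not a production deduplication store.
type ProcessedCache struct {
    mu   sync.Mutex
    seen map[string]struct{}
}

func NewProcessedCache() *ProcessedCache {
    return &ProcessedCache{seen: make(map[string]struct{})}
}

// MarkSeen reports whether id was already processed, recording it if not.
func (c *ProcessedCache) MarkSeen(id string) bool {
    c.mu.Lock()
    defer c.mu.Unlock()
    if _, ok := c.seen[id]; ok {
        return true
    }
    c.seen[id] = struct{}{}
    return false
}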
For services, I follow the single responsibility principle. The inventory service handles stock reservations and releases. It listens for order.created events and publishes inventory.reserved or inventory.failed. This separation allows each service to scale independently based on load.
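Concretely, the inventory service binds a durable, queue-grouped JetStream subscription, so its replicas share the work and the consumer resumes where it left off after a restart. A sketch; reserveStock stands in for the real reservation logic:

import (
    "encoding/json"

    "github.com/nats-io/nats.go"
)

// subscribeInventory consumes order.created events on a durable,
// queue-grouped subscription so replicas share the load.
func subscribeInventory(js nats.JetStreamContext) (*nats.Subscription, error) {
    return js.QueueSubscribe("order.created", "inventory",
        func(msg *nats.Msg) {
            var event Event
            if err := json.Unmarshal(msg.Data, &event); err != nil {
                msg.Term() // malformed: never redeliver
                return
            }
            if err := reserveStock(event); err != nil { // reserveStock is a stand-in
                msg.Nak() // ask JetStream to redeliver
                return
            }
            msg.Ack()
        },
        nats.Durable("inventory-worker"),
        nats.ManualAck(),
    )
}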
Testing is critical. I use Go's built-in testing package with an embedded NATS server:
import (
    "encoding/json"
    "testing"
    natsserver "github.com/nats-io/nats-server/v2/test"
    "github.com/nats-io/nats.go"
)

func TestOrderCreation(t *testing.T) {
    // Setup: an embedded NATS server with JetStream enabled.
    opts := natsserver.DefaultTestOptions
    opts.Port, opts.JetStream = -1, true // random port
    s := natsserver.RunServer(&opts)
    defer s.Shutdown()
    nc, err := nats.Connect(s.ClientURL())
    if err != nil {
        t.Fatalf("Connect failed: %v", err)
    }
    js, _ := nc.JetStream()
    js.AddStream(&nats.StreamConfig{Name: "ORDERS", Subjects: []string{"order.>"}})
    // Publish the test event as JSON and verify it is accepted.
    event := NewEvent("order.created", "order-123", OrderData{...})
    payload, _ := json.Marshal(event)
    if _, err := js.Publish(event.Type, payload); err != nil {
        t.Fatalf("Publish failed: %v", err)
    }
}
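To assert the event actually landed in the stream, the same test can pull it back out. A sketch of the follow-up assertions (add time to the imports; the test-consumer durable name is arbitrary):

// Continuing inside TestOrderCreation: read the event back.
sub, err := js.PullSubscribe("order.created", "test-consumer")
if err != nil {
    t.Fatalf("PullSubscribe failed: %v", err)
}
msgs, err := sub.Fetch(1, nats.MaxWait(2*time.Second))
if err != nil || len(msgs) != 1 {
    t.Fatalf("expected exactly one message, err=%v", err)
}
var got Event
if err := json.Unmarshal(msgs[0].Data, &got); err != nil {
    t.Fatalf("Unmarshal failed: %v", err)
}
if got.AggregateID != "order-123" {
    t.Errorf("wrong aggregate ID: %s", got.AggregateID)
}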
In Kubernetes, I configure liveness probes that check NATS connections:
livenessProbe:
  exec:
    command:
      - sh
      - -c
      - "nats rtt --json | grep -q '\"rtt\"'"
  initialDelaySeconds: 10
  periodSeconds: 30
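This assumes the nats CLI is shipped in every service image. One alternative is a tiny health endpoint in the service itself that reports the client's connection state, so the probe becomes a plain httpGet. A sketch, with the handler and route names as my own choices:

import (
    "net/http"

    "github.com/nats-io/nats.go"
)

// healthHandler returns 200 only while the NATS connection is up.
func healthHandler(nc *nats.Conn) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        if nc == nil || nc.Status() != nats.CONNECTED {
            http.Error(w, "nats disconnected", http.StatusServiceUnavailable)
            return
        }
        w.WriteHeader(http.StatusOK)
    }
}

Wire it up with http.HandleFunc("/healthz", healthHandler(nc)) and point the probe at /healthz.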
For monitoring, I expose Prometheus metrics from all services. These counters feed the dashboards that track event throughput and error rates:
# HELP events_processed_total Total domain events processed
# TYPE events_processed_total counter
events_processed_total{service="orders",status="success"} 12892
events_processed_total{service="orders",status="error"} 7
Production requires careful planning. I enforce schema evolution rules: new fields only, no removals. Services ignore unknown fields, maintaining backward compatibility. What happens when a service goes offline? JetStream's persistent streams retain its events, and its durable consumer resumes from the last acknowledged message once the service recovers - nothing is lost.
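The "ignore unknown fields" half of that rule comes for free in Go, because encoding/json drops fields the target struct doesn't declare. A quick illustration; the struct and field names are made up for the example:

import "encoding/json"

// A v1 consumer decoding a v2 payload: the new "currency" field is
// silently ignored, so old services keep working unchanged.
type OrderDataV1 struct {
    SKU      string `json:"sku"`
    Quantity int    `json:"quantity"`
}

func decodeNewPayloadWithOldStruct() (OrderDataV1, error) {
    payload := []byte(`{"sku":"A-42","quantity":3,"currency":"EUR"}`)
    var data OrderDataV1
    err := json.Unmarshal(payload, &data) // no error on the extra field
    return data, err
}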
The saga pattern coordinates distributed transactions. When an order is placed, the orchestrator manages the flow: reserve inventory → process payment → ship products. If payment fails, it triggers compensation events to release inventory. This keeps our system consistent without distributed locks.
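The orchestrator's core can stay small: a state transition keyed on event type, with a compensating publish on failure. A compressed sketch; the payment and shipping subjects are invented for this example, not part of the system described above:

import (
    "encoding/json"

    "github.com/nats-io/nats.go"
)

// handleSagaEvent advances the order saga one step per incoming event.
func handleSagaEvent(js nats.JetStreamContext, event *Event) error {
    switch event.Type {
    case "order.created":
        return publishNext(js, "inventory.reserve", event) // step 1: reserve stock
    case "inventory.reserved":
        return publishNext(js, "payment.process", event) // step 2: charge
    case "payment.processed":
        return publishNext(js, "shipping.ship", event) // step 3: ship
    case "payment.failed":
        return publishNext(js, "inventory.release", event) // compensate: free stock
    }
    return nil // events we don't orchestrate
}

// publishNext wraps the saga step in a fresh event and publishes it.
func publishNext(js nats.JetStreamContext, subject string, prev *Event) error {
    data, err := json.Marshal(NewEvent(subject, prev.AggregateID, prev.Data))
    if err != nil {
        return err
    }
    _, err = js.Publish(subject, data)
    return err
}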
After implementing this architecture, we handled 5x more load with 30% less infrastructure. Services scale independently during peak hours, and failures are isolated. The real win? Our team deploys updates daily without downtime.
Ready to transform your microservices? Start small with one event stream and expand gradually. If you found this useful, share it with your team and leave a comment about your experience! What challenges have you faced with distributed systems?