Why Event-Driven Microservices Matter
I’ve spent years wrestling with monolithic systems that crumbled under load. That frustration led me here—building resilient, distributed systems using event-driven patterns. Why NATS? Because it delivers the speed and simplicity we need without drowning us in complexity. Today, I’ll guide you through a production-ready implementation using Go and Docker. Stick with me—you’ll walk away with battle-tested patterns you can deploy tomorrow.
Core Infrastructure Setup
Let’s start with foundations. We organize services in isolated directories while sharing common packages:
order-service/
  cmd/main.go
  internal/handlers/
pkg/
  events/       # Schema definitions
  nats/         # Connection logic
  middleware/   # Observability tools
Critical dependencies: our go.mod includes NATS for messaging, Zap for structured logging, and Backoff for retries. Never skimp on these.
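For reference, here's a minimal go.mod sketch pulling in those three libraries; the module path and version pins are assumptions for illustration, not from the original project:

module github.com/example/order-service

go 1.21

require (
    github.com/cenkalti/backoff/v4 v4.2.1
    github.com/nats-io/nats.go v1.31.0
    go.uber.org/zap v1.26.0
)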
Reliable NATS Connections
Handling network failures is non-negotiable. Here’s our connection manager with exponential backoff:
// pkg/nats/connection.go
func (cm *ConnectionManager) Connect() error {
    bo := backoff.NewExponentialBackOff()
    bo.MaxElapsedTime = 5 * time.Minute // Give up entirely after 5 minutes

    operation := func() error {
        conn, err := nats.Connect(cm.url, cm.options...)
        if err != nil {
            cm.logger.Error("Connection failed - retrying", zap.Error(err))
            return err
        }
        cm.conn = conn
        return nil
    }
    return backoff.Retry(operation, bo) // Exponential backoff with jitter
}
Why does this matter? Without backoff, transient network blips could cascade into system failures.
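For completeness, here's a minimal sketch of the ConnectionManager that method hangs off. The field set is inferred from the snippet above, not taken from the original code:

// pkg/nats/connection.go (sketch)
type ConnectionManager struct {
    url     string
    options []nats.Option
    conn    *nats.Conn
    logger  *zap.Logger
}

func NewConnectionManager(url string, logger *zap.Logger, opts ...nats.Option) *ConnectionManager {
    return &ConnectionManager{url: url, options: opts, logger: logger}
}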
Event Schemas That Scale
Define contracts using versioned JSON schemas early. I learned this the hard way:
// pkg/events/order_created.json
{
  "type": "object",
  "properties": {
    "order_id": {"type": "string", "format": "uuid"},
    "items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "product_id": {"type": "string"},
          "quantity": {"type": "integer", "minimum": 1}
        }
      }
    },
    "version": {"type": "string", "pattern": "^v1$"}
  }
}
Question: What happens when you need to change schemas in flight? We’ll tackle that later with dual publishing.
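On the Go side, a struct mirroring this contract might look like the sketch below. The CorrelationID field is an assumption inferred from how events.Order is used later, not part of the schema above:

// pkg/events/order.go (sketch)
type OrderItem struct {
    ProductID string `json:"product_id"`
    Quantity  int    `json:"quantity"`
}

type Order struct {
    ID            string      `json:"order_id"`
    Items         []OrderItem `json:"items"`
    Version       string      `json:"version"`
    CorrelationID string      `json:"correlation_id,omitempty"`
}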
Order Service Implementation
Processing orders reliably requires validation and traceability. Notice how we:
- Validate incoming payloads
- Propagate correlation IDs from request headers
- Publish the event before acknowledging the request
// order-service/internal/handlers/order.go
func (h *Handler) CreateOrder(c *gin.Context) {
    var order events.Order
    if err := c.ShouldBindJSON(&order); err != nil {
        c.JSON(400, gin.H{"error": "Invalid payload"})
        return
    }
    order.ID = uuid.NewString() // Unique identifier
    order.CorrelationID = c.GetHeader("X-Correlation-ID")
    if err := h.publisher.Publish("orders.created", order); err != nil {
        h.logger.Error("Event publishing failed",
            zap.Error(err), zap.String("order_id", order.ID))
        c.JSON(500, gin.H{"error": "System error"})
        return
    }
    c.JSON(202, gin.H{"status": "processing"}) // Async acceptance
}
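The publisher behind h.publisher.Publish isn't shown in the original; a minimal JetStream-backed sketch could look like this:

// pkg/nats/publisher.go (sketch)
type Publisher struct {
    js nats.JetStreamContext
}

func (p *Publisher) Publish(subject string, event any) error {
    data, err := json.Marshal(event)
    if err != nil {
        return fmt.Errorf("marshal event: %w", err)
    }
    // Synchronous publish keeps the handler's error path simple;
    // PublishAsync would raise throughput at the cost of delayed errors.
    _, err = p.js.Publish(subject, data)
    return err
}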
Inventory Service - Stock Reservation
Here’s where things get interesting. We listen for orders.created events:
// inventory-service/cmd/main.go
sub, err := js.Subscribe("orders.created", func(msg *nats.Msg) {
    var order events.Order
    if err := json.Unmarshal(msg.Data, &order); err != nil {
        metrics.MessageErrors.Inc()
        msg.Term() // Unparseable message: terminate instead of redelivering
        return
    }
    // Linters flag string context keys; a typed key is preferable in real code
    ctx := context.WithValue(context.Background(), "correlation_id", order.CorrelationID)
    if err := reserveStock(ctx, order.Items); err != nil {
        if errors.Is(err, ErrInsufficientStock) {
            publisher.Publish("orders.cancelled", order) // Compensating action
        }
        return // No ack: NATS redelivers after AckWait
    }
    msg.Ack()
}, nats.ManualAck(), nats.AckWait(30*time.Second))
if err != nil {
    logger.Fatal("Subscription failed", zap.Error(err))
}
defer sub.Unsubscribe()
See the pattern? Failed reservations trigger cancellation events—critical for data consistency.
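The other half of that compensation isn't shown above. A hedged sketch of the orders.cancelled consumer back in the order service might look like this; the store.MarkCancelled helper is a placeholder, not from the original code:

// order-service/cmd/main.go (sketch)
js.Subscribe("orders.cancelled", func(msg *nats.Msg) {
    var order events.Order
    if err := json.Unmarshal(msg.Data, &order); err != nil {
        msg.Term() // Unparseable: don't redeliver
        return
    }
    // Mark the order failed so the client sees a terminal state
    if err := store.MarkCancelled(order.ID); err != nil {
        return // No ack: redelivered after AckWait
    }
    msg.Ack()
}, nats.ManualAck(), nats.AckWait(30*time.Second))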
Payment Service with Circuit Breakers
Payment gateways fail. Protect them:
// payment-service/pkg/circuitbreaker/breaker.go
// Note: this sketch is single-goroutine; production code needs a mutex.
func (cb *CircuitBreaker) Execute(fn func() error) error {
    if cb.state == StateOpen && time.Since(cb.lastFailure) < cb.timeout {
        return ErrCircuitOpen // Fail fast while the breaker cools down
    }
    err := fn()
    if err != nil {
        cb.failureCount++
        if cb.failureCount > cb.threshold {
            cb.state = StateOpen
            cb.lastFailure = time.Now()
        }
        return err
    }
    cb.reset() // Success closes the breaker and clears the failure count
    return nil
}
This stops cascading failures when downstream payment processors choke.
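Usage is a one-liner around the gateway call. The gateway client and retry queue here are stand-ins for illustration, not a real SDK:

err := cb.Execute(func() error {
    return gateway.Charge(ctx, order.ID, amount) // hypothetical gateway client
})
if errors.Is(err, ErrCircuitOpen) {
    // Breaker is open: enqueue for later instead of hammering a failing processor
    retryQueue.Enqueue(order)
}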
Observability That Matters
Logs alone won’t save you at 3 AM. We combine:
- Structured logging with request tracing
- Prometheus metrics for NATS throughput
- Liveness checks for Kubernetes
// pkg/logger/logger.go
func NewLogger() *zap.Logger {
    cfg := zap.NewProductionConfig()
    cfg.OutputPaths = []string{"stdout", "/var/log/orders.json"}
    logger, err := cfg.Build()
    if err != nil {
        panic(err) // No logger means no observability: fail loudly at startup
    }
    return logger
}
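// Usage: thread the correlation ID through every log line so requests can be
// traced across services (field names here are illustrative assumptions).
logger.Info("Order accepted",
    zap.String("correlation_id", order.CorrelationID),
    zap.String("order_id", order.ID),
)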
// Health endpoint
router.GET("/health", func(c *gin.Context) {
    if natsManager.IsConnected() && db.Ping() == nil {
        c.Status(200)
        return
    }
    c.Status(503)
})
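The metrics.MessageErrors counter used in the inventory subscriber isn't defined in the original snippets; a minimal Prometheus sketch (file path and metric names are assumptions) looks like this:

// pkg/metrics/metrics.go (sketch)
package metrics

import (
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
)

var (
    MessageErrors = promauto.NewCounter(prometheus.CounterOpts{
        Name: "nats_message_errors_total",
        Help: "Messages that failed to decode or process.",
    })
    MessagesPublished = promauto.NewCounterVec(prometheus.CounterOpts{
        Name: "nats_messages_published_total",
        Help: "Events published, by subject.",
    }, []string{"subject"})
)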
Docker Optimization
Multi-stage builds cut image sizes by more than 90%:
FROM golang:1.21-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o /order-service ./cmd
FROM gcr.io/distroless/static-debian11
COPY --from=builder /order-service /
CMD ["/order-service"]
Result: 25MB images vs. 800MB in naive builds.
Testing Event Flows
Run integration tests against a real NATS server (a local Docker container works well) and drive the flow end to end:
func TestOrderFlow(t *testing.T) {
    nc, err := nats.Connect("nats://localhost:4222")
    if err != nil {
        t.Fatalf("NATS connection failed: %v", err)
    }
    defer nc.Close()
    js, err := nc.JetStream()
    if err != nil {
        t.Fatalf("JetStream init failed: %v", err)
    }
    // Stream must cover the subject we publish and the one we assert on
    js.AddStream(&nats.StreamConfig{Name: "orders", Subjects: []string{"orders.>", "inventory.>"}})
    // Publish test event
    if _, err := js.Publish("orders.created", testOrderPayload); err != nil {
        t.Fatalf("Publish failed: %v", err)
    }
    // Verify downstream effect: the inventory service should emit a reservation
    sub, err := js.SubscribeSync("inventory.reserved")
    if err != nil {
        t.Fatalf("Subscribe failed: %v", err)
    }
    msg, _ := sub.NextMsg(2 * time.Second)
    if msg == nil {
        t.Fatal("Inventory reservation event not published")
    }
}
Production Deployment Checklist
- NATS clustering: Minimum 3 nodes for HA
- Consumer retries: Set nats.MaxDeliver(5) (sketched below)
- Resource limits: Constrain container memory
- Priority queues: Use nats.Bind() for hot streams (sketched below)
- Dead-letter topics: Capture poison pills
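Two of those checklist items in code, applied to JetStream subscriptions; handleOrder, the ORDERS stream, and the order-worker consumer are placeholder names for illustration:

// Cap redeliveries so a poison pill can't loop forever
retrySub, err := js.Subscribe("orders.created", handleOrder,
    nats.MaxDeliver(5), nats.ManualAck())

// Bind a hot-path worker to a pre-provisioned durable consumer;
// the subject is resolved from the consumer, so it can be omitted
hotSub, err := js.Subscribe("", handleOrder,
    nats.Bind("ORDERS", "order-worker"))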
Your Turn
I’ve shared patterns refined through production fires. Now I want to hear from you: What challenges have you faced with event-driven systems? Try this implementation, then share your results—comment with your optimizations or war stories. If this saved you debugging hours, pay it forward: share this with your team.
Final thought: How might you adapt this for streaming analytics? That’s tomorrow’s adventure.