Build Production-Ready Event-Driven Microservices with Go, NATS JetStream, and OpenTelemetry

Learn to build scalable event-driven microservices with Go, NATS JetStream & OpenTelemetry. Complete guide with Docker, Kubernetes deployment & monitoring.

I’ve been wrestling with distributed systems for years. Recently, while debugging a cascading failure in our payment processing flow, I realized how crucial proper event-driven architecture is. That’s why I want to share a battle-tested approach using Go, NATS JetStream, and OpenTelemetry. This isn’t just theory – it’s what I wish I knew before that midnight outage call.

Let’s start with our foundation. We organize code in a mono-repo with clear boundaries. The cmd directory houses service entry points, while internal contains domain logic. Shared utilities like messaging and telemetry live in internal/shared. This structure prevents circular dependencies while allowing code reuse. Our go.mod includes critical dependencies: NATS for messaging, OpenTelemetry for observability, and gobreaker for resilience patterns.

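To make that concrete, here is roughly how the layout looks, reconstructed from the file paths referenced throughout this post:

.
├── cmd/
│   └── order-service/        # service entry points, one directory per service
├── internal/
│   ├── order/                # domain logic per service
│   ├── payment/
│   ├── inventory/
│   └── shared/
│       ├── messaging/        # JetStream helpers
│       ├── telemetry/        # tracing and metrics setup
│       └── patterns/         # worker pools and other reusable patterns
├── deployments/
│   └── k8s/                  # Kubernetes manifests
└── go.mod                    # nats.go, OpenTelemetry, gobreaker, zap, ...
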
When configuring NATS JetStream, security comes first. We isolate services using separate credentials and configure streams for persistence. Notice how the stream configuration specifies retention policies and replication:

// internal/shared/messaging/jetstream.go
streamConfig := &nats.StreamConfig{
    Name:      "ORDERS",
    Subjects:  []string{"orders.*"},
    Retention: nats.LimitsPolicy,  // Keep messages until limits hit
    Storage:   nats.FileStorage,   // Persist to disk
    Replicas:  3,                  // For high availability
    MaxAge:    time.Hour * 24 * 7, // Keep for 1 week
}

Why does stream configuration matter? Because it determines how your system behaves under load. Get this wrong, and you’ll lose critical events during peak traffic.

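For completeness, here’s a minimal sketch of how that stream gets created: connect with service-specific credentials, obtain a JetStream context, and register the configuration once at startup. The URL and credentials path below are placeholders, not values from the actual repo.

// Minimal sketch: connect with this service's credentials and create the stream once at startup.
nc, err := nats.Connect("nats://nats:4222", nats.UserCredentials("/etc/creds/order-service.creds"))
if err != nil {
    return err
}
js, err := nc.JetStream()
if err != nil {
    return err
}
if _, err := js.AddStream(streamConfig); err != nil {
    return err
}
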
Our order service demonstrates core patterns. It publishes events using a simple wrapper:

// internal/order/service.go
func (s *OrderService) CreateOrder(ctx context.Context, order Order) error {
    event := OrderCreatedEvent{
        OrderID: order.ID,
        Amount:  order.Total,
    }
    if err := s.jetstream.Publish(ctx, "orders.created", event); err != nil {
        s.logger.Error("Failed publishing order", zap.Error(err))
        return fmt.Errorf("event publishing failed: %w", err)
    }
    return nil
}

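The wrapper behind s.jetstream.Publish doesn’t need to be clever: marshal the event, publish to the subject, and let JetStream confirm persistence. Here’s a minimal sketch; the client type and field names are my assumptions, not the repo’s actual code.

// Hypothetical wrapper sketch: serialize the event and publish it with the caller's context.
func (c *JetStreamClient) Publish(ctx context.Context, subject string, event interface{}) error {
    data, err := json.Marshal(event)
    if err != nil {
        return fmt.Errorf("marshal event: %w", err)
    }
    // nats.Context bounds the wait for the JetStream publish acknowledgement.
    _, err = c.js.Publish(subject, data, nats.Context(ctx))
    return err
}
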
For payments, we implement the circuit breaker pattern using Sony’s gobreaker. This prevents cascading failures when downstream services struggle:

// internal/payment/service.go
breaker := gobreaker.NewCircuitBreaker(gobreaker.Settings{
    Name:     "PaymentProcessor",
    Timeout:  5 * time.Second,
    ReadyToTrip: func(counts gobreaker.Counts) bool {
        return counts.ConsecutiveFailures > 5
    },
})

result, err := breaker.Execute(func() (interface{}, error) {
    return s.processPayment(ctx, payment)
})

How many times have you seen a minor glitch take down entire systems? Circuit breakers give your services breathing room to recover.

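Once the breaker trips, Execute returns gobreaker.ErrOpenState immediately instead of calling the payment processor at all. A sketch of how the caller might treat that case (PaymentResult is an assumed type name):

if err != nil {
    if errors.Is(err, gobreaker.ErrOpenState) {
        // Breaker is open: fail fast so callers can queue or retry later instead of piling on.
        return fmt.Errorf("payment processor unavailable: %w", err)
    }
    return err
}
payment := result.(PaymentResult) // PaymentResult is an assumed type
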
Inventory management uses event sourcing. Instead of updating state directly, we append events to a stream. This journal becomes our source of truth:

// internal/inventory/service.go
func (s *InventoryService) ReserveStock(ctx context.Context, orderID uuid.UUID, items []Item) error {
    event := StockReservedEvent{OrderID: orderID, Items: items}
    return s.eventStore.Append(ctx, event)
}

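Since the journal is the source of truth, current stock is derived by folding over past events rather than read from a mutable row. A minimal sketch of that replay; the SKU and Quantity fields on Item are assumptions:

// Hypothetical replay sketch: rebuild reserved quantities by folding over the event journal.
func replayReservations(events []StockReservedEvent) map[string]int {
    reserved := make(map[string]int)
    for _, e := range events {
        for _, item := range e.Items {
            reserved[item.SKU] += item.Quantity // events only append; state is always derived
        }
    }
    return reserved
}
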
Observability ties everything together. We instrument services using OpenTelemetry’s unified API:

// internal/shared/telemetry/setup.go
func InitTracer(serviceName string) (func(context.Context) error, error) {
    exporter, err := jaeger.New(jaeger.WithCollectorEndpoint(
        jaeger.WithEndpoint("http://jaeger:14268/api/traces"),
    ))
    if err != nil {
        return nil, err
    }

    tp := trace.NewTracerProvider(
        trace.WithBatcher(exporter),
        trace.WithResource(resource.NewWithAttributes(
            semconv.SchemaURL,
            semconv.ServiceName(serviceName),
        )),
    )
    otel.SetTracerProvider(tp)
    return tp.Shutdown, nil
}

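With the tracer provider registered globally, any handler can start spans through otel.Tracer. A small usage sketch (the span name is illustrative):

// Usage sketch: wrap order creation in a span; it ends when the function returns.
tracer := otel.Tracer("order-service")
ctx, span := tracer.Start(ctx, "CreateOrder")
defer span.End()
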
Notice how we propagate trace context across services. When an order event flows to payment processing, the trace ID remains consistent. Ever tried debugging a distributed transaction without proper tracing? It’s like finding a needle in a haystack.

For event processing, we leverage Go’s concurrency primitives. Worker pools handle events efficiently without overwhelming systems:

// internal/shared/patterns/workers.go
func StartWorkerPool(ctx context.Context, numWorkers int, workQueue <-chan *nats.Msg, handler func(msg *nats.Msg)) {
    for i := 0; i < numWorkers; i++ {
        go func(workerID int) {
            for {
                select {
                case msg := <-workQueue:
                    handler(msg)
                case <-ctx.Done():
                    return
                }
            }
        }(i)
    }
}

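Feeding the pool is just a matter of bridging a JetStream subscription onto that channel. A sketch of the wiring, assuming a durable push consumer with manual acknowledgements (consumer and subject names are illustrative):

// Wiring sketch: the subscription callback stays tiny; a buffered channel absorbs bursts.
workQueue := make(chan *nats.Msg, 256)
_, err := js.Subscribe("orders.created", func(msg *nats.Msg) {
    workQueue <- msg
}, nats.Durable("order-workers"), nats.ManualAck())
if err != nil {
    return err
}
StartWorkerPool(ctx, 8, workQueue, func(msg *nats.Msg) {
    // ... handle the event, then acknowledge it explicitly
    msg.Ack()
})
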
Testing event-driven systems requires special attention. We use Testcontainers to spin up real infrastructure:

// internal/inventory/service_test.go
func TestStockReservation(t *testing.T) {
    ctx := context.Background()
    natsContainer, err := testcontainers.GenericContainer(ctx, testcontainers.GenericContainerRequest{
        Started: true,
        ContainerRequest: testcontainers.ContainerRequest{
            Image: "nats:2.9",
            ExposedPorts: []string{"4222/tcp"},
        },
    })
    if err != nil {
        t.Fatalf("failed to start NATS container: %v", err)
    }
    defer natsContainer.Terminate(ctx)

    // ... test logic using real NATS instance
}

Deployment uses Docker and Kubernetes. Our Dockerfiles follow best practices:

# cmd/order-service/Dockerfile
FROM golang:1.21 AS builder
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o order-service ./cmd/order-service

FROM alpine:latest
COPY --from=builder /app/order-service /usr/local/bin/
CMD ["order-service"]

In Kubernetes, we configure readiness probes that actually mean something:

# deployments/k8s/order-service.yaml
readinessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
  successThreshold: 3

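For that probe to actually mean something, the /health endpoint should reflect real dependencies instead of returning 200 unconditionally. A minimal sketch of a handler matching the probe above (the handler shape is an assumption; nc is the service’s NATS connection):

// Hypothetical /health handler: report ready only while the NATS connection is usable.
http.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
    if nc == nil || !nc.IsConnected() {
        http.Error(w, "nats disconnected", http.StatusServiceUnavailable)
        return
    }
    w.WriteHeader(http.StatusOK)
})
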
Monitoring relies on Prometheus metrics exposed through OpenTelemetry:

// internal/shared/telemetry/metrics.go
func InitMeter(serviceName string) *metric.MeterProvider {
    exporter, _ := prometheus.New()
    provider := metric.NewMeterProvider(
        metric.WithReader(exporter),
        metric.WithResource(resource.NewWithAttributes(
            semconv.SchemaURL,
            semconv.ServiceName(serviceName),
        )),
    )
    return provider
}

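Recording a measurement then goes through the provider’s Meter. A brief usage sketch (the instrument name is illustrative):

// Usage sketch: count created orders; Prometheus scrapes them via the exporter's /metrics endpoint.
meter := provider.Meter("order-service")
ordersCreated, err := meter.Int64Counter("orders_created_total")
if err != nil {
    return err
}
ordersCreated.Add(ctx, 1)
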
Common pitfalls? I’ve stepped in most of them. One major lesson: always set message acknowledgements correctly in JetStream. Use AckExplicit() when every message must be acknowledged individually, and AckAll() when acknowledging one message can cumulatively acknowledge everything before it, which trades per-message guarantees for throughput. Another gotcha is context propagation across async boundaries. We solve this by embedding trace context in event metadata, roughly as sketched below.

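Here’s a minimal sketch of that idea using NATS message headers as the carrier. Subject and span names are illustrative, and it assumes a propagator was registered during setup with otel.SetTextMapPropagator(propagation.TraceContext{}):

// Publisher side: inject the current trace context into the message headers.
msg := nats.NewMsg("orders.created")
msg.Data = payload // payload: the JSON-encoded event from earlier
otel.GetTextMapPropagator().Inject(ctx, propagation.HeaderCarrier(msg.Header))
if _, err := js.PublishMsg(msg, nats.Context(ctx)); err != nil {
    return err
}

// Consumer side: extract the trace context before starting the processing span.
ctx = otel.GetTextMapPropagator().Extract(ctx, propagation.HeaderCarrier(msg.Header))
ctx, span := otel.Tracer("payment-service").Start(ctx, "ProcessOrderCreated")
defer span.End()
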
What separates production-ready from “it works on my machine”? Resilience at every layer. Timeouts. Retries with backoff. Circuit breakers. Proper observability. Without these, you’re flying blind during incidents.

This approach has handled Black Friday traffic spikes without breaking a sweat. The real test? When our payment processor had an outage, the system degraded gracefully rather than collapsing. That’s the power of proper event-driven architecture.

If you found this useful, share it with your team. Have questions or war stories? Let’s discuss in the comments – I’ll respond to every one.
