Build Production-Ready Event-Driven Microservices with Go, NATS JetStream, and OpenTelemetry

Learn to build scalable event-driven microservices with Go, NATS JetStream & OpenTelemetry. Complete guide with Docker, Kubernetes deployment & monitoring.

I’ve been wrestling with distributed systems for years. Recently, while debugging a cascading failure in our payment processing flow, I realized how crucial proper event-driven architecture is. That’s why I want to share a battle-tested approach using Go, NATS JetStream, and OpenTelemetry. This isn’t just theory – it’s what I wish I knew before that midnight outage call.

Let’s start with our foundation. We organize code in a mono-repo with clear boundaries. The cmd directory houses service entry points, while internal contains domain logic. Shared utilities like messaging and telemetry live in internal/shared. This structure prevents circular dependencies while allowing code reuse. Our go.mod includes critical dependencies: NATS for messaging, OpenTelemetry for observability, and gobreaker for resilience patterns.

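To make that concrete, here is roughly how the layout looks, reconstructed from the file paths referenced throughout this post:

.
├── cmd/
│   └── order-service/        # service entry points, one directory per service
├── internal/
│   ├── order/                # domain logic per service
│   ├── payment/
│   ├── inventory/
│   └── shared/
│       ├── messaging/        # JetStream helpers
│       ├── telemetry/        # tracing and metrics setup
│       └── patterns/         # worker pools and other reusable patterns
├── deployments/
│   └── k8s/                  # Kubernetes manifests
└── go.mod                    # nats.go, OpenTelemetry, gobreaker, zap, ...
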
When configuring NATS JetStream, security comes first. We isolate services using separate credentials and configure streams for persistence. Notice how the stream configuration specifies retention policies and replication:

// internal/shared/messaging/jetstream.go
streamConfig := &nats.StreamConfig{
    Name:      "ORDERS",
    Subjects:  []string{"orders.*"},
    Retention: nats.LimitsPolicy,  // Keep messages until limits hit
    Storage:   nats.FileStorage,   // Persist to disk
    Replicas:  3,                  // For high availability
    MaxAge:    time.Hour * 24 * 7, // Keep for 1 week
}

Why does stream configuration matter? Because it determines how your system behaves under load. Get this wrong, and you’ll lose critical events during peak traffic.

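For completeness, here’s a minimal sketch of how that stream gets created: connect with service-specific credentials, obtain a JetStream context, and register the configuration once at startup. The URL and credentials path below are placeholders, not values from the actual repo.

// Minimal sketch: connect with this service's credentials and create the stream once at startup.
nc, err := nats.Connect("nats://nats:4222", nats.UserCredentials("/etc/creds/order-service.creds"))
if err != nil {
    return err
}
js, err := nc.JetStream()
if err != nil {
    return err
}
if _, err := js.AddStream(streamConfig); err != nil {
    return err
}
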
Our order service demonstrates core patterns. It publishes events using a simple wrapper:

// internal/order/service.go
func (s *OrderService) CreateOrder(ctx context.Context, order Order) error {
    event := OrderCreatedEvent{
        OrderID: order.ID,
        Amount:  order.Total,
    }
    if err := s.jetstream.Publish(ctx, "orders.created", event); err != nil {
        s.logger.Error("Failed publishing order", zap.Error(err))
        return fmt.Errorf("event publishing failed: %w", err)
    }
    return nil
}

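The wrapper behind s.jetstream.Publish doesn’t need to be clever: marshal the event, publish to the subject, and let JetStream confirm persistence. Here’s a minimal sketch; the client type and field names are my assumptions, not the repo’s actual code.

// Hypothetical wrapper sketch: serialize the event and publish it with the caller's context.
func (c *JetStreamClient) Publish(ctx context.Context, subject string, event interface{}) error {
    data, err := json.Marshal(event)
    if err != nil {
        return fmt.Errorf("marshal event: %w", err)
    }
    // nats.Context bounds the wait for the JetStream publish acknowledgement.
    _, err = c.js.Publish(subject, data, nats.Context(ctx))
    return err
}
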
For payments, we implement the circuit breaker pattern using Sony’s gobreaker. This prevents cascading failures when downstream services struggle:

// internal/payment/service.go
breaker := gobreaker.NewCircuitBreaker(gobreaker.Settings{
    Name:     "PaymentProcessor",
    Timeout:  5 * time.Second,
    ReadyToTrip: func(counts gobreaker.Counts) bool {
        return counts.ConsecutiveFailures > 5
    },
})

result, err := breaker.Execute(func() (interface{}, error) {
    return s.processPayment(ctx, payment)
})

How many times have you seen a minor glitch take down entire systems? Circuit breakers give your services breathing room to recover.

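Once the breaker trips, Execute returns gobreaker.ErrOpenState immediately instead of calling the payment processor at all. A sketch of how the caller might treat that case (PaymentResult is an assumed type name):

if err != nil {
    if errors.Is(err, gobreaker.ErrOpenState) {
        // Breaker is open: fail fast so callers can queue or retry later instead of piling on.
        return fmt.Errorf("payment processor unavailable: %w", err)
    }
    return err
}
payment := result.(PaymentResult) // PaymentResult is an assumed type
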
Inventory management uses event sourcing. Instead of updating state directly, we append events to a stream. This journal becomes our source of truth:

// internal/inventory/service.go
func (s *InventoryService) ReserveStock(ctx context.Context, orderID uuid.UUID, items []Item) error {
    event := StockReservedEvent{OrderID: orderID, Items: items}
    return s.eventStore.Append(ctx, event)
}

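Since the journal is the source of truth, current stock is derived by folding over past events rather than read from a mutable row. A minimal sketch of that replay; the SKU and Quantity fields on Item are assumptions:

// Hypothetical replay sketch: rebuild reserved quantities by folding over the event journal.
func replayReservations(events []StockReservedEvent) map[string]int {
    reserved := make(map[string]int)
    for _, e := range events {
        for _, item := range e.Items {
            reserved[item.SKU] += item.Quantity // events only append; state is always derived
        }
    }
    return reserved
}
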
Observability ties everything together. We instrument services using OpenTelemetry’s unified API:

// internal/shared/telemetry/setup.go
func InitTracer(serviceName string) (func(context.Context) error, error) {
    exporter, err := jaeger.New(jaeger.WithCollectorEndpoint(
        jaeger.WithEndpoint("http://jaeger:14268/api/traces"),
    ))
    if err != nil {
        return nil, err
    }

    tp := trace.NewTracerProvider(
        trace.WithBatcher(exporter),
        trace.WithResource(resource.NewWithAttributes(
            semconv.SchemaURL,
            semconv.ServiceName(serviceName),
        )),
    )
    otel.SetTracerProvider(tp)
    return tp.Shutdown, nil
}

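With the tracer provider registered globally, any handler can start spans through otel.Tracer. A small usage sketch (the span name is illustrative):

// Usage sketch: wrap order creation in a span; it ends when the function returns.
tracer := otel.Tracer("order-service")
ctx, span := tracer.Start(ctx, "CreateOrder")
defer span.End()
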
Notice how we propagate trace context across services. When an order event flows to payment processing, the trace ID remains consistent. Ever tried debugging a distributed transaction without proper tracing? It’s like finding a needle in a haystack.

For event processing, we leverage Go’s concurrency primitives. Worker pools handle events efficiently without overwhelming systems:

// internal/shared/patterns/workers.go
func StartWorkerPool(ctx context.Context, numWorkers int, workQueue <-chan *nats.Msg, handler func(msg *nats.Msg)) {
    for i := 0; i < numWorkers; i++ {
        go func(workerID int) {
            for {
                select {
                case msg := <-workQueue:
                    handler(msg)
                case <-ctx.Done():
                    return
                }
            }
        }(i)
    }
}

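Feeding the pool is just a matter of bridging a JetStream subscription onto that channel. A sketch of the wiring, assuming a durable push consumer with manual acknowledgements (consumer and subject names are illustrative):

// Wiring sketch: the subscription callback stays tiny; a buffered channel absorbs bursts.
workQueue := make(chan *nats.Msg, 256)
_, err := js.Subscribe("orders.created", func(msg *nats.Msg) {
    workQueue <- msg
}, nats.Durable("order-workers"), nats.ManualAck())
if err != nil {
    return err
}
StartWorkerPool(ctx, 8, workQueue, func(msg *nats.Msg) {
    // ... handle the event, then acknowledge it explicitly
    msg.Ack()
})
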
Testing event-driven systems requires special attention. We use Testcontainers to spin up real infrastructure:

// internal/inventory/service_test.go
func TestStockReservation(t *testing.T) {
    ctx := context.Background()
    natsContainer, err := testcontainers.GenericContainer(ctx, testcontainers.GenericContainerRequest{
        Started: true,
        ContainerRequest: testcontainers.ContainerRequest{
            Image: "nats:2.9",
            ExposedPorts: []string{"4222/tcp"},
        },
    })
    if err != nil {
        t.Fatalf("failed to start NATS container: %v", err)
    }
    defer natsContainer.Terminate(ctx)

    // ... test logic using real NATS instance
}

Deployment uses Docker and Kubernetes. Our Dockerfiles follow best practices:

# cmd/order-service/Dockerfile
FROM golang:1.21 AS builder
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o order-service ./cmd/order-service

FROM alpine:latest
COPY --from=builder /app/order-service /usr/local/bin/
CMD ["order-service"]

In Kubernetes, we configure readiness probes that actually mean something:

# deployments/k8s/order-service.yaml
readinessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
  successThreshold: 3

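For that probe to actually mean something, the /health endpoint should reflect real dependencies instead of returning 200 unconditionally. A minimal sketch of a handler matching the probe above (the handler shape is an assumption; nc is the service’s NATS connection):

// Hypothetical /health handler: report ready only while the NATS connection is usable.
http.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
    if nc == nil || !nc.IsConnected() {
        http.Error(w, "nats disconnected", http.StatusServiceUnavailable)
        return
    }
    w.WriteHeader(http.StatusOK)
})
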
Monitoring relies on Prometheus metrics exposed through OpenTelemetry:

// internal/shared/telemetry/metrics.go
func InitMeter(serviceName string) *metric.MeterProvider {
    exporter, _ := prometheus.New()
    provider := metric.NewMeterProvider(
        metric.WithReader(exporter),
        metric.WithResource(resource.NewWithAttributes(
            semconv.SchemaURL,
            semconv.ServiceName(serviceName),
        )),
    )
    return provider
}

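Recording a measurement then goes through the provider’s Meter. A brief usage sketch (the instrument name is illustrative):

// Usage sketch: count created orders; Prometheus scrapes them via the exporter's /metrics endpoint.
meter := provider.Meter("order-service")
ordersCreated, err := meter.Int64Counter("orders_created_total")
if err != nil {
    return err
}
ordersCreated.Add(ctx, 1)
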
Common pitfalls? I’ve stepped in most of them. One major lesson: always set message acknowledgements correctly in JetStream. Use AckExplicit() when every message must be acknowledged individually, and AckAll() when acknowledging one message can cumulatively acknowledge everything before it, which trades per-message guarantees for throughput. Another gotcha is context propagation across async boundaries. We solve this by embedding trace context in event metadata, roughly as sketched below.

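Here’s a minimal sketch of that idea using NATS message headers as the carrier. Subject and span names are illustrative, and it assumes a propagator was registered during setup with otel.SetTextMapPropagator(propagation.TraceContext{}):

// Publisher side: inject the current trace context into the message headers.
msg := nats.NewMsg("orders.created")
msg.Data = payload // payload: the JSON-encoded event from earlier
otel.GetTextMapPropagator().Inject(ctx, propagation.HeaderCarrier(msg.Header))
if _, err := js.PublishMsg(msg, nats.Context(ctx)); err != nil {
    return err
}

// Consumer side: extract the trace context before starting the processing span.
ctx = otel.GetTextMapPropagator().Extract(ctx, propagation.HeaderCarrier(msg.Header))
ctx, span := otel.Tracer("payment-service").Start(ctx, "ProcessOrderCreated")
defer span.End()
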
What separates production-ready from “it works on my machine”? Resilience at every layer. Timeouts. Retries with backoff. Circuit breakers. Proper observability. Without these, you’re flying blind during incidents.

This approach has handled Black Friday traffic spikes without breaking a sweat. The real test? When our payment processor had an outage, the system degraded gracefully rather than collapsing. That’s the power of proper event-driven architecture.

If you found this useful, share it with your team. Have questions or war stories? Let’s discuss in the comments – I’ll respond to every one.
