
Build Production-Ready Event-Driven Go Microservices with NATS JetStream and OpenTelemetry: A Complete Guide

Learn to build production-ready event-driven microservices using Go, NATS JetStream, and OpenTelemetry. Master messaging, observability, and scalable architecture patterns.


I’ve been thinking about this topic for weeks. In my work with distributed systems, I’ve seen too many teams struggle with moving event-driven architectures from development to production. The gap between a working prototype and a reliable, observable system is wider than most developers realize. Today, I want to share a practical approach to building production-ready event-driven microservices using Go, NATS JetStream, and OpenTelemetry.

Why do so many event-driven systems fail in production? Often, it’s not the core functionality but the operational aspects that get overlooked. Let me show you how to build systems that not only work but thrive under real-world conditions.

Let’s start with the foundation. In Go, we structure our services around clear domain boundaries. Each microservice owns its data and exposes well-defined APIs while communicating through events. Here’s how I typically structure the event system:

type OrderEvent struct {
    ID          string    `json:"id"`
    Type        string    `json:"type"`
    OrderID     string    `json:"order_id"`
    CustomerID  string    `json:"customer_id"`
    Amount      float64   `json:"amount"`
    Timestamp   time.Time `json:"timestamp"`
    TraceID     string    `json:"trace_id"`
}

func (e *OrderEvent) Validate() error {
    if e.OrderID == "" {
        return errors.New("order ID is required")
    }
    if e.Amount <= 0 {
        return errors.New("amount must be positive")
    }
    return nil
}

Notice the TraceID field? That’s our first step toward observability. But how do we ensure these events are processed reliably?

NATS JetStream provides the persistence and guarantees we need. Here’s how I set up a robust JetStream connection:

func setupJetStream(urls []string) (nats.JetStreamContext, error) {
    opts := []nats.Option{
        nats.Name("order-service"),
        nats.MaxReconnects(5),
        nats.ReconnectWait(2 * time.Second),
        nats.DisconnectErrHandler(func(nc *nats.Conn, err error) {
            log.Printf("Disconnected from NATS: %v", err)
        }),
    }
    
    nc, err := nats.Connect(strings.Join(urls, ","), opts...)
    if err != nil {
        return nil, fmt.Errorf("failed to connect: %w", err)
    }
    
    js, err := nc.JetStream()
    if err != nil {
        return nil, fmt.Errorf("failed to get JetStream context: %w", err)
    }
    
    // Create stream if it doesn't exist
    // Create the stream if it doesn't exist
    _, err = js.AddStream(&nats.StreamConfig{
        Name:      "ORDERS",
        Subjects:  []string{"orders.>"},
        Retention: nats.WorkQueuePolicy,
    })
    if err != nil {
        return nil, fmt.Errorf("failed to create stream: %w", err)
    }
    
    return js, nil
}

What happens when services need to scale? Concurrency patterns in Go become crucial. I prefer using worker pools for processing events:

type WorkerPool struct {
    workers  int
    jobQueue chan *nats.Msg
    js       nats.JetStreamContext
}

func (wp *WorkerPool) Start() {
    for i := 0; i < wp.workers; i++ {
        go wp.worker(i)
    }
}

func (wp *WorkerPool) worker(id int) {
    for msg := range wp.jobQueue {
        ctx := context.Background()
        
        // Extract tracing context from message headers
        if traceID := msg.Header.Get("trace_id"); traceID != "" {
            ctx = tracing.ContextWithTraceID(ctx, traceID)
        }
        
        if err := wp.processMessage(ctx, msg); err != nil {
            log.Printf("Worker %d failed to process message: %v", id, err)
            msg.Nak() // negative-ack so JetStream redelivers the message
            continue
        }
        
        msg.Ack()
    }
}
```

Note the `Nak` on failure: acknowledging a message you failed to process would silently drop it, while a negative acknowledgment tells JetStream to redeliver it.

But how do we know what’s happening across all these concurrent operations? This is where OpenTelemetry transforms our ability to understand system behavior:

func initTracing(serviceName string) (func(), error) {
    exporter, err := jaeger.New(jaeger.WithCollectorEndpoint(
        jaeger.WithEndpoint(os.Getenv("JAEGER_ENDPOINT")),
    ))
    if err != nil {
        return nil, err
    }
    
    tp := tracesdk.NewTracerProvider(
        tracesdk.WithBatcher(exporter),
        tracesdk.WithResource(resource.NewWithAttributes(
            semconv.SchemaURL,
            semconv.ServiceNameKey.String(serviceName),
        )),
    )
    
    otel.SetTracerProvider(tp)
    
    return func() {
        ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
        defer cancel()
        if err := tp.Shutdown(ctx); err != nil {
            log.Printf("failed to shut down tracer provider: %v", err)
        }
    }, nil
}

func processOrder(ctx context.Context, order Order) error {
    ctx, span := tracer.Start(ctx, "processOrder")
    defer span.End()
    
    span.SetAttributes(
        attribute.String("order.id", order.ID),
        attribute.Float64("order.amount", order.Amount),
    )
    
    // Business logic here
    return nil
}

Error handling in distributed systems requires careful consideration. I’ve found circuit breakers essential for preventing cascading failures:

func setupPaymentClient() *PaymentClient {
    cb := gobreaker.NewCircuitBreaker(gobreaker.Settings{
        Name:        "payment-service",
        Timeout:     30 * time.Second,
        ReadyToTrip: func(counts gobreaker.Counts) bool {
            return counts.ConsecutiveFailures > 5
        },
    })
    
    return &PaymentClient{circuitBreaker: cb}
}

func (pc *PaymentClient) ProcessPayment(ctx context.Context, payment Payment) error {
    _, err := pc.circuitBreaker.Execute(func() (interface{}, error) {
        return nil, pc.processPayment(ctx, payment)
    })
    return err
}

Testing event-driven systems presents unique challenges. Here’s how I approach integration testing:

func TestOrderWorkflow(t *testing.T) {
    // Setup
    js := setupTestJetStream(t)
    defer cleanupTestStream(js)
    
    // Create test dependencies
    orderSvc := NewOrderService(js)
    paymentSvc := NewPaymentService(js)
    
    // Execute
    order := Order{ID: "test-order", Amount: 100.0}
    err := orderSvc.CreateOrder(context.Background(), order)
    require.NoError(t, err)
    
    // Verify events were published and processed
    msg, err := js.GetLastMsg("ORDERS", "orders.created.test-order")
    require.NoError(t, err)
    
    var event OrderEvent
    require.NoError(t, json.Unmarshal(msg.Data, &event))
    assert.Equal(t, order.ID, event.OrderID)
}

Deployment considerations are just as important as the code. Here’s a sample Dockerfile that incorporates best practices:

FROM golang:1.21-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o main ./cmd/order-service

FROM alpine:latest
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=builder /app/main .
COPY --from=builder /app/configs ./configs
EXPOSE 8080
CMD ["./main"]
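For local development, the service can run alongside NATS and Jaeger. A minimal docker-compose sketch (image tags and ports are illustrative; `JAEGER_ENDPOINT` matches the environment variable read in initTracing):

```yaml
version: "3.8"
services:
  nats:
    image: nats:2.10-alpine
    command: ["-js"]            # enable JetStream persistence
    ports:
      - "4222:4222"
  jaeger:
    image: jaegertracing/all-in-one:1.50
    ports:
      - "16686:16686"           # Jaeger UI
      - "14268:14268"           # collector HTTP endpoint
  order-service:
    build: .
    environment:
      NATS_URL: nats://nats:4222
      JAEGER_ENDPOINT: http://jaeger:14268/api/traces
    depends_on: [nats, jaeger]
    ports:
      - "8080:8080"
```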

Monitoring in production requires both technical and business metrics. I instrument key business processes:

func recordOrderMetrics(ctx context.Context, order Order) {
    orderCounter.Add(ctx, 1, metric.WithAttributes(
        attribute.String("status", "created"),
        attribute.String("currency", order.Currency),
    ))
    
    orderValueRecorder.Record(ctx, order.Amount, metric.WithAttributes(
        attribute.String("currency", order.Currency),
    ))
}

Building production-ready event-driven systems requires thinking beyond basic functionality. It’s about creating systems that are observable, resilient, and maintainable. The combination of Go’s concurrency model, NATS JetStream’s reliability, and OpenTelemetry’s observability gives us a powerful foundation.

What challenges have you faced with event-driven architectures? I’d love to hear about your experiences and solutions. If you found this helpful, please share it with your team and leave a comment about what topics you’d like me to cover next.



