Build Production-Ready Event-Driven Microservices with Go, NATS JetStream, and OpenTelemetry Guide

golang

Build Production-Ready Event-Driven Microservices with Go, NATS JetStream, and OpenTelemetry Guide

Learn to build scalable event-driven microservices with Go, NATS JetStream & OpenTelemetry. Complete guide with production patterns, observability & deployment.

Sep 25, 2025

Build Production-Ready Event-Driven Microservices with Go, NATS JetStream, and OpenTelemetry Guide

I’ve been thinking about event-driven architectures a lot lately. In my work building distributed systems, I’ve seen how traditional request-response patterns struggle with scalability and resilience. Event-driven microservices offer a compelling alternative, but implementing them in production requires careful consideration of messaging, observability, and error handling. That’s why I want to share a practical approach using Go, NATS JetStream, and OpenTelemetry.

What makes this combination so powerful? Go’s concurrency model, NATS JetStream’s persistence guarantees, and OpenTelemetry’s observability create a robust foundation. But how do we ensure these services remain reliable under load?

Let me show you how I structure event-driven services. The key is designing events that capture meaningful domain changes. Each event should represent a business fact that occurred.

type OrderCreated struct {
    EventID     string    `json:"event_id"`
    OrderID     string    `json:"order_id"`
    CustomerID  string    `json:"customer_id"`
    Items       []Item    `json:"items"`
    CreatedAt   time.Time `json:"created_at"`
}

func NewOrderCreated(orderID, customerID string, items []Item) *OrderCreated {
    return &OrderCreated{
        EventID:    uuid.New().String(),
        OrderID:    orderID,
        CustomerID: customerID,
        Items:      items,
        CreatedAt:  time.Now().UTC(),
    }
}

Setting up NATS JetStream requires thoughtful configuration. I prefer creating streams with explicit retention policies and replication. This ensures messages survive restarts and network partitions.

func CreateOrderStream(ctx context.Context, js jetstream.JetStream) error {
    cfg := jetstream.StreamConfig{
        Name:         "ORDERS",
        Subjects:     []string{"orders.>"},
        Retention:    jetstream.WorkQueuePolicy,
        MaxMsgs:      1000000,
        MaxAge:       24 * time.Hour,
        Replicas:     3,
        Duplicates:   2 * time.Minute,
    }
    
    _, err := js.CreateStream(ctx, cfg)
    return err
}

Have you considered how consumers should handle backpressure? I implement pull-based consumers with explicit acknowledgment. This prevents overwhelming services during traffic spikes.

func ProcessOrders(ctx context.Context, js jetstream.JetStream) error {
    cons, err := js.CreateOrUpdateConsumer(ctx, "ORDERS", jetstream.ConsumerConfig{
        Durable:   "order-processor",
        AckPolicy: jetstream.AckExplicitPolicy,
    })
    
    messages, err := cons.Messages()
    for msg := range messages {
        if err := handleOrder(msg); err != nil {
            msg.Nak() // Negative acknowledgment for retry
        } else {
            msg.Ack()
        }
    }
    return nil
}

OpenTelemetry transforms how we understand distributed systems. By instrumenting both producers and consumers, we gain visibility into event flows and performance bottlenecks.

func PublishOrderEvent(ctx context.Context, js jetstream.JetStream, event *OrderCreated) error {
    tracer := otel.Tracer("order-service")
    ctx, span := tracer.Start(ctx, "publish.order")
    defer span.End()
    
    data, err := json.Marshal(event)
    if err != nil {
        span.RecordError(err)
        return err
    }
    
    _, err = js.Publish(ctx, "orders.created", data)
    span.SetAttributes(attribute.String("order.id", event.OrderID))
    return err
}

What happens when services need to scale independently? I use consumer groups with proper partitioning. Each service instance processes a subset of messages while maintaining ordering guarantees.

Error handling deserves special attention. I implement circuit breakers and dead-letter queues for problematic messages. This prevents cascading failures and allows manual inspection of failed events.

type EventProcessor struct {
    breaker    *gobreaker.CircuitBreaker
    dlq        jetstream.JetStream
    maxRetries int
}

func (p *EventProcessor) ProcessWithRetry(ctx context.Context, msg jetstream.Msg) error {
    for i := 0; i < p.maxRetries; i++ {
        result, err := p.breaker.Execute(func() (interface{}, error) {
            return nil, p.processMessage(ctx, msg)
        })
        
        if err == nil {
            return msg.Ack()
        }
        
        if i == p.maxRetries-1 {
            return p.sendToDLQ(ctx, msg)
        }
        
        time.Sleep(time.Second * time.Duration(math.Pow(2, float64(i))))
    }
    return nil
}

Testing event-driven systems requires simulating real-world conditions. I focus on integration tests that verify event flows and recovery scenarios. Mocking NATS JetStream helps isolate components while maintaining test reliability.

Deployment considerations include health checks and graceful shutdown. Services should stop accepting new events while completing in-progress work.

func (s *Service) Shutdown(ctx context.Context) error {
    s.health.SetServingStatus(false)
    
    // Stop accepting new events
    close(s.done)
    
    // Wait for current processing to complete
    select {
    case <-s.processorsStopped:
        return nil
    case <-ctx.Done():
        return ctx.Err()
    }
}

Monitoring production services involves tracking key metrics: event throughput, processing latency, and error rates. I combine OpenTelemetry metrics with structured logging to create comprehensive dashboards.

How do we ensure data consistency across services? I use idempotent consumers and version checks. This prevents duplicate processing and maintains data integrity.

The journey to production-ready event-driven microservices involves iterative refinement. Start with simple event flows, then add complexity as needed. Remember that observability and resilience are not afterthoughts—they’re fundamental to success.

I hope this perspective helps you build more robust systems. What challenges have you faced with event-driven architectures? Share your experiences in the comments below—I’d love to continue this conversation. If you found this useful, please like and share with your team.

Share: Facebook Twitter Reddit LinkedIn WhatsApp Telegram Pinterest Email Instagram

golang

Build Production-Ready Event-Driven Microservices with Go, NATS JetStream, and OpenTelemetry Guide

Our Creations

We are on Medium

Similar Posts

Cobra Viper Integration Guide: Build Advanced CLI Apps with Go Configuration Management

Master Event-Driven Microservices with NATS, Go, Kubernetes: Complete Production Implementation Guide

Building Production-Ready Event-Driven Microservices with Go, NATS JetStream, and OpenTelemetry

Production-Ready Event-Driven Microservices: NATS, Go, and Distributed Tracing Implementation Guide

Build Production-Ready Event-Driven Microservices with NATS, Go, and Kubernetes: Complete Tutorial

Building Production-Ready Event-Driven Microservices with Go NATS JetStream and OpenTelemetry Guide