Building Production-Ready Event-Driven Microservices with Go, NATS JetStream, and OpenTelemetry

Learn to build production-ready event-driven microservices with Go, NATS JetStream & OpenTelemetry. Master concurrency patterns, observability, and resilient message processing.

I’ve been thinking a lot about how modern systems handle scale and complexity lately. It’s fascinating how event-driven architectures have become essential for building resilient, scalable applications. Today, I want to share how we can build a production-ready system using Go, NATS JetStream, and OpenTelemetry - three technologies that work beautifully together.

Why focus on these tools? Go’s simplicity and performance make it ideal for microservices. NATS JetStream provides reliable messaging with persistence, while OpenTelemetry gives us the observability we need in distributed systems. Combined, they create a foundation that can handle real production workloads.

Let me show you how we can implement a robust order processing system. We’ll start by defining our event structure - this is crucial because it determines how our services communicate. Have you ever considered what makes an event definition production-ready?

type BaseEvent struct {
    ID        string    `json:"id"`
    Type      EventType `json:"type"`
    Timestamp time.Time `json:"timestamp"`
    Version   string    `json:"version"`
    Source    string    `json:"source"`
    TraceID   string    `json:"trace_id,omitempty"`
    SpanID    string    `json:"span_id,omitempty"`
}

Notice how we include tracing information - a trace ID and span ID - right in our event structure. This allows us to follow a request across service boundaries, which is essential for debugging distributed systems.
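
Later in this post we'll call a NewOrderCreatedEvent constructor and a Marshal helper, so here's a minimal sketch of what they might look like. The EventType value, the use of github.com/google/uuid for IDs, and the exact field layout are illustrative assumptions, not the definitive implementation:

type EventType string

const OrderCreated EventType = "order.created"

type OrderCreatedEvent struct {
    BaseEvent
    Data Order `json:"data"`
}

// NewOrderCreatedEvent stamps a new event with an ID, type, and timestamp.
// The uuid package is an assumed dependency here.
func NewOrderCreatedEvent(source string, order Order) *OrderCreatedEvent {
    return &OrderCreatedEvent{
        BaseEvent: BaseEvent{
            ID:        uuid.NewString(),
            Type:      OrderCreated,
            Timestamp: time.Now().UTC(),
            Version:   "1.0",
            Source:    source,
        },
        Data: order,
    }
}

// Marshal serializes the full event for publishing to JetStream.
func (e *OrderCreatedEvent) Marshal() ([]byte, error) {
    return json.Marshal(e)
}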

Configuration management is another critical aspect. We use Viper to handle configuration from multiple sources:

func Load(serviceName string) (*Config, error) {
    v := viper.New()
    v.SetConfigName("config")
    v.AddConfigPath("./configs")
    v.AutomaticEnv()
    v.SetEnvKeyReplacer(strings.NewReplacer(".", "_"))
    
    if err := v.ReadInConfig(); err != nil {
        return nil, fmt.Errorf("failed to read config: %w", err)
    }
    
    var cfg Config
    if err := v.Unmarshal(&cfg); err != nil {
        return nil, fmt.Errorf("failed to unmarshal config: %w", err)
    }
    
    return &cfg, nil
}

This approach gives us flexibility - we can use config files for development and environment variables for production deployments.
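
To make that concrete, here's roughly what the config types used in this post might look like. The exact field names are assumptions on my part; what matters is that the mapstructure tags line up with the keys in configs/config.yaml, so that a key like nats.urls can also be supplied as the environment variable NATS_URLS in production, thanks to AutomaticEnv and the key replacer:

// Illustrative shapes for Config and NATSConfig, matching how they're
// used in the connection code below; your real fields may differ.
type NATSConfig struct {
    URLs          []string `mapstructure:"urls"`
    MaxReconnects int      `mapstructure:"max_reconnects"`
    ReconnectWait string   `mapstructure:"reconnect_wait"`
}

type Config struct {
    ServiceName string     `mapstructure:"service_name"`
    NATS        NATSConfig `mapstructure:"nats"`
}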

When it comes to connecting to NATS JetStream, we need to handle connection failures gracefully. What happens if our message broker becomes temporarily unavailable?

func ConnectJetStream(cfg config.NATSConfig) (nats.JetStreamContext, error) {
    opts := []nats.Option{
        nats.MaxReconnects(cfg.MaxReconnects),
        nats.ReconnectWait(parseDuration(cfg.ReconnectWait)),
        nats.DisconnectErrHandler(func(nc *nats.Conn, err error) {
            log.Warn("NATS connection disconnected", zap.Error(err))
        }),
        nats.ReconnectHandler(func(nc *nats.Conn) {
            log.Info("NATS connection reestablished")
        }),
    }
    
    nc, err := nats.Connect(strings.Join(cfg.URLs, ","), opts...)
    if err != nil {
        return nil, fmt.Errorf("failed to connect to NATS: %w", err)
    }
    
    return nc.JetStream()
}

The reconnect options and connection handlers let our service ride out temporary network issues, while JetStream's server-side persistence means consumers can pick up where they left off once the connection is back.
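
One thing the connection code doesn't cover is stream creation - JetStream rejects publishes and pull subscriptions until a stream owns the subjects we use. Here's a sketch of how we might ensure an ORDERS stream exists at startup; the stream name, subject wildcard, and retention settings are assumptions you'd tune for your workload:

// EnsureOrdersStream creates the ORDERS stream if it does not already exist.
func EnsureOrdersStream(js nats.JetStreamContext) error {
    _, err := js.StreamInfo("ORDERS")
    if err == nil {
        return nil // stream already exists
    }
    if !errors.Is(err, nats.ErrStreamNotFound) {
        return fmt.Errorf("failed to look up stream: %w", err)
    }

    _, err = js.AddStream(&nats.StreamConfig{
        Name:      "ORDERS",
        Subjects:  []string{"orders.>"},
        Storage:   nats.FileStorage,
        Retention: nats.LimitsPolicy,
        MaxAge:    24 * time.Hour,
    })
    if err != nil {
        return fmt.Errorf("failed to create stream: %w", err)
    }
    return nil
}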

Observability is where OpenTelemetry shines. Let me show you how we instrument our HTTP handlers:

func NewOrderHandler(js nats.JetStreamContext, tracer trace.Tracer) *OrderHandler {
    return &OrderHandler{
        js:     js,
        tracer: tracer,
    }
}

func (h *OrderHandler) CreateOrder(c *gin.Context) {
    ctx, span := h.tracer.Start(c.Request.Context(), "CreateOrder")
    defer span.End()
    
    var order Order
    if err := c.ShouldBindJSON(&order); err != nil {
        span.RecordError(err)
        c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
        return
    }
    
    event := events.NewOrderCreatedEvent("order-service", order)
    event.TraceID = span.SpanContext().TraceID().String()
    event.SpanID = span.SpanContext().SpanID().String()
    
    eventData, err := event.Marshal()
    if err != nil {
        span.RecordError(err)
        c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to create event"})
        return
    }
    
    if _, err := h.js.Publish("orders.created", eventData, nats.Context(ctx)); err != nil {
        span.RecordError(err)
        c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to publish event"})
        return
    }
    
    c.JSON(http.StatusAccepted, gin.H{"message": "Order processing started"})
}

This instrumentation gives us end-to-end visibility into the request flow, from HTTP handling through event publishing.
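
Of course, spans only go somewhere if a tracer provider has been configured at startup. Here's a minimal sketch of wiring the OpenTelemetry Go SDK to an OTLP collector over gRPC; the collector endpoint and resource attributes are assumptions you'd adapt to your own deployment:

// InitTracer wires up a global tracer provider that exports spans via the
// OTLP gRPC exporter. Adjust the endpoint and attributes for your setup.
func InitTracer(ctx context.Context, serviceName, endpoint string) (*sdktrace.TracerProvider, error) {
    exporter, err := otlptracegrpc.New(ctx,
        otlptracegrpc.WithEndpoint(endpoint), // e.g. "otel-collector:4317"
        otlptracegrpc.WithInsecure(),
    )
    if err != nil {
        return nil, fmt.Errorf("failed to create OTLP exporter: %w", err)
    }

    tp := sdktrace.NewTracerProvider(
        sdktrace.WithBatcher(exporter),
        sdktrace.WithResource(resource.NewWithAttributes(
            semconv.SchemaURL,
            semconv.ServiceNameKey.String(serviceName),
        )),
    )
    otel.SetTracerProvider(tp)
    otel.SetTextMapPropagator(propagation.TraceContext{})
    return tp, nil
}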

For message consumers, we implement worker pools to handle concurrent processing:

func StartInventoryWorker(js nats.JetStreamContext, tracer trace.Tracer, numWorkers int) {
    for i := 0; i < numWorkers; i++ {
        go func(workerID int) {
            sub, err := js.PullSubscribe("orders.created", "inventory-workers")
            if err != nil {
                log.Error("failed to create subscription", zap.Error(err))
                return
            }
            
            for {
                msgs, err := sub.Fetch(1, nats.MaxWait(30*time.Second))
                if err != nil {
                    if err != nats.ErrTimeout {
                        log.Error("failed to fetch message", zap.Error(err))
                    }
                    continue
                }
                
                for _, msg := range msgs {
                    processInventoryMessage(msg, tracer, workerID)
                }
            }
        }(i)
    }
}

This pattern allows us to scale our processing capacity by simply increasing the number of workers.
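
In a real deployment we also want these workers to stop cleanly when the process receives a shutdown signal. Here's a sketch of a shutdown-aware variant of the worker loop, driven by a context that would typically come from signal.NotifyContext in main; the batch size and wait time are illustrative, and this is not the exact loop shown above:

// runWorker fetches messages until its context is cancelled.
func runWorker(ctx context.Context, sub *nats.Subscription, tracer trace.Tracer, workerID int) {
    for {
        select {
        case <-ctx.Done():
            log.Info("worker shutting down", zap.Int("worker_id", workerID))
            return
        default:
        }

        msgs, err := sub.Fetch(10, nats.MaxWait(5*time.Second))
        if err != nil {
            if !errors.Is(err, nats.ErrTimeout) {
                log.Error("failed to fetch messages", zap.Error(err))
            }
            continue
        }

        for _, msg := range msgs {
            processInventoryMessage(msg, tracer, workerID)
        }
    }
}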

What about error handling and dead letter queues? We need to ensure that problematic messages don’t block our entire system:

func processInventoryMessage(msg *nats.Msg, tracer trace.Tracer, workerID int) {
    ctx, span := tracer.Start(context.Background(), "ProcessInventoryMessage")
    defer span.End()
    
    var event events.OrderCreatedEvent
    if err := json.Unmarshal(msg.Data, &event); err != nil {
        log.Error("failed to unmarshal event", 
            zap.Error(err), 
            zap.String("worker_id", strconv.Itoa(workerID)))
        
        // Move the poison message to a dead letter queue instead of
        // letting JetStream redeliver it forever.
        if err := moveToDLQ(msg, "inventory.dlq"); err != nil {
            log.Error("failed to move to DLQ", zap.Error(err))
            msg.Nak() // retry the move on redelivery
            return
        }
        msg.Term() // acknowledge terminally so it is not redelivered
        return
    }
    
    // Process the message...
    if err := reserveInventory(ctx, event.Data); err != nil {
        log.Error("inventory reservation failed",
            zap.Error(err),
            zap.String("order_id", event.Data.OrderID))
        
        msg.Nak() // Let NATS handle retry
        return
    }
    
    msg.Ack()
}

This approach gives us resilience - messages that can’t be processed are moved to a separate queue for later analysis without affecting the main processing flow.
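
The moveToDLQ helper referenced above isn't shown in the snippets, so here's one way it might look. This sketch assumes a package-level js nats.JetStreamContext and a stream that owns the inventory.dlq subject; your real code may wire this differently:

// moveToDLQ republishes a poison message to a dead-letter subject so it can
// be inspected later, preserving the original subject in a header.
func moveToDLQ(msg *nats.Msg, dlqSubject string) error {
    dlqMsg := nats.NewMsg(dlqSubject)
    dlqMsg.Data = msg.Data
    dlqMsg.Header.Set("Original-Subject", msg.Subject)

    if _, err := js.PublishMsg(dlqMsg); err != nil {
        return fmt.Errorf("failed to publish to DLQ: %w", err)
    }
    return nil
}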

Building production-ready systems requires attention to many details: proper configuration management, observability, error handling, and scalability. The patterns I’ve shown here provide a solid foundation that you can adapt to your specific needs.

I’d love to hear about your experiences with event-driven architectures. What challenges have you faced, and how did you solve them? Share your thoughts in the comments below, and if you found this useful, please like and share with others who might benefit from these patterns.
