Production-Ready Event-Driven Microservices: Go, NATS JetStream, and OpenTelemetry Complete Guide

Build production-ready event-driven microservices with Go, NATS JetStream, and OpenTelemetry. Learn distributed tracing, resilience patterns, and deployment best practices.

I’ve been thinking about event-driven architectures a lot lately, especially how they handle real-world complexity at scale. The combination of Go’s simplicity, NATS JetStream’s reliability, and OpenTelemetry’s observability creates a powerful foundation for building systems that can handle millions of events while remaining maintainable and debuggable.

Let me show you how to build something production-ready.

Setting up our project begins with a clean structure. I prefer organizing by domain rather than technology, keeping related code together. Our core models define the language our services will speak – orders, inventory, and the events that connect them.
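One possible layout along those lines – the exact directory names here are illustrative, not prescriptive:

order-system/
├── cmd/                 # one main package per service
├── internal/
│   ├── models/          # domain types and event definitions
│   ├── order/           # order service
│   ├── inventory/       # inventory service
│   └── messaging/       # NATS JetStream client
├── pkg/telemetry/       # shared OpenTelemetry setup
└── deploy/              # Dockerfiles and Kubernetes manifests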

Why do we need events anyway? Because they create loose coupling between services, allowing each component to evolve independently while maintaining system integrity.

Here’s our order model foundation:

type Order struct {
    ID          uuid.UUID
    CustomerID  string
    Items       []OrderItem
    Status      OrderStatus
    TotalAmount float64
    CreatedAt   time.Time
}

type OrderCreatedEvent struct {
    BaseEvent
    Order Order
}
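The supporting types aren’t shown in full above. Here’s a minimal sketch of what they might look like, with fields inferred from how they’re used later in this article; the constant values are placeholders:

type OrderItem struct {
    ProductID string
    Quantity  int
    Price     float64
}

type OrderStatus string

const (
    OrderStatusPending   OrderStatus = "pending"
    OrderStatusConfirmed OrderStatus = "confirmed"
)

// BaseEvent carries the metadata every event shares, including the trace ID
// we use to stitch together distributed traces later.
type BaseEvent struct {
    ID        uuid.UUID
    Type      string
    Source    string
    Timestamp time.Time
    TraceID   string
}

// EventOrderCreated is a placeholder value; any stable identifier works.
const EventOrderCreated = "order.created"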

NATS JetStream forms our messaging backbone. It adds persistence on top of core NATS and gives us at-least-once delivery, with effectively exactly-once processing available through publish deduplication and double acknowledgments. Stream configuration lets us balance memory and file storage for different performance needs.

Connecting to NATS requires careful error handling – network partitions happen, and our system must gracefully handle them:

func NewNATSClient(config NATSConfig) (*NATSClient, error) {
    opts := []nats.Option{
        nats.MaxReconnects(10),
        nats.ReconnectWait(2 * time.Second),
        nats.DisconnectErrHandler(func(nc *nats.Conn, err error) {
            log.Warn("NATS disconnected", "error", err)
        }),
    }
    
    conn, err := nats.Connect(config.URL, opts...)
    if err != nil {
        return nil, fmt.Errorf("connection failed: %w", err)
    }
    
    js, err := conn.JetStream()
    if err != nil {
        conn.Close()
        return nil, fmt.Errorf("jetstream context failed: %w", err)
    }

    return &NATSClient{conn: conn, js: js}, nil
}

How do we ensure messages aren’t lost during network issues? JetStream persists every published message to its stream, and explicit consumer acknowledgments mean anything unacknowledged gets redelivered rather than dropped.
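Creating the stream is where that persistence comes from. A minimal sketch, assuming a single ORDERS stream that captures every ORDERS.* subject (names and limits are illustrative):

// AddStream errors if a conflicting stream already exists, so production
// code may want to look the stream up and update it instead.
func (c *NATSClient) EnsureOrdersStream() error {
    _, err := c.js.AddStream(&nats.StreamConfig{
        Name:     "ORDERS",
        Subjects: []string{"ORDERS.*"},
        Storage:  nats.FileStorage, // survives broker restarts
        Replicas: 1,                // raise for clustered deployments
        MaxAge:   24 * time.Hour,   // how long events stay replayable
    })
    return err
}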

The order service demonstrates event publishing. Notice how we include tracing context in every event – this becomes crucial when debugging distributed workflows:

func (s *OrderService) CreateOrder(ctx context.Context, order models.Order) error {
    // Reuse the span already on the request context so the event carries the
    // same trace ID as the call that created the order.
    span := trace.SpanFromContext(ctx)
    
    event := models.OrderCreatedEvent{
        BaseEvent: models.BaseEvent{
            ID:        uuid.New(),
            Type:      models.EventOrderCreated,
            Source:    "order-service",
            Timestamp: time.Now(),
            TraceID:   span.SpanContext().TraceID().String(),
        },
        Order: order,
    }
    
    return s.nats.PublishOrderCreated(ctx, event)
}
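The publish side isn’t shown above, so here’s a rough sketch of what PublishOrderCreated could look like behind our NATSClient wrapper. The nats.MsgId option feeds JetStream’s publish deduplication, which is what makes retried publishes safe:

func (c *NATSClient) PublishOrderCreated(ctx context.Context, event models.OrderCreatedEvent) error {
    data, err := json.Marshal(event)
    if err != nil {
        return fmt.Errorf("marshal event: %w", err)
    }

    // ctx is kept for symmetry with the service layer; deadline handling is
    // omitted in this sketch. Using the event ID as the message ID lets
    // JetStream drop duplicate publishes within its duplicate-tracking window.
    _, err = c.js.Publish("ORDERS.created", data, nats.MsgId(event.ID.String()))
    return err
}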

OpenTelemetry transforms our ability to understand system behavior. Distributed tracing shows us the complete journey of a request across service boundaries:

func InitTracing(serviceName string) (*tracesdk.TracerProvider, error) {
    exporter, err := jaeger.New(jaeger.WithCollectorEndpoint(
        jaeger.WithEndpoint("http://jaeger:14268/api/traces"),
    ))
    if err != nil {
        return nil, err
    }

    tp := tracesdk.NewTracerProvider(
        tracesdk.WithBatcher(exporter),
        tracesdk.WithResource(resource.NewWithAttributes(
            semconv.SchemaURL,
            semconv.ServiceNameKey.String(serviceName),
        )),
    )
    
    otel.SetTracerProvider(tp)
    return tp, nil
}
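One detail that’s easy to miss: WithBatcher buffers spans, so the provider has to be flushed on shutdown or the tail of a trace quietly disappears. In main, something like this does the job (the timeout is arbitrary):

// tp is the provider returned by InitTracing.
defer func() {
    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    defer cancel()
    if err := tp.Shutdown(ctx); err != nil {
        log.Error("tracer shutdown failed", "error", err)
    }
}()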

What happens when services need to scale independently? Our inventory service shows how to handle concurrent event processing without losing messages:

func (s *InventoryService) StartConsumer(ctx context.Context) error {
    sub, err := s.js.QueueSubscribe(
        "ORDERS.created",
        "inventory-group",
        s.handleOrderCreated,
        nats.DeliverAll(),
        nats.AckExplicit(),
    )
    if err != nil {
        return fmt.Errorf("subscribe failed: %w", err)
    }

    go func() {
        <-ctx.Done()
        sub.Unsubscribe()
    }()

    return nil
}

func (s *InventoryService) handleOrderCreated(msg *nats.Msg) {
    ctx := context.Background()

    var event models.OrderCreatedEvent
    if err := json.Unmarshal(msg.Data, &event); err != nil {
        log.Error("Failed to unmarshal event", "error", err)
        // A malformed payload will never parse on retry; terminate it
        // instead of letting JetStream redeliver it forever.
        msg.Term()
        return
    }

    if err := s.processInventoryReservation(ctx, event); err != nil {
        // Transient failure: negative-ack so JetStream redelivers later.
        msg.Nak()
        return
    }

    msg.Ack()
}
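Here’s roughly what the reservation step itself might look like once tracing is wired in – the tracer name and attribute keys are placeholders, and the actual stock logic is elided:

// Illustrative sketch only; real inventory checks are omitted.
func (s *InventoryService) processInventoryReservation(ctx context.Context, event models.OrderCreatedEvent) error {
    ctx, span := otel.Tracer("inventory-service").Start(ctx, "reserve-inventory")
    defer span.End()

    span.SetAttributes(
        attribute.String("order.id", event.Order.ID.String()),
        attribute.Int("order.items", len(event.Order.Items)),
    )

    // ... check and decrement stock using ctx, then publish an inventory event
    return nil
}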

Error handling requires thoughtful patterns. Circuit breakers prevent cascading failures when downstream services become unavailable:

func NewInventoryClient() *inventoryClient {
    cb := gobreaker.NewCircuitBreaker(gobreaker.Settings{
        Name: "inventory-service",
        // Timeout is how long the breaker stays open before letting a
        // trial request through in the half-open state.
        Timeout: 30 * time.Second,
        // Trip after more than five consecutive failures.
        ReadyToTrip: func(counts gobreaker.Counts) bool {
            return counts.ConsecutiveFailures > 5
        },
    })

    return &inventoryClient{circuitBreaker: cb}
}
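How the breaker wraps an actual call isn’t shown above; here’s a hedged sketch where ReserveStock and callInventoryAPI are hypothetical names:

func (c *inventoryClient) ReserveStock(ctx context.Context, orderID string) error {
    // Execute fails fast with gobreaker.ErrOpenState while the breaker is
    // open, giving a struggling downstream service room to recover.
    _, err := c.circuitBreaker.Execute(func() (interface{}, error) {
        return nil, c.callInventoryAPI(ctx, orderID)
    })
    return err
}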

Deployment considerations include health checks and graceful shutdown. Kubernetes needs to know when our service is truly ready to handle traffic:

func (s *OrderService) Start() error {
    router := gin.New()
    router.GET("/health", s.healthHandler)

    // Keep a handle on the server so Stop can shut it down gracefully.
    s.httpServer = &http.Server{Addr: ":8080", Handler: router}

    go func() {
        if err := s.httpServer.ListenAndServe(); err != nil && err != http.ErrServerClosed {
            log.Error("HTTP server failed", "error", err)
        }
    }()

    return s.startEventConsumers()
}

func (s *OrderService) Stop(ctx context.Context) error {
    // Drain in-flight HTTP requests first, then release the messaging layer.
    if err := s.httpServer.Shutdown(ctx); err != nil {
        log.Error("HTTP shutdown failed", "error", err)
    }

    if err := s.nats.Close(); err != nil {
        log.Error("Failed to close NATS connection", "error", err)
    }
    return nil
}
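Kubernetes signals shutdown with SIGTERM before killing the pod, so main has to translate that signal into a call to Stop. One possible wiring, with NewOrderService standing in for whatever constructor you actually use:

func main() {
    ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
    defer stop()

    svc := NewOrderService( /* dependencies */ )
    if err := svc.Start(); err != nil {
        log.Fatal("startup failed", "error", err)
    }

    // Block until a shutdown signal arrives.
    <-ctx.Done()

    shutdownCtx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()
    if err := svc.Stop(shutdownCtx); err != nil {
        log.Error("shutdown error", "error", err)
    }
}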

Monitoring becomes straightforward with structured logging and metrics. We can track everything from message processing times to error rates:

func (s *NotificationService) handleEvent(msg *nats.Msg) {
    start := time.Now()
    
    defer func() {
        metrics.ProcessingTime.Observe(time.Since(start).Seconds())
        metrics.ProcessedEvents.Inc()
    }()
    
    // Message processing logic
}
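The metrics package referenced here isn’t defined in this article; assuming Prometheus, it could be as small as this (metric names are illustrative):

package metrics

import (
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
)

var (
    // ProcessingTime tracks how long each event takes to handle.
    ProcessingTime = promauto.NewHistogram(prometheus.HistogramOpts{
        Name:    "event_processing_duration_seconds",
        Help:    "Time spent processing a single event.",
        Buckets: prometheus.DefBuckets,
    })

    // ProcessedEvents counts events handled since startup.
    ProcessedEvents = promauto.NewCounter(prometheus.CounterOpts{
        Name: "events_processed_total",
        Help: "Total number of events processed.",
    })
)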

Building this type of system requires attention to many details, but the payoff is enormous. You get scalability, resilience, and observability in one cohesive architecture.

What challenges have you faced with microservices? I’d love to hear your experiences – share your thoughts in the comments below, and if this approach resonates with you, please like and share this article with your team.



