
Building Event-Driven Microservices with Go, NATS JetStream, and OpenTelemetry: Production Guide

I’ve been thinking a lot about how modern applications need to handle massive scale while remaining observable and resilient. The combination of Go’s efficiency, NATS JetStream’s reliability, and OpenTelemetry’s observability creates a powerful foundation for building production-ready systems. Let me show you how to put these pieces together.

Why do we keep hearing about event-driven architectures? They enable systems to be more responsive, scalable, and loosely coupled. But building them properly requires careful consideration of message delivery, error handling, and observability.

Let’s start with our NATS JetStream setup. We need reliable message streaming with persistence guarantees. Here’s how I configure the core streams:

streams := []StreamConfig{
    {
        Name:      "ORDERS",
        Subjects:  []string{"orders.*"},
        Retention: jetstream.WorkQueuePolicy,
        MaxAge:    24 * time.Hour,
        Replicas:  3,
    },
    {
        Name:      "PAYMENTS",
        Subjects:  []string{"payments.*"},
        Retention: jetstream.LimitsPolicy,
        MaxAge:    7 * 24 * time.Hour,
        Replicas:  3,
    },
}
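
With the configurations defined, we still need to register them with the server. Here's a minimal sketch, assuming the nats.go jetstream package, a js handle obtained from jetstream.New(nc), and that the configs above are (or convert to) jetstream.StreamConfig values; the setupStreams helper name is illustrative:

func setupStreams(ctx context.Context, js jetstream.JetStream, configs []jetstream.StreamConfig) error {
    for _, cfg := range configs {
        // CreateOrUpdateStream is idempotent: it creates the stream if it's
        // missing and applies the configuration if it already exists.
        if _, err := js.CreateOrUpdateStream(ctx, cfg); err != nil {
            return fmt.Errorf("setting up stream %s: %w", cfg.Name, err)
        }
    }
    return nil
}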

Have you ever wondered how to ensure messages aren’t lost during processing? JetStream’s acknowledgment mechanism gives us at-least-once delivery: when a service processes a message, it must explicitly acknowledge receipt, and if the service crashes mid-processing, the message is redelivered. (True exactly-once semantics additionally require deduplication on the publish side.)
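
Those acknowledgments only apply to consumers created with an explicit ack policy, so the consumer setup matters as much as the stream. Here's a sketch of one way to wire it up with the jetstream API; the durable name, subject filter, and limits are illustrative:

stream, err := js.Stream(ctx, "ORDERS")
if err != nil {
    return err
}

consumer, err := stream.CreateOrUpdateConsumer(ctx, jetstream.ConsumerConfig{
    Durable:       "order-processor",
    AckPolicy:     jetstream.AckExplicitPolicy,
    AckWait:       30 * time.Second, // redeliver if not acked within this window
    MaxDeliver:    5,                // stop redelivering after five attempts
    FilterSubject: "orders.created",
})
if err != nil {
    return err
}

// Every delivered message must be acked, nak'd, or terminated by its handler.
_, err = consumer.Consume(func(msg jetstream.Msg) {
    s.handleEvent(context.Background(), msg)
})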

Now let’s look at implementing our order service. Notice how we use context propagation for distributed tracing:

func (s *OrderService) CreateOrder(ctx context.Context, order events.Order) error {
    ctx, span := s.tracer.Start(ctx, "order.create")
    defer span.End()

    // Generate event with tracing context
    event := events.NewEvent(events.OrderCreated, order.ID, map[string]interface{}{
        "customer_id":  order.CustomerID,
        "total_amount": order.TotalAmount,
        "items":        order.Items,
    })
    
    // Add tracing metadata
    event.Metadata["trace_id"] = span.SpanContext().TraceID().String()
    event.Metadata["span_id"] = span.SpanContext().SpanID().String()

    return s.jsClient.PublishEvent(ctx, event)
}
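
On the consuming side we can rebuild that trace context so the consumer's spans attach to the producer's trace. This is a sketch of one way to do it with the go.opentelemetry.io/otel/trace API, assuming the event carries the trace_id and span_id metadata written above; the handler name and the final call are illustrative:

func (s *PaymentService) handleOrderCreated(ctx context.Context, event events.Event) error {
    // Rebuild the remote parent span context from the event metadata.
    traceID, err := trace.TraceIDFromHex(event.Metadata["trace_id"])
    if err != nil {
        return fmt.Errorf("invalid trace_id: %w", err)
    }
    spanID, err := trace.SpanIDFromHex(event.Metadata["span_id"])
    if err != nil {
        return fmt.Errorf("invalid span_id: %w", err)
    }
    parent := trace.NewSpanContext(trace.SpanContextConfig{
        TraceID: traceID,
        SpanID:  spanID,
        Remote:  true,
    })

    // Start a child span that links back to the order service's span.
    ctx = trace.ContextWithRemoteSpanContext(ctx, parent)
    ctx, span := s.tracer.Start(ctx, "payment.handle_order_created")
    defer span.End()

    return s.handlePayment(ctx, event)
}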

What happens when external services like payment processors become unavailable? We implement circuit breakers to prevent cascading failures. The gobreaker package provides an excellent implementation:

func (s *PaymentService) ProcessPayment(ctx context.Context, payment events.Payment) error {
    _, err := s.breaker.Execute(func() (interface{}, error) {
        // Simulate external payment API call
        resp, err := s.httpClient.Post("/process", payment)
        if err != nil {
            return nil, fmt.Errorf("payment API unavailable: %w", err)
        }
        return resp, nil
    })
    
    if err != nil {
        s.metrics.PaymentFailures.Inc()
        return fmt.Errorf("payment processing failed: %w", err)
    }
    
    s.metrics.PaymentsProcessed.Inc()
    return nil
}
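
The breaker itself is configured once at service startup. Here's a sketch of settings I might start from with gobreaker; the thresholds are illustrative, not tuned values:

breaker := gobreaker.NewCircuitBreaker(gobreaker.Settings{
    Name:        "payment-api",
    MaxRequests: 3,                // probe requests allowed while half-open
    Timeout:     30 * time.Second, // how long to stay open before probing again
    ReadyToTrip: func(counts gobreaker.Counts) bool {
        // Open the circuit after five consecutive failures.
        return counts.ConsecutiveFailures >= 5
    },
    OnStateChange: func(name string, from, to gobreaker.State) {
        log.Printf("circuit breaker %s: %s -> %s", name, from, to)
    },
})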

Observability is crucial in distributed systems. OpenTelemetry gives us unified tracing, metrics, and logging. Here’s how I set up the telemetry pipeline:

func SetupTelemetry(serviceName string) (*trace.TracerProvider, *metric.MeterProvider, error) {
    // Jaeger exporter for tracing
    exp, err := jaeger.New(jaeger.WithCollectorEndpoint(jaeger.WithEndpoint(os.Getenv("JAEGER_URL"))))
    if err != nil {
        return nil, nil, err
    }

    tp := trace.NewTracerProvider(
        trace.WithBatcher(exp),
        trace.WithResource(resource.NewWithAttributes(
            semconv.SchemaURL,
            semconv.ServiceName(serviceName),
        )),
    )

    // Prometheus exporter for metrics
    metricExp, err := prometheus.New()
    if err != nil {
        return nil, nil, err
    }

    mp := metric.NewMeterProvider(
        metric.WithReader(metricExp),
        metric.WithResource(resource.NewWithAttributes(
            semconv.SchemaURL,
            semconv.ServiceName(serviceName),
        )),
    )

    return tp, mp, nil
}
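
Wiring the providers in at startup is then a couple of lines. Registering them globally lets any package pull a named tracer or meter; a brief usage sketch:

tp, mp, err := SetupTelemetry("order-service")
if err != nil {
    log.Fatal(err)
}
otel.SetTracerProvider(tp)
otel.SetMeterProvider(mp)

// Named tracers and meters now come from the registered providers
// (s.tracer used in CreateOrder above is obtained this way).
tracer := otel.Tracer("order-service")
meter := otel.Meter("order-service")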

How do we ensure graceful shutdowns? Services must complete in-flight requests before terminating. Here’s my shutdown handler pattern:

func (s *OrderService) Start() error {
    // Setup signal handling
    stop := make(chan os.Signal, 1)
    signal.Notify(stop, syscall.SIGINT, syscall.SIGTERM)

    go s.processOrders()

    <-stop
    s.logger.Info("shutting down gracefully")
    
    // Give services time to finish current work
    ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
    defer cancel()
    
    return s.shutdown(ctx)
}
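
The shutdown method referenced above is where the actual draining happens: stop pulling new messages, let in-flight handlers finish, flush telemetry, and only then close the connection. A sketch, with field names that are assumptions about how the service struct is wired:

func (s *OrderService) shutdown(ctx context.Context) error {
    // Stop delivering new messages; handlers already running keep going.
    if s.consumeCtx != nil {
        s.consumeCtx.Drain()
    }

    // Flush buffered spans before the process exits.
    if err := s.tracerProvider.Shutdown(ctx); err != nil {
        s.logger.Warn("tracer provider shutdown", zap.Error(err))
    }

    // Drain the NATS connection last so pending acks still make it out.
    return s.natsConn.Drain()
}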

Error handling deserves special attention. We need to distinguish between transient and permanent failures:

func (s *InventoryService) handleEvent(ctx context.Context, msg jetstream.Msg) {
    var event events.Event
    if err := json.Unmarshal(msg.Data(), &event); err != nil {
        s.logger.Error("failed to unmarshal event", zap.Error(err))
        msg.Term() // Malformed payload will never parse - don't retry
        return
    }

    if err := s.processInventory(event); err != nil {
        if errors.Is(err, ErrPermanentFailure) {
            msg.Term() // Terminate message - no retry
        } else {
            msg.NakWithDelay(10 * time.Second) // Retry after delay
        }
        return
    }

    msg.Ack() // Acknowledge successful processing
}
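
The ErrPermanentFailure sentinel driving that branch is just a package-level error that processing code wraps whenever a retry can never succeed. A minimal sketch; the SKU field and repository call are illustrative:

// ErrPermanentFailure marks errors that no amount of retrying will fix.
var ErrPermanentFailure = errors.New("permanent failure")

func (s *InventoryService) processInventory(event events.Event) error {
    sku, ok := event.Data["sku"].(string)
    if !ok {
        // A malformed event never becomes valid, so the caller should Term it.
        return fmt.Errorf("event %s missing sku: %w", event.ID, ErrPermanentFailure)
    }

    if err := s.repo.Reserve(sku); err != nil {
        // Database or network hiccups are transient: the caller Naks and retries.
        return fmt.Errorf("reserving %s: %w", sku, err)
    }
    return nil
}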

Deployment becomes straightforward with Docker Compose. Our setup wires the services together on a shared network along with their NATS and Jaeger dependencies:

version: '3.8'
services:
  nats:
    image: nats:latest
    ports:
      - "4222:4222"
    command: "-js -m 8222 -sd /data"
  
  order-service:
    build: ./services/order
    environment:
      - NATS_URL=nats://nats:4222
      - JAEGER_URL=http://jaeger:14268/api/traces
    depends_on:
      - nats
      - jaeger

  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686"

Monitoring with Prometheus gives us real-time insights into system health. We track key metrics like message processing latency, error rates, and circuit breaker states:

func NewMetrics() *Metrics {
    return &Metrics{
        EventsProcessed: promauto.NewCounterVec(prometheus.CounterOpts{
            Name: "events_processed_total",
            Help: "Total number of events processed",
        }, []string{"service", "event_type"}),
        
        ProcessingLatency: promauto.NewHistogramVec(prometheus.HistogramOpts{
            Name:    "event_processing_latency_seconds",
            Help:    "Event processing latency distribution",
            Buckets: prometheus.DefBuckets,
        }, []string{"service", "event_type"}),
    }
}
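
For Prometheus to scrape these counters, each service also needs to expose a /metrics endpoint. Since promauto registers against the default registry, a small sketch like this is enough (the port is illustrative):

// Serve the default Prometheus registry over HTTP for scraping.
http.Handle("/metrics", promhttp.Handler())
go func() {
    if err := http.ListenAndServe(":9100", nil); err != nil {
        log.Fatal(err)
    }
}()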

Building production-ready systems requires attention to both the happy path and edge cases. Proper error handling, observability, and resilience patterns separate hobby projects from production systems.

What challenges have you faced when building distributed systems? I’d love to hear about your experiences and solutions. If you found this helpful, please share it with others who might benefit from these patterns. Your comments and feedback are always welcome!
