
Production-Ready Event-Driven Microservices: Go, NATS JetStream, and OpenTelemetry Complete Guide

Master production-ready event-driven microservices with Go, NATS JetStream & OpenTelemetry. Build scalable, observable systems with comprehensive examples.


Ever found yourself wrestling with microservices that don’t talk to each other reliably? I recently faced this challenge while scaling an e-commerce platform, leading me to explore event-driven architectures with Go, NATS JetStream, and OpenTelemetry. The results were transformative - let me share how you can build production-ready systems that handle failures gracefully while maintaining observability.

Setting up our project structure first creates clarity. We organize services in cmd/ while keeping shared logic in internal/ and reusable components in pkg/. Initializing modules with go mod init establishes our foundation. Notice how we include critical dependencies like NATS for messaging and OpenTelemetry for tracing:

go get github.com/nats-io/nats.go
go get go.opentelemetry.io/otel
go get github.com/sony/gobreaker

Defining event models prevents chaos later. We structure events like OrderCreated with metadata for tracing context. Why is this crucial? Because when a payment fails at 3 AM, you’ll need that trace ID to diagnose issues. Here’s our event structure:

type Event struct {
    ID          string                 `json:"id"`
    Type        EventType              `json:"type"`
    AggregateID string                 `json:"aggregate_id"`
    Metadata    Metadata               `json:"metadata"` // Contains TraceID
}

type Metadata struct {
    TraceID       string `json:"trace_id"`
    SpanID        string `json:"span_id"`
    CorrelationID string `json:"correlation_id"`
}

Configuration management often gets overlooked until deployment fails. We use Viper for multi-source configuration with sensible defaults. How might environment variables override your YAML configs? This approach handles it gracefully:

func LoadConfig(path string) (*Config, error) {
    viper.AddConfigPath(path)
    viper.SetConfigName("config")
    viper.AutomaticEnv() // Environment variables override file and default values

    viper.SetDefault("nats.url", "nats://localhost:4222")
    viper.SetDefault("server.port", "8080")

    if err := viper.ReadInConfig(); err != nil {
        // Fall back to defaults and env vars when no config file is present
        if _, ok := err.(viper.ConfigFileNotFoundError); !ok {
            return nil, err
        }
    }

    var config Config
    err := viper.Unmarshal(&config)
    return &config, err
}

Connecting to NATS JetStream requires resilience. We implement reconnection logic because network blips shouldn’t crash services. Notice the MaxReconnects and ReconnectWait - how long would your system wait before giving up?

nc, err := nats.Connect(config.NATS.URL,
    nats.MaxReconnects(config.NATS.MaxReconnects),
    nats.ReconnectWait(time.Second*time.Duration(config.NATS.ReconnectWait)),
)

For tracing, we initialize OpenTelemetry with Jaeger exporter. This code snippet creates spans that follow events across services. Ever wondered how payment failures correlate with inventory checks? Distributed traces show you:

func InitTracer(jaegerURL, serviceName string) (func(context.Context) error, error) {
    exporter, err := jaeger.New(jaeger.WithCollectorEndpoint(
        jaeger.WithEndpoint(jaegerURL),
    ))
    if err != nil {
        return nil, err // Don't silently swallow exporter setup failures
    }
    tp := trace.NewTracerProvider(
        trace.WithBatcher(exporter),
        trace.WithResource(resource.NewWithAttributes(
            semconv.SchemaURL,
            semconv.ServiceNameKey.String(serviceName),
        )),
    )
    otel.SetTracerProvider(tp)
    return tp.Shutdown, nil
}

Implementing circuit breakers prevents cascading failures. When the payment service times out, the gobreaker trips to stop overwhelming it. What happens to requests during this cooldown? They fail fast instead of backing up:

cb := gobreaker.NewCircuitBreaker(gobreaker.Settings{
    Name:        "PaymentService",
    Timeout:     15 * time.Second,
    ReadyToTrip: func(counts gobreaker.Counts) bool {
        return counts.ConsecutiveFailures > 5
    },
})

_, err := cb.Execute(func() (interface{}, error) {
    return processPayment(order) // Protected call
})

For dead letter handling, it's worth knowing that JetStream doesn't redirect poison messages on its own: after MaxDeliver attempts it simply stops redelivering and publishes a MAX_DELIVERIES advisory. So we cap delivery attempts on the consumer, then run a small subscriber that moves exhausted events into an ORDERS.DLQ stream:

js.AddConsumer("ORDERS", &nats.ConsumerConfig{
    Durable:    "OrderProcessor",
    AckPolicy:  nats.AckExplicitPolicy,
    MaxDeliver: 3,
    BackOff:    []time.Duration{1 * time.Second, 3 * time.Second},
})

// Fires once a message exhausts its three delivery attempts
nc.Subscribe("$JS.EVENT.ADVISORY.CONSUMER.MAX_DELIVERIES.ORDERS.OrderProcessor",
    func(m *nats.Msg) {
        // Fetch the original message by stream sequence, then publish it to ORDERS.DLQ
    })

Testing event-driven systems requires simulating failures. We run an embedded NATS server from nats-server's test helpers and shut it down mid-flight to mimic a network partition. How would your service handle sudden disconnects mid-payment? This test reveals weaknesses:

func TestPaymentTimeout(t *testing.T) {
    srv := natsserver.RunDefaultServer() // github.com/nats-io/nats-server/v2/test
    defer srv.Shutdown()

    nc, err := nats.Connect(srv.ClientURL(),
        nats.DisconnectErrHandler(func(_ *nats.Conn, _ error) {
            t.Log("Simulating network partition")
        }),
    )
    require.NoError(t, err)
    defer nc.Close()

    srv.Shutdown() // Kill the server before the payment completes

    _, err = processPayment(testOrder)
    assert.ErrorContains(t, err, "payment timeout")
}

Deploying with Docker Compose ensures environment consistency. Our docker-compose.yml bundles services with Jaeger and Prometheus. Notice the health checks - they prevent traffic from reaching unready containers:

services:
  order-service:
    image: order-service:latest
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 10s
  jaeger:
    image: jaegertracing/all-in-one:1.35
    ports:
      - "16686:16686"

The real magic happens when everything integrates. Events flow through NATS while traces connect across services. We get Prometheus metrics showing event processing times and failure rates. Suddenly, that 2 AM production incident becomes solvable in minutes.

I’m curious - what challenges have you faced with microservices? Share your stories below! If this approach resonates with you, give it a like or share with teammates wrestling with distributed systems. Comments and questions are always welcome - let’s build more resilient systems together.



