
Build Production-Ready Event-Driven Microservices: Go, NATS JetStream, and OpenTelemetry Guide

Learn to build scalable event-driven microservices with Go, NATS JetStream & OpenTelemetry. Complete guide with Docker, Kubernetes & testing strategies.


I’ve been thinking a lot about how modern applications handle scale and complexity. Traditional request-response architectures often struggle under heavy loads, creating bottlenecks that are hard to diagnose. That’s why I’ve been exploring event-driven microservices—they offer a more resilient and scalable approach to building distributed systems.

When you have multiple services needing to communicate, how do you ensure messages aren’t lost and systems stay in sync? This challenge led me to combine Go’s efficiency with NATS JetStream’s reliability and OpenTelemetry’s observability capabilities.

Let me walk you through building a production-ready system. We’ll create an e-commerce platform with three core services: order processing, inventory management, and customer notifications. These services will communicate through events rather than direct API calls.

First, we establish our event bus using NATS JetStream. This provides persistent message storage with at-least-once delivery; exactly-once processing can be layered on top through message deduplication (the Nats-Msg-Id header) and idempotent consumers. Here’s how we set up our connection:

// NATSEventBus wraps the core NATS connection and its JetStream context.
type NATSEventBus struct {
    conn *nats.Conn
    js   nats.JetStreamContext
}

func NewNATSEventBus(url string) (*NATSEventBus, error) {
    opts := []nats.Option{
        nats.ReconnectWait(2 * time.Second), // wait 2s between reconnect attempts
        nats.MaxReconnects(-1),              // retry forever
    }
    
    conn, err := nats.Connect(url, opts...)
    if err != nil {
        return nil, fmt.Errorf("failed to connect to NATS: %w", err)
    }
    
    js, err := conn.JetStream()
    if err != nil {
        return nil, fmt.Errorf("failed to create JetStream context: %w", err)
    }
    
    return &NATSEventBus{conn: conn, js: js}, nil
}

Notice how we configure automatic reconnection? This ensures our services remain available even if the message broker temporarily goes down.

Now, consider what happens when a customer places an order. Instead of calling inventory and notification services directly, the order service publishes an event:

func (s *OrderService) CreateOrder(ctx context.Context, order models.Order) error {
    // Start a fresh span for this operation rather than ending the caller's span.
    ctx, span := tracer.Start(ctx, "create_order")
    defer span.End()
    
    event := models.OrderCreatedEvent{
        BaseEvent: models.BaseEvent{
            ID:        uuid.New().String(),
            Type:      models.OrderCreated,
            Timestamp: time.Now(),
        },
        Data: models.OrderCreatedData{
            OrderID:       order.ID,
            CustomerID:    order.CustomerID,
            CustomerEmail: order.CustomerEmail,
            Items:         order.Items,
        },
    }
    
    return s.eventBus.Publish(ctx, event)
}
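The Publish side isn’t shown above; here’s a minimal sketch of how it might look, assuming events are JSON-encoded and routed to a subject derived from the event type. The subjectFor helper and envelope field names are illustrative, not lifted from the original codebase:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
	"time"
)

// BaseEvent mirrors the envelope used by the models package in the article.
type BaseEvent struct {
	ID        string    `json:"id"`
	Type      string    `json:"type"`
	Timestamp time.Time `json:"timestamp"`
}

// subjectFor maps an event type like "order.created" onto a subject
// under the ORDERS stream, e.g. "orders.created".
func subjectFor(eventType string) string {
	parts := strings.SplitN(eventType, ".", 2)
	if len(parts) != 2 {
		return "orders.unknown"
	}
	return "orders." + parts[1]
}

func main() {
	ev := BaseEvent{ID: "e-1", Type: "order.created", Timestamp: time.Now().UTC()}
	payload, _ := json.Marshal(ev)
	fmt.Println(subjectFor(ev.Type)) // orders.created
	fmt.Println(len(payload) > 0)
}
```

In the real NATSEventBus, the marshaled payload would then go to something like eb.js.Publish(subjectFor(event.Type), payload), letting JetStream persist it before the call returns.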

But how do we ensure this event reaches all interested services reliably? That’s where JetStream’s stream configuration comes in:

func (eb *NATSEventBus) initStreams() error {
    _, err := eb.js.AddStream(&nats.StreamConfig{
        Name:      "ORDERS",
        Subjects:  []string{"orders.*"},
        Retention: nats.LimitsPolicy, // retain for every consumer, not just the first
        MaxAge:    24 * time.Hour,
    })
    return err
}

This configuration retains messages for 24 hours under the limits-based retention policy, so each service can attach its own durable consumer and receive every event. (A work-queue policy would delete a message as soon as any one consumer acknowledged it, which defeats fan-out to multiple interested services.)

Now let’s talk observability. Without proper tracing, debugging distributed systems becomes incredibly difficult. OpenTelemetry gives us visibility across service boundaries:

func StartOrderProcessing(ctx context.Context, eventData []byte) {
    ctx, span := tracer.Start(ctx, "process_order")
    defer span.End()
    
    span.SetAttributes(
        attribute.String("event.type", "order.created"),
        attribute.String("service.name", "inventory-service"),
    )
    
    var event models.OrderCreatedEvent
    if err := json.Unmarshal(eventData, &event); err != nil {
        span.RecordError(err)
        span.SetStatus(codes.Error, "failed to decode event") // codes is go.opentelemetry.io/otel/codes
        return
    }
    
    // Process inventory reservation
}
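One detail worth calling out: trace context does not cross the message broker on its own. A common approach is to carry the W3C traceparent header alongside the event so the consumer can resume the trace; in a real service you would inject and extract it with OpenTelemetry’s propagation.TraceContext against the NATS message headers. The sketch below only shows the wire format itself, using the standard library, with illustrative example IDs:

```go
package main

import (
	"fmt"
	"strings"
)

// formatTraceparent builds a W3C Trace Context header:
// version "00", 32-hex trace ID, 16-hex parent span ID, 2-hex flags.
func formatTraceparent(traceID, spanID string, sampled bool) string {
	flags := "00"
	if sampled {
		flags = "01"
	}
	return fmt.Sprintf("00-%s-%s-%s", traceID, spanID, flags)
}

// parseTraceparent splits a traceparent back into its trace and span IDs.
func parseTraceparent(tp string) (traceID, spanID string, err error) {
	parts := strings.Split(tp, "-")
	if len(parts) != 4 || len(parts[1]) != 32 || len(parts[2]) != 16 {
		return "", "", fmt.Errorf("malformed traceparent: %q", tp)
	}
	return parts[1], parts[2], nil
}

func main() {
	tp := formatTraceparent("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", true)
	fmt.Println(tp)
	tid, sid, _ := parseTraceparent(tp)
	fmt.Println(tid, sid)
}
```

With the header propagated, Jaeger stitches the publisher’s and consumer’s spans into one trace across the broker hop.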

What happens if the inventory service is temporarily unavailable? We implement retry logic with exponential backoff:

func ReserveInventory(ctx context.Context, productID string, quantity int) error {
    const maxRetries = 5
    baseDelay := 100 * time.Millisecond
    
    var err error
    for attempt := 0; attempt < maxRetries; attempt++ {
        if err = tryReserve(ctx, productID, quantity); err == nil {
            return nil
        }
        
        if !isRetryableError(err) {
            return err
        }
        
        // Exponential backoff that also respects context cancellation,
        // instead of a blind time.Sleep.
        delay := baseDelay * time.Duration(math.Pow(2, float64(attempt)))
        select {
        case <-time.After(delay):
        case <-ctx.Done():
            return ctx.Err()
        }
    }
    return fmt.Errorf("max retries exceeded: %w", err)
}

Testing these interactions requires simulating the entire event flow. We use Docker Compose to spin up our dependencies:

version: '3.8'
services:
  nats:
    image: nats:2.10-alpine
    command: ["-js"]   # JetStream must be enabled explicitly
    ports:
      - "4222:4222"
  
  jaeger:
    image: jaegertracing/all-in-one:1.35
    ports:
      - "16686:16686"   # Jaeger UI

For production deployment, we package our services as minimal Docker containers:

FROM golang:1.21-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o order-service ./cmd/order-service

FROM alpine:latest
COPY --from=builder /app/order-service /app/
EXPOSE 8080
CMD ["/app/order-service"]
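For Kubernetes, each service deploys independently and scales on its own. A minimal, illustrative Deployment for the order service; the image name, replica count, and the NATS_URL environment variable are assumptions for this sketch, not values from the original project:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: order-service
  template:
    metadata:
      labels:
        app: order-service
    spec:
      containers:
        - name: order-service
          image: registry.example.com/order-service:latest
          ports:
            - containerPort: 8080
          env:
            - name: NATS_URL
              value: "nats://nats:4222"   # in-cluster NATS service
```

Because the services share no direct dependencies, each one can be rolled out, scaled, or restarted without coordinating with the others.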

The beauty of this architecture lies in its resilience. If the notification service goes down, orders continue processing, and notifications will be sent once the service recovers. Messages persist in JetStream until they’re successfully processed.
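That persistence has a flip side: if a consumer crashes after doing its work but before acknowledging, JetStream redelivers the message, so handlers must be idempotent. A minimal in-memory dedup sketch, keyed by the event ID from our envelope (a production service would persist processed IDs in its own database; the names here are illustrative):

```go
package main

import (
	"fmt"
	"sync"
)

// dedupStore remembers which event IDs have already been handled, so a
// redelivered message becomes a no-op instead of a double-processed order.
type dedupStore struct {
	mu   sync.Mutex
	seen map[string]bool
}

func newDedupStore() *dedupStore {
	return &dedupStore{seen: make(map[string]bool)}
}

// ProcessOnce runs handle only the first time eventID is observed and
// reports whether the handler actually ran.
func (d *dedupStore) ProcessOnce(eventID string, handle func()) bool {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.seen[eventID] {
		return false
	}
	d.seen[eventID] = true
	handle()
	return true
}

func main() {
	store := newDedupStore()
	count := 0
	store.ProcessOnce("evt-1", func() { count++ })
	store.ProcessOnce("evt-1", func() { count++ }) // redelivery: skipped
	fmt.Println(count) // 1
}
```

JetStream can also deduplicate on the publish side within a window when messages carry a Nats-Msg-Id header; consumer-side idempotency covers the remaining redelivery cases.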

Have you considered how circuit breakers prevent cascading failures? Here’s how we implement one:

type CircuitBreaker struct {
    failures     int
    maxFailures  int
    resetTimeout time.Duration
    lastFailure  time.Time
    mu           sync.Mutex
}

func (cb *CircuitBreaker) Execute(f func() error) error {
    cb.mu.Lock()
    
    if cb.failures >= cb.maxFailures {
        if time.Since(cb.lastFailure) < cb.resetTimeout {
            cb.mu.Unlock()
            return fmt.Errorf("circuit breaker open")
        }
        // Reset timeout elapsed: a simplified half-open transition that
        // lets requests through again. (A stricter breaker would admit
        // only a single probe request here.)
        cb.failures = 0
    }
    
    cb.mu.Unlock()
    
    err := f()
    
    cb.mu.Lock()
    defer cb.mu.Unlock()
    
    if err != nil {
        cb.failures++
        cb.lastFailure = time.Now()
    } else {
        cb.failures = 0
    }
    
    return err
}

Monitoring becomes straightforward with the metrics collected by OpenTelemetry. We can track message processing times, error rates, and system health through Prometheus and Grafana.

The combination of Go’s performance, NATS JetStream’s reliability, and OpenTelemetry’s observability creates a robust foundation for event-driven systems. Each service remains focused on its domain while communicating through well-defined events.

This approach scales beautifully. Need to add a new service that reacts to order events? Simply subscribe to the relevant subjects—no changes required to existing services.

I’d love to hear your thoughts on this architecture. What challenges have you faced with microservices? Share your experiences in the comments below, and don’t forget to like and share if you found this valuable!



