Building Production-Ready Event-Driven Microservices with Go, NATS JetStream, and OpenTelemetry

golang

Building Production-Ready Event-Driven Microservices with Go, NATS JetStream, and OpenTelemetry

Learn to build production-ready event-driven microservices with Go, NATS JetStream & OpenTelemetry. Complete guide with code examples, testing & deployment.

Oct 18, 2025

Building Production-Ready Event-Driven Microservices with Go, NATS JetStream, and OpenTelemetry

I’ve been thinking about how modern systems handle massive transaction volumes while remaining reliable. Recently, I worked on a project where traditional request-response patterns created bottlenecks during peak loads. This experience led me to explore event-driven architectures, specifically using Go’s efficiency combined with NATS JetStream’s persistence and OpenTelemetry’s observability capabilities.

Have you ever wondered how systems process thousands of orders without losing track of any? The answer often lies in event-driven patterns where services communicate through messages rather than direct API calls. Let me show you how to build such systems for production environments.

First, let’s establish our messaging foundation with NATS JetStream. Unlike traditional message queues, JetStream provides durable storage and exactly-once delivery semantics. Here’s how you can configure a robust connection:

func NewJetStreamConnection() (nats.JetStreamContext, error) {
    nc, err := nats.Connect("nats://localhost:4222",
        nats.MaxReconnects(-1),
        nats.ReconnectWait(2*time.Second),
        nats.DisconnectErrHandler(func(_ *nats.Conn, err error) {
            log.Printf("Disconnected: %v", err)
        }),
    )
    if err != nil {
        return nil, fmt.Errorf("connection failed: %w", err)
    }
    
    return nc.JetStream(nats.PublishAsyncMaxPending(256))
}

This configuration ensures your services remain connected even during network interruptions. The async publishing with maximum pending messages prevents memory exhaustion during high loads.

Now, what happens when a message fails to process? JetStream’s acknowledgment system provides the answer. Messages remain in the stream until explicitly acknowledged, giving your services time to process them reliably:

func ProcessOrders(ctx context.Context, js nats.JetStreamContext) error {
    sub, err := js.PullSubscribe("orders.created", "order-processor")
    if err != nil {
        return fmt.Errorf("subscription failed: %w", err)
    }
    
    for {
        select {
        case <-ctx.Done():
            return ctx.Err()
        default:
            msgs, err := sub.Fetch(10, nats.MaxWait(5*time.Second))
            if err != nil {
                continue
            }
            
            for _, msg := range msgs {
                if err := handleOrderMessage(msg); err != nil {
                    msg.Nak() // Negative acknowledgment - redeliver later
                } else {
                    msg.Ack() // Processing successful
                }
            }
        }
    }
}

But how do you track requests across multiple services? This is where OpenTelemetry becomes invaluable. By adding distributed tracing, you can follow a single order through your entire system:

func ProcessPayment(ctx context.Context, order *Order) error {
    ctx, span := otel.Tracer("payment-service").
        Start(ctx, "ProcessPayment")
    defer span.End()
    
    span.SetAttributes(
        attribute.String("order.id", order.ID),
        attribute.Float64("order.amount", order.Amount),
    )
    
    // Payment processing logic
    if err := chargeCreditCard(ctx, order); err != nil {
        span.RecordError(err)
        return fmt.Errorf("payment failed: %w", err)
    }
    
    return nil
}

When building production systems, schema evolution becomes crucial. Protocol Buffers help maintain compatibility as your events change over time:

syntax = "proto3";

message OrderCreated {
    string order_id = 1;
    string customer_id = 2;
    repeated OrderItem items = 3;
    double total_amount = 4;
    string currency = 5;
    
    // New fields can be added while maintaining backward compatibility
    string promotion_code = 6; // Added in v2
}

Testing event-driven systems requires simulating real-world conditions. I’ve found that container-based testing provides the most accurate results:

func TestOrderProcessing(t *testing.T) {
    ctx := context.Background()
    
    natsContainer, err := testcontainers.GenericContainer(ctx, testcontainers.GenericContainerRequest{
        ContainerRequest: testcontainers.ContainerRequest{
            Image: "nats:jetstream",
            ExposedPorts: []string{"4222/tcp"},
        },
        Started: true,
    })
    if err != nil {
        t.Fatalf("Failed to start NATS: %v", err)
    }
    defer natsContainer.Terminate(ctx)
    
    // Your test logic here
    // This ensures your tests run against a real JetStream instance
}

One question I often get: how do you handle database transactions alongside message publishing? The outbox pattern provides a reliable solution. Instead of publishing directly after database commits, you write events to an outbox table within the same transaction, then a separate process publishes these events.

Error handling in distributed systems requires careful consideration. Circuit breakers prevent cascading failures when downstream services become unavailable:

func NewInventoryClient() *InventoryClient {
    cb := gobreaker.NewCircuitBreaker(gobreaker.Settings{
        Name:        "inventory-service",
        MaxRequests: 5,
        Interval:    30 * time.Second,
        Timeout:     60 * time.Second,
        ReadyToTrip: func(counts gobreaker.Counts) bool {
            return counts.ConsecutiveFailures > 5
        },
    })
    
    return &InventoryClient{circuitBreaker: cb}
}

As your system grows, monitoring becomes essential. I recommend establishing Service Level Objectives (SLOs) for key operations and setting up alerts when these thresholds are at risk. Combine metrics from your services with JetStream’s built-in metrics for comprehensive visibility.

Building event-driven microservices requires shifting from synchronous to asynchronous thinking. Instead of asking “what response will I get,” you ask “what events will occur as a result of this action.” This mental model change is challenging but ultimately leads to more resilient systems.

The combination of Go’s performance, NATS JetStream’s reliability, and OpenTelemetry’s observability creates a foundation that can handle real production workloads. Start with simple event flows, establish your observability early, and gradually add complexity as your team becomes comfortable with the patterns.

I’d love to hear about your experiences with event-driven architectures. What challenges have you faced when moving from synchronous to asynchronous patterns? Share your thoughts in the comments below, and if you found this helpful, please like and share with others who might benefit from these approaches.

Share: Facebook Twitter Reddit LinkedIn WhatsApp Telegram Pinterest Email Instagram

golang

Building Production-Ready Event-Driven Microservices with Go, NATS JetStream, and OpenTelemetry

Our Creations

We are on Medium

Similar Posts

Building Production-Ready Event-Driven Microservices with Go NATS JetStream and OpenTelemetry Complete Guide

Production-Ready Worker Pool System with Graceful Shutdown in Go: Complete Implementation Guide

Production-Ready Event-Driven Microservices with Go, NATS JetStream, and OpenTelemetry: Complete Guide

Cobra + Viper Integration: Build Enterprise-Grade CLI Tools with Advanced Configuration Management

Building Production-Ready Microservices with gRPC, PostgreSQL, and Distributed Tracing in Go

Production-Ready Go Worker Pool: Master Graceful Shutdown, Backpressure, and Advanced Concurrency Patterns