Building Production-Ready Event-Driven Microservices with Go, NATS JetStream, and OpenTelemetry

Learn to build scalable event-driven microservices with Go, NATS JetStream & OpenTelemetry. Complete guide with code examples, testing & deployment.

Oct 29, 2025

I’ve been building microservices for years, and the shift to event-driven architectures has transformed how we handle scale and complexity. Recently, I found myself struggling with message reliability and observability in a production system. That frustration sparked this deep exploration into combining Go’s efficiency with NATS JetStream’s persistence and OpenTelemetry’s tracing capabilities. If you’re tired of losing messages or debugging distributed systems blindfolded, you’re in the right place.

Let me show you how to build systems that not only handle massive loads but also tell you exactly what’s happening inside. The key lies in treating events as first-class citizens while ensuring every component remains observable and resilient.

When designing event-driven systems, I always start with clear boundaries between services. Each microservice should own its data and communicate through well-defined events. Have you considered what happens when an order service needs to check inventory without creating tight coupling? Events solve this beautifully by allowing services to react rather than request.

Here’s a basic event structure I use across services:

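import (
    "context"
    "encoding/json"
    "fmt"
    "time"

    "github.com/google/uuid"
    "github.com/nats-io/nats.go"
)

// OrderCreatedEvent is the envelope published whenever an order is created.
type OrderCreatedEvent struct {
    EventID     string    `json:"event_id"`
    AggregateID string    `json:"aggregate_id"`
    Version     int       `json:"version"`
    EventType   string    `json:"event_type"`
    Timestamp   time.Time `json:"timestamp"`
    Data        OrderData `json:"data"`
}

// PublishOrderCreated serializes the event and publishes it to JetStream.
func PublishOrderCreated(ctx context.Context, js nats.JetStreamContext, order Order) error {
    event := OrderCreatedEvent{
        EventID:     uuid.New().String(),
        AggregateID: order.ID,
        Version:     1,
        EventType:   "ORDER_CREATED",
        Timestamp:   time.Now().UTC(),
        Data:        order.ToData(),
    }

    data, err := json.Marshal(event)
    if err != nil {
        return fmt.Errorf("marshal order created event: %w", err)
    }

    // Passing ctx lets cancellation or a deadline abort the wait for the publish ack.
    _, err = js.Publish("orders.created", data, nats.Context(ctx))
    return err
}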
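
On the consuming side, a service usually wants to look at the event type before committing to a payload shape. Here is a minimal sketch of that using only the standard library. The `Envelope` type and the `OrderData` fields are my own assumptions for illustration (the post never spells out `OrderData`); the JSON field names match the struct above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Envelope mirrors the JSON field names of OrderCreatedEvent but keeps the
// payload undecoded so a consumer can inspect event_type first.
type Envelope struct {
	EventID     string          `json:"event_id"`
	AggregateID string          `json:"aggregate_id"`
	Version     int             `json:"version"`
	EventType   string          `json:"event_type"`
	Timestamp   time.Time       `json:"timestamp"`
	Data        json.RawMessage `json:"data"`
}

// OrderData is a hypothetical payload; the real fields depend on your domain.
type OrderData struct {
	CustomerID string `json:"customer_id"`
	TotalCents int64  `json:"total_cents"`
}

// DecodeOrderCreated checks the envelope type and then decodes the payload.
func DecodeOrderCreated(raw []byte) (Envelope, OrderData, error) {
	var env Envelope
	if err := json.Unmarshal(raw, &env); err != nil {
		return Envelope{}, OrderData{}, fmt.Errorf("decode envelope: %w", err)
	}
	if env.EventType != "ORDER_CREATED" {
		return Envelope{}, OrderData{}, fmt.Errorf("unexpected event type %q", env.EventType)
	}
	var data OrderData
	if err := json.Unmarshal(env.Data, &data); err != nil {
		return Envelope{}, OrderData{}, fmt.Errorf("decode order data: %w", err)
	}
	return env, data, nil
}

func main() {
	raw := []byte(`{"event_id":"e-1","aggregate_id":"order-42","version":1,` +
		`"event_type":"ORDER_CREATED","timestamp":"2025-10-29T12:00:00Z",` +
		`"data":{"customer_id":"c-7","total_cents":1999}}`)

	env, data, err := DecodeOrderCreated(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(env.AggregateID, data.CustomerID, data.TotalCents)
}
```

Keeping `Data` as `json.RawMessage` means a consumer that only routes on `event_type` never has to know every payload type.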

Setting up NATS JetStream requires careful configuration. I learned the hard way that default settings don’t cut it in production. You need durable streams with proper retention policies. How long should you keep payment events versus notification events? That depends on your compliance and business needs.

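streamConfig := &nats.StreamConfig{
    Name:     "ORDERS",
    Subjects: []string{"orders.*"},
    // WorkQueuePolicy removes a message once it is acknowledged, so each
    // message goes to a single consumer. Use nats.LimitsPolicy if several
    // services must each see every event.
    Retention: nats.WorkQueuePolicy,
    Storage:   nats.FileStorage,
    MaxAge:    24 * time.Hour, // messages older than a day are discarded, processed or not
    Replicas:  3,              // needs a JetStream cluster of at least three servers
}

if _, err := js.AddStream(streamConfig); err != nil {
    log.Fatalf("create ORDERS stream: %v", err)
}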

Observability isn’t just logging—it’s about understanding flow across services. I integrate OpenTelemetry from day one because retrofitting tracing is painful. Every event publication and consumption should carry trace context.

func (s *OrderService) CreateOrder(ctx context.Context, order Order) error {
    ctx, span := s.tracer.Start(ctx, "order.create")
    defer span.End()

    event := OrderCreatedEvent{
        // ... fields populated as in PublishOrderCreated
    }
    data, err := json.Marshal(event)
    if err != nil {
        span.RecordError(err)
        return err
    }

    // Carry the trace context in NATS message headers rather than in the
    // event body, so consumers can extract it before decoding the payload.
    msg := nats.NewMsg("orders.created")
    msg.Data = data
    otel.GetTextMapPropagator().Inject(ctx, propagation.HeaderCarrier(msg.Header))

    if _, err := s.js.PublishMsg(msg, nats.Context(ctx)); err != nil {
        span.RecordError(err)
        return err
    }
    return nil
}
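
It helps to know what actually travels with the message. Assuming the global propagator has been set to `propagation.TraceContext{}` (the OpenTelemetry default is a no-op until you set one), the `Inject` call writes a `traceparent` entry in the W3C Trace Context format: version, trace ID, parent span ID, and flags, joined by hyphens. The sketch below parses that value with only the standard library to make the wire format concrete. `TraceParent` and `ParseTraceParent` are hypothetical names of mine; in a real consumer you would call the propagator's `Extract` instead of parsing by hand.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// TraceParent holds the fields of a version-00 W3C traceparent value.
type TraceParent struct {
	TraceID string // 32 lowercase hex characters
	SpanID  string // 16 lowercase hex characters
	Sampled bool   // lowest bit of the flags byte
}

// isLowerHex reports whether s is exactly n lowercase hex characters.
func isLowerHex(s string, n int) bool {
	if len(s) != n || strings.ToLower(s) != s {
		return false
	}
	_, err := hex.DecodeString(s)
	return err == nil
}

// ParseTraceParent validates and splits a traceparent header value.
func ParseTraceParent(v string) (TraceParent, error) {
	parts := strings.Split(v, "-")
	if len(parts) != 4 || parts[0] != "00" {
		return TraceParent{}, errors.New("unsupported traceparent")
	}
	if !isLowerHex(parts[1], 32) || !isLowerHex(parts[2], 16) || !isLowerHex(parts[3], 2) {
		return TraceParent{}, errors.New("malformed traceparent")
	}
	// All-zero IDs are invalid according to the specification.
	if strings.Trim(parts[1], "0") == "" || strings.Trim(parts[2], "0") == "" {
		return TraceParent{}, errors.New("zero trace or span id")
	}
	flags, _ := hex.DecodeString(parts[3]) // already validated above
	return TraceParent{
		TraceID: parts[1],
		SpanID:  parts[2],
		Sampled: flags[0]&0x01 == 0x01,
	}, nil
}

// String renders the value back into header form.
func (tp TraceParent) String() string {
	flags := "00"
	if tp.Sampled {
		flags = "01"
	}
	return "00-" + tp.TraceID + "-" + tp.SpanID + "-" + flags
}

func main() {
	// Example value taken from the W3C Trace Context specification.
	header := "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
	tp, err := ParseTraceParent(header)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(tp.TraceID, tp.SpanID, tp.Sampled)
}
```

The consumer's span becomes a child of the span identified here, which is what stitches publisher and subscriber into one trace.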

Concurrency in Go makes event processing incredibly efficient. But have you ever faced goroutine leaks under high load? I use worker pools with proper context cancellation to prevent resource exhaustion.

func (p *EventProcessor) StartWorkers(ctx context.Context, subject string) {
    for i := 0; i < p.workerCount; i++ {
        go p.worker(ctx, subject)
    }
}

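func (p *EventProcessor) worker(ctx context.Context, subject string) {
    // Workers bind to the same durable pull consumer, so messages are
    // distributed among them.
    sub, err := p.js.PullSubscribe(subject, "order-workers")
    if err != nil {
        log.Printf("pull subscribe %q: %v", subject, err)
        return
    }

    for {
        select {
        case <-ctx.Done():
            return
        default:
        }

        msgs, err := sub.Fetch(10, nats.MaxWait(5*time.Second))
        if err != nil {
            // A timeout only means no messages arrived within MaxWait.
            if !errors.Is(err, nats.ErrTimeout) {
                log.Printf("fetch from %q: %v", subject, err)
                time.Sleep(time.Second) // avoid a hot loop on persistent errors
            }
            continue
        }
        for _, msg := range msgs {
            p.processMessage(ctx, msg) // expected to Ack or Nak the message
        }
    }
}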

Testing event-driven systems requires simulating real-world conditions. I run integration tests with actual NATS instances in Docker to catch timing issues and race conditions. What’s the point of unit tests if they don’t reflect production behavior?

Deployment involves more than just running containers. I configure health checks that verify NATS connectivity and OpenTelemetry exports. Without proper readiness probes, your services might start before dependencies are available.

Performance tuning comes down to monitoring key metrics. I track event processing latency, error rates, and consumer lag. When you see processing time spike, is it your code or the network? Distributed tracing answers that instantly.

Common pitfalls include ignoring idempotency and forgetting backpressure. Services must handle duplicate messages gracefully, and systems need to slow down when overwhelmed. Circuit breakers prevent cascading failures when downstream services struggle.

Building production-ready systems means anticipating failure at every step. I’ve seen projects crumble under load because they treated events as fire-and-forget. With JetStream’s persistence and OpenTelemetry’s visibility, you can sleep well knowing your system is both robust and transparent.

If this approach resonates with you, I’d love to hear about your experiences. What challenges have you faced with event-driven architectures? Share your thoughts in the comments, and if this helped clarify things, don’t forget to like and share this with your team. Let’s build more reliable systems together.
