Building Production-Ready Event-Driven Microservices with Go, NATS, and OpenTelemetry: Complete Guide

Learn to build production-ready event-driven microservices with Go, NATS JetStream & OpenTelemetry. Master distributed tracing, resilience patterns & scalable architecture.

I’ve been thinking a lot about building resilient systems lately. Every time I see a service go down during peak traffic or lose critical data during failures, I’m reminded how crucial proper architecture is. That’s why I want to share my approach to creating production-grade event-driven microservices using Go, NATS, and OpenTelemetry. These tools have become my go-to stack for building systems that can handle real-world pressure. Let’s explore how they work together.

Our architecture centers around NATS JetStream for reliable messaging. We’ll create an order processing flow where services communicate through events rather than direct calls. This separation keeps our components independent and resilient. When an order gets created, it publishes an event that both inventory and notification services react to. Each service focuses on its specific task without knowing about others. How do we ensure these events aren’t lost though? That’s where JetStream’s persistence comes in.

Setting up the project requires careful structure. I organize my Go workspace with clear separation between services and shared packages. Here’s how I typically initialize:

mkdir -p cmd/{order,inventory,notification}
mkdir -p internal/{events,telemetry,handlers}

Dependencies matter. My go.mod includes critical libraries:

require (
    github.com/nats-io/nats.go v1.31.0
    go.opentelemetry.io/otel v1.21.0
    go.opentelemetry.io/otel/exporters/jaeger v1.17.0
    github.com/google/uuid v1.4.0
)

Event schemas form the contract between services. I define them strictly with versioning:

type OrderCreated struct {
    ID        string    `json:"id"`
    Version   string    `json:"version"` // bumped on breaking schema changes
    OrderID   string    `json:"order_id"`
    Items     []Item    `json:"items"`
    Timestamp time.Time `json:"timestamp"`
}

func NewOrderCreated(orderID string, items []Item) *OrderCreated {
    return &OrderCreated{
        ID:        uuid.NewString(),
        Version:   "v1",
        OrderID:   orderID,
        Items:     items,
        Timestamp: time.Now().UTC(),
    }
}
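Since these schemas are the wire contract, I like to verify they round-trip cleanly through JSON before anything ships. The sketch below is self-contained, with local copies of the types; the Item fields (sku, qty) are my assumption, since the post never shows that type:

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Item is not defined in the post; these fields are an illustrative guess.
type Item struct {
	SKU string `json:"sku"`
	Qty int    `json:"qty"`
}

// OrderCreated mirrors the event schema above.
type OrderCreated struct {
	ID        string    `json:"id"`
	Version   string    `json:"version"`
	OrderID   string    `json:"order_id"`
	Items     []Item    `json:"items"`
	Timestamp time.Time `json:"timestamp"`
}

// roundTrip encodes an event the way a publisher would before handing the
// bytes to js.Publish, then decodes it the way a consumer would.
func roundTrip(in *OrderCreated) (*OrderCreated, error) {
	data, err := json.Marshal(in)
	if err != nil {
		return nil, err
	}
	var out OrderCreated
	if err := json.Unmarshal(data, &out); err != nil {
		return nil, err
	}
	return &out, nil
}

func main() {
	evt := &OrderCreated{
		ID:        "evt-1", // a real publisher would use uuid.NewString()
		Version:   "v1",
		OrderID:   "order-42",
		Items:     []Item{{SKU: "widget", Qty: 2}},
		Timestamp: time.Now().UTC(),
	}
	out, err := roundTrip(evt)
	if err != nil {
		panic(err)
	}
	fmt.Println(out.OrderID, len(out.Items))
}
```

A table-driven test over this helper catches accidental field renames before consumers do.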

Connecting to NATS requires robust configuration. Notice how I handle reconnection logic:

func Connect(url string) (nats.JetStreamContext, error) {
    nc, err := nats.Connect(url,
        nats.MaxReconnects(5),
        nats.ReconnectWait(2*time.Second),
    )
    if err != nil {
        return nil, err
    }
    return nc.JetStream(nats.PublishAsyncMaxPending(256))
}

For event processing, I use worker pools instead of individual goroutines. This controls resource usage:

func StartWorkers(ctx context.Context, js nats.JetStreamContext, topic string) {
    var wg sync.WaitGroup
    for i := 0; i < 5; i++ { // 5 workers sharing one durable consumer
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            sub, err := js.PullSubscribe(topic, "inventory-group")
            if err != nil {
                log.Printf("worker %d: subscribe failed: %v", id, err)
                return
            }
            for {
                select {
                case <-ctx.Done():
                    return
                default:
                    msgs, err := sub.Fetch(1, nats.MaxWait(5*time.Second))
                    if err != nil && !errors.Is(err, nats.ErrTimeout) {
                        log.Printf("worker %d: fetch failed: %v", id, err)
                        continue
                    }
                    for _, msg := range msgs {
                        process(msg)
                        msg.Ack()
                    }
                }
            }
        }(i)
    }
    wg.Wait()
}

What happens when things fail? We need visibility. OpenTelemetry provides that with distributed tracing. Integrating it into our services gives us request visibility across service boundaries. Here’s how I initialize tracing:

func InitTracing(serviceName string) (func(context.Context) error, error) {
    exporter, err := jaeger.New(jaeger.WithCollectorEndpoint())
    if err != nil {
        return nil, err
    }
    provider := trace.NewTracerProvider(
        trace.WithBatcher(exporter),
        trace.WithResource(resource.NewWithAttributes(
            semconv.SchemaURL,
            attribute.String("service.name", serviceName),
        )),
    )
    otel.SetTracerProvider(provider)
    return provider.Shutdown, nil
}

In HTTP handlers, I propagate traces automatically using middleware:

r := gin.Default()
r.Use(otelgin.Middleware("order-service"))

Resilience requires more than just retries. I implement circuit breakers for downstream calls:

cb := gobreaker.NewCircuitBreaker(gobreaker.Settings{
    Name:        "InventoryService",
    MaxRequests: 5,
    Interval:    30 * time.Second,
    Timeout:     10 * time.Second,
})

_, err := cb.Execute(func() (interface{}, error) {
    return reserveInventory(orderID)
})

For deployment, I package services in Docker containers with health checks:

HEALTHCHECK --interval=30s --timeout=5s \
    CMD curl -f http://localhost:8080/health || exit 1

Performance tuning becomes critical at scale. I always benchmark my message processors:

func BenchmarkOrderProcessing(b *testing.B) {
    msg := createTestMessage()
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        processOrder(msg)
    }
}

What separates production-ready services from prototypes? It’s the attention to failure scenarios. I simulate network partitions during testing to verify our service behavior. Can our system continue operating when dependencies become unavailable? That’s the real test.

Building these systems requires thoughtful design, but the payoff comes in reliability and scalability. I’ve seen these patterns handle thousands of events per second while providing critical visibility during outages. If you found this approach valuable, share your thoughts below. What patterns have worked well in your systems? Let me know in the comments, and share this with others who might benefit.



