
Build Production-Ready Event-Driven Microservices with Go, NATS JetStream, and OpenTelemetry



I’ve been thinking a lot lately about building microservices that can handle real production workloads. You know, the kind that don’t just work in development but actually survive the chaos of real-world deployment. That’s why I want to share my approach to creating event-driven systems using Go, NATS JetStream, and OpenTelemetry.

Have you ever wondered what separates a prototype from a production-ready system? It’s not just about making things work—it’s about building something that can handle failure gracefully, scale when needed, and provide clear visibility when things go wrong.

Let me show you how I structure these systems. First, we establish a solid foundation with NATS JetStream. This isn't just about sending messages: it's about creating durable streams that persist events and guarantee at-least-once delivery, even across service restarts and network failures.

// Setting up our JetStream connection with proper error handling
nc, err := nats.Connect("nats://localhost:4222",
    nats.MaxReconnects(10),
    nats.ReconnectWait(2*time.Second),
    nats.DisconnectErrHandler(func(c *nats.Conn, err error) {
        log.Printf("Disconnected from NATS: %v", err)
    }),
    nats.ReconnectHandler(func(c *nats.Conn) {
        log.Printf("Reconnected to %s", c.ConnectedUrl())
    }))
if err != nil {
    return fmt.Errorf("connection failed: %w", err)
}

js, err := nc.JetStream(nats.PublishAsyncMaxPending(256))
if err != nil {
    return fmt.Errorf("jetstream context failed: %w", err)
}

What happens when your service suddenly gets flooded with messages? Or when downstream services become temporarily unavailable? That’s where proper stream configuration comes into play.

// Creating a durable stream with replication
_, err = js.AddStream(&nats.StreamConfig{
    Name:        "ORDERS",
    Subjects:    []string{"order.*"},
    Retention:   nats.InterestPolicy,
    MaxMsgs:     1000000,
    MaxBytes:    1 * 1024 * 1024 * 1024, // 1GB
    Storage:     nats.FileStorage,
    Replicas:    3,
    Duplicates:  2 * time.Minute,
})
if err != nil {
    return fmt.Errorf("stream creation failed: %w", err)
}

Now, let’s talk about something crucial: observability. How can you trace a request as it jumps between services? OpenTelemetry provides the answer, giving you clear visibility across your entire system.

// Instrumenting our service with OpenTelemetry
func publishOrderEvent(ctx context.Context, js nats.JetStreamContext, order Order) error {
    ctx, span := tracer.Start(ctx, "publishOrderEvent")
    defer span.End()
    
    span.SetAttributes(
        attribute.String("order.id", order.ID),
        attribute.String("order.status", order.Status),
    )
    
    eventData, err := json.Marshal(order)
    if err != nil {
        span.RecordError(err)
        return fmt.Errorf("marshal failed: %w", err)
    }
    
    msg := &nats.Msg{
        Subject: "order.created",
        Data:    eventData,
        Header:  nats.Header{},
    }
    
    // Inject tracing context into message headers
    otel.GetTextMapPropagator().Inject(ctx, natsHeaderCarrier{msg.Header})
    
    ack, err := js.PublishMsg(msg)
    if err != nil {
        span.RecordError(err)
        return fmt.Errorf("publish failed: %w", err)
    }
    
    span.SetAttributes(attribute.Int64("nats.sequence", int64(ack.Sequence)))
    return nil
}
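The natsHeaderCarrier used above is not part of the NATS client library; it's a small adapter you write yourself so OpenTelemetry's propagator can read and write trace headers on the message. Here's a minimal sketch implementing the propagation.TextMapCarrier interface over the header's underlying map[string][]string type (note that, unlike nats.Header.Get, this plain-map version does not canonicalize header keys):

```go
package main

import "fmt"

// natsHeaderCarrier adapts a NATS message header (underlying type
// map[string][]string, the same as nats.Header) to OpenTelemetry's
// propagation.TextMapCarrier interface, which requires Get, Set, and Keys.
type natsHeaderCarrier struct {
	header map[string][]string
}

// Get returns the first value for key, or "" if the key is absent.
func (c natsHeaderCarrier) Get(key string) string {
	if vals := c.header[key]; len(vals) > 0 {
		return vals[0]
	}
	return ""
}

// Set replaces any existing values for key with a single value.
func (c natsHeaderCarrier) Set(key, value string) {
	c.header[key] = []string{value}
}

// Keys lists all header keys currently present.
func (c natsHeaderCarrier) Keys() []string {
	keys := make([]string, 0, len(c.header))
	for k := range c.header {
		keys = append(keys, k)
	}
	return keys
}

func main() {
	c := natsHeaderCarrier{header: map[string][]string{}}
	c.Set("traceparent", "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	fmt.Println(c.Get("traceparent"))
}
```

Because nats.Header and http.Header share the same underlying type, you could also convert the header directly to OpenTelemetry's built-in propagation.HeaderCarrier instead of writing an adapter; the explicit type above just makes the intent obvious.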

Error handling is where many systems fall short. But what if we could automatically retry failed operations with exponential backoff? This approach has saved me countless times during temporary outages.

// Robust message consumption with retry logic
sub, err := js.Subscribe("order.created", func(msg *nats.Msg) {
    ctx := otel.GetTextMapPropagator().Extract(context.Background(), natsHeaderCarrier{msg.Header})
    ctx, span := tracer.Start(ctx, "processOrder")
    defer span.End()
    
    var order Order
    if err := json.Unmarshal(msg.Data, &order); err != nil {
        span.RecordError(err)
        msg.Nak() // Negative acknowledgment
        return
    }
    
    if err := processOrder(ctx, order); err != nil {
        span.RecordError(err)
        if shouldRetry(err) {
            meta, _ := msg.Metadata() // JetStream metadata; error ignored for brevity
            msg.NakWithDelay(calculateBackoff(meta.NumDelivered))
        } else {
            msg.Term() // Terminal failure
        }
        return
    }
    
    msg.Ack() // Success
}, nats.ManualAck())

Building production-ready systems means thinking about deployment and monitoring from day one. I always include health checks, metrics endpoints, and proper logging from the very beginning.

// Health check endpoint with metrics
func healthHandler(w http.ResponseWriter, r *http.Request) {
    if nc.Status() != nats.CONNECTED {
        http.Error(w, "NATS disconnected", http.StatusServiceUnavailable)
        return
    }
    
    w.Header().Set("Content-Type", "application/json")
    json.NewEncoder(w).Encode(map[string]interface{}{
        "status":    "healthy",
        "timestamp": time.Now().UTC(),
        "nats":      nc.ConnectedUrl(),
    })
}

The beauty of this approach is how everything works together. Go’s performance characteristics, NATS JetStream’s reliability, and OpenTelemetry’s observability create a foundation you can trust in production.

But here’s something to consider: how do you ensure message ordering while maintaining scalability? The answer often lies in thoughtful partitioning and consumer group strategies.
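One concrete way to partition, sketched here under the assumption that the order ID is your ordering key: hash the key onto a fixed set of partitioned subjects (order.created.0 through order.created.7) and attach one consumer per partition. Events for the same order always land on the same partition, preserving their relative order, while different orders fan out across consumers:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionSubject deterministically maps a stable key (e.g. an order ID)
// onto one of a fixed number of partitioned subjects, such as
// "order.created.3". The same key always yields the same partition.
func partitionSubject(prefix, key string, partitions uint32) string {
	h := fnv.New32a()
	h.Write([]byte(key)) // fnv's Write never returns an error
	return fmt.Sprintf("%s.%d", prefix, h.Sum32()%partitions)
}

func main() {
	// The same order always publishes to the same subject.
	fmt.Println(partitionSubject("order.created", "order-1042", 8))
	fmt.Println(partitionSubject("order.created", "order-1042", 8))
}
```

Recent NATS server releases also ship built-in deterministic subject partitioning via subject-mapping transforms, so check whether your server version can do this for you before hand-rolling it client-side.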

What I love about this architecture is its resilience. Services can fail, networks can partition, and messages can be delayed—yet the system continues to operate, eventually catching up when conditions improve.

Remember, building for production means anticipating failure at every level. It’s not about preventing errors entirely—that’s impossible—but about creating systems that handle errors gracefully and recover automatically.

I’d love to hear your thoughts on this approach. What challenges have you faced with event-driven architectures? Have you found particular strategies that work well in your production environments? Share your experiences in the comments below—let’s learn from each other’s journeys.

If this resonates with you, please like and share this with others who might benefit from these patterns. Your feedback helps shape future content and discussions around building reliable systems.
