
Master Production-Ready Event-Driven Microservices: Go, NATS JetStream, and OpenTelemetry Guide


I’ve been thinking a lot about how modern applications handle scale and complexity. It’s not just about writing code that works—it’s about building systems that remain understandable, maintainable, and reliable as they grow. That’s why I want to share my approach to creating production-ready event-driven microservices using Go, NATS JetStream, and OpenTelemetry.

Why focus on these technologies? Go gives us performance and simplicity, NATS JetStream provides reliable messaging, and OpenTelemetry offers the visibility we need in distributed systems. Together, they form a powerful foundation for building systems that can handle real-world demands.

Let’s start with the core concept: events become the communication backbone between services. Instead of services calling each other directly, they publish events that other services can consume. This approach reduces coupling and makes our system more resilient to failures.
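To make that concrete, here's a rough sketch of the shape this takes in Go. The EventBus interface and the UserService wiring are illustrative placeholders, not a specific library API:

// Publishers and consumers only agree on subjects and payloads; neither
// holds a reference to the other.
type EventBus interface {
    Publish(ctx context.Context, subject string, event proto.Message) error
    Subscribe(subject string, handler func(ctx context.Context, data []byte) error) error
}

// The user service publishes "user.created" and moves on; whoever cares
// about new users (email, billing, analytics) subscribes independently.
func (s *UserService) CreateUser(ctx context.Context, u *User) (string, error) {
    if err := s.repo.Save(ctx, u); err != nil {
        return "", err
    }
    err := s.bus.Publish(ctx, "user.created", &events.UserCreatedEvent{UserId: u.ID})
    return u.ID, err
}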

How do we ensure these events are handled reliably? That’s where NATS JetStream comes in. It provides persistent storage for messages, ensuring they’re not lost if a service goes down temporarily. Here’s a basic setup:

nc, err := nats.Connect("nats://localhost:4222")
if err != nil {
    log.Fatal().Err(err).Msg("Failed to connect to NATS")
}

js, err := nc.JetStream()
if err != nil {
    log.Fatal().Err(err).Msg("Failed to get JetStream context")
}

// Create the stream if it doesn't exist
_, err = js.AddStream(&nats.StreamConfig{
    Name:     "USER_EVENTS",
    Subjects: []string{"user.>"},
})
if err != nil {
    log.Fatal().Err(err).Msg("Failed to create stream")
}
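Persistence only pays off if consumers acknowledge messages explicitly. Here's a sketch of the consuming side using a durable consumer with manual acks; the durable name and handler function are placeholders:

// Durable consumer with explicit acks: unacknowledged messages are
// redelivered after AckWait, so a crashed consumer doesn't lose events.
_, err = js.Subscribe("user.created", func(msg *nats.Msg) {
    if err := handleUserCreated(msg.Data); err != nil {
        log.Error().Err(err).Msg("Failed to process event")
        return // no ack, JetStream will redeliver
    }
    msg.Ack()
}, nats.Durable("user-service"), nats.ManualAck(), nats.AckWait(30*time.Second))
if err != nil {
    log.Fatal().Err(err).Msg("Failed to subscribe")
}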

But what happens when things go wrong? How do we track a request as it moves through multiple services? OpenTelemetry provides distributed tracing that gives us this visibility. Implementing it in Go is straightforward:

func initTracer() (*sdktrace.TracerProvider, error) {
    exporter, err := jaeger.New(jaeger.WithCollectorEndpoint(
        jaeger.WithEndpoint("http://jaeger:14268/api/traces"),
    ))
    if err != nil {
        return nil, err
    }

    tp := sdktrace.NewTracerProvider(
        sdktrace.WithSampler(sdktrace.AlwaysSample()),
        sdktrace.WithBatcher(exporter),
    )
    otel.SetTracerProvider(tp)
    return tp, nil
}

Now, when we publish an event, we can include tracing information:

func (b *NATSEventBus) Publish(ctx context.Context, subject string, event proto.Message) error {
    // Inject the current trace context so consumers can continue the trace
    carrier := propagation.MapCarrier{}
    otel.GetTextMapPropagator().Inject(ctx, carrier)

    data, err := proto.Marshal(event)
    if err != nil {
        return fmt.Errorf("failed to marshal event: %w", err)
    }

    // Carry the injected headers alongside the payload
    msg := &nats.Msg{Subject: subject, Data: data, Header: nats.Header{}}
    for k, v := range carrier {
        msg.Header.Set(k, v)
    }

    _, err = b.js.PublishMsg(msg)
    return err
}
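On the consuming side we do the reverse: extract the propagated context from the message headers and start a child span, so a single request shows up as one trace across services. A sketch, where the tracer name and process method are illustrative:

func (b *NATSEventBus) handleMessage(msg *nats.Msg) {
    // Rebuild the trace context that Publish injected into the headers
    carrier := propagation.MapCarrier{}
    for k := range msg.Header {
        carrier.Set(k, msg.Header.Get(k))
    }
    ctx := otel.GetTextMapPropagator().Extract(context.Background(), carrier)

    // Start a child span linked to the publisher's trace
    ctx, span := otel.Tracer("nats-event-bus").Start(ctx, "consume "+msg.Subject)
    defer span.End()

    if err := b.process(ctx, msg); err != nil {
        span.RecordError(err)
    }
}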

Error handling is crucial in distributed systems. What if a service fails to process an event? We need retry mechanisms and circuit breakers to prevent cascading failures. The gobreaker package helps implement circuit breaker patterns:

cb := gobreaker.NewCircuitBreaker(gobreaker.Settings{
    Name:        "user-service",
    Timeout:     30 * time.Second,
    MaxRequests: 5,
    ReadyToTrip: func(counts gobreaker.Counts) bool {
        return counts.ConsecutiveFailures > 5
    },
})

_, err := cb.Execute(func() (interface{}, error) {
    return userService.CreateUser(ctx, user)
})
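Circuit breakers protect the downstream call; retries come almost for free from JetStream itself. Here's a sketch of a handler that Naks a failed message so it gets redelivered; processUserEvent is a placeholder, and the consumer's MaxDeliver setting caps how many attempts are made:

// Inside the message handler: run the downstream call through the breaker.
// On failure, Nak the message so JetStream redelivers it rather than losing it.
_, err := cb.Execute(func() (interface{}, error) {
    return nil, processUserEvent(ctx, msg.Data)
})
if err != nil {
    log.Warn().Err(err).Msg("Processing failed, requesting redelivery")
    msg.Nak()
    return
}
msg.Ack()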

Testing event-driven systems requires a different approach. We need to verify that events are published and processed correctly. Testcontainers can help by spinning up real infrastructure for integration tests:

func TestUserCreationPublishesEvent(t *testing.T) {
    ctx := context.Background()

    // Start a NATS container with JetStream enabled
    natsContainer, err := testcontainers.GenericContainer(ctx, testcontainers.GenericContainerRequest{
        ContainerRequest: testcontainers.ContainerRequest{
            Image:        "nats:latest",
            Cmd:          []string{"-js"},
            ExposedPorts: []string{"4222/tcp"},
        },
        Started: true,
    })
    require.NoError(t, err)
    defer natsContainer.Terminate(ctx)

    // Subscribe before triggering the event so nothing is missed
    sub, err := js.SubscribeSync("user.created")
    require.NoError(t, err)

    // Test that an event is published when a user is created
    userID, err := userService.CreateUser(ctx, testUser)
    require.NoError(t, err)

    // Verify the event was published
    msg, err := sub.NextMsg(5 * time.Second)
    require.NoError(t, err)

    var event events.UserCreatedEvent
    err = proto.Unmarshal(msg.Data, &event)
    require.NoError(t, err)
    assert.Equal(t, userID, event.UserId)
}

Deployment considerations are equally important. How do we ensure our services can handle rolling updates without losing messages? Kubernetes deployments with proper readiness probes and graceful shutdown handling are essential:

apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
      - name: user-service
        readinessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "sleep 30"]
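The preStop hook buys time for in-flight traffic, but the Go process still needs to stop accepting new messages and flush what it has. A minimal sketch of graceful shutdown, assuming nc and tp are the NATS connection and tracer provider created earlier:

func main() {
    // ... service setup, subscriptions ...

    // Wait for SIGTERM from Kubernetes, then shut down cleanly
    stop := make(chan os.Signal, 1)
    signal.Notify(stop, syscall.SIGTERM, syscall.SIGINT)
    <-stop

    // Drain processes pending messages and unsubscribes before closing
    if err := nc.Drain(); err != nil {
        log.Error().Err(err).Msg("Error draining NATS connection")
    }

    // Flush any buffered spans before the process exits
    ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()
    if err := tp.Shutdown(ctx); err != nil {
        log.Error().Err(err).Msg("Error shutting down tracer provider")
    }
}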

Monitoring production systems requires more than just logs. We need metrics that tell us about message processing rates, error rates, and latency. Prometheus metrics integrated with OpenTelemetry give us this visibility:

func initMeter() (*metric.MeterProvider, error) {
    exporter, err := prometheus.New()
    if err != nil {
        return nil, err
    }

    mp := metric.NewMeterProvider(metric.WithReader(exporter))
    otel.SetMeterProvider(mp)
    return mp, nil
}

// Obtain a meter from the global provider, then track processing latency
meter := otel.Meter("user-service")

processingTime, err := meter.Float64Histogram(
    "event_processing_seconds",
    metric.WithDescription("Time spent processing events"),
)
if err != nil {
    log.Fatal().Err(err).Msg("Failed to create histogram")
}
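Recording into that histogram from a handler is then a one-liner; the subject attribute is just an illustrative label:

start := time.Now()
err = handleUserCreated(ctx, msg.Data)
processingTime.Record(ctx, time.Since(start).Seconds(),
    metric.WithAttributes(attribute.String("subject", msg.Subject)))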

Building production-ready event-driven systems is challenging but rewarding. The combination of Go’s efficiency, NATS JetStream’s reliability, and OpenTelemetry’s observability creates a solid foundation for scalable applications.

What challenges have you faced with event-driven architectures? I’d love to hear about your experiences and solutions. If you found this helpful, please share it with others who might benefit from these patterns. Your comments and questions are always welcome—let’s keep the conversation going about building better distributed systems.



