Building Production Event-Driven Microservices: Go, NATS JetStream, OpenTelemetry Guide

golang

Building Production Event-Driven Microservices: Go, NATS JetStream, OpenTelemetry Guide

Learn to build production-ready event-driven microservices with Go, NATS JetStream, and OpenTelemetry. Complete guide with observability, error handling, and deployment best practices.

Aug 31, 2025

Building Production Event-Driven Microservices: Go, NATS JetStream, OpenTelemetry Guide

I’ve been thinking about building microservices that can handle real-world scale and complexity. The challenge isn’t just making services work—it’s making them work reliably under pressure, with clear visibility into what’s happening. That’s why I want to walk through creating a production-ready event-driven system using Go, NATS JetStream, and OpenTelemetry.

Let me show you how to build something that doesn’t just run, but runs well under real conditions.

We start by setting up our project structure. Organization matters when you’re dealing with multiple services that need to work together. I prefer keeping things modular but connected.

mkdir -p cmd/{order,inventory,notification}-service internal/{domain,config} pkg/events
go mod init event-microservice

Configuration is where many projects stumble early. I’ve learned to build it right from the start. Here’s how I handle configuration management:

type Config struct {
    NATSURL        string        `mapstructure:"nats_url"`
    ServiceName    string        `mapstructure:"service_name"`
    Timeout        time.Duration `mapstructure:"timeout"`
}

func LoadConfig() (*Config, error) {
    viper.AutomaticEnv()
    viper.SetDefault("nats_url", "nats://localhost:4222")
    
    var cfg Config
    if err := viper.Unmarshal(&cfg); err != nil {
        return nil, fmt.Errorf("config load failed: %w", err)
    }
    return &cfg, nil
}

Have you ever wondered how to ensure your events are both flexible and type-safe? This approach has served me well in production:

type Event struct {
    ID        string    `json:"id"`
    Type      string    `json:"type"`
    Timestamp time.Time `json:"timestamp"`
    Data      []byte    `json:"data"`
}

func NewOrderCreatedEvent(orderID string) (Event, error) {
    data, err := json.Marshal(map[string]string{"order_id": orderID})
    if err != nil {
        return Event{}, err
    }
    
    return Event{
        ID:        uuid.New().String(),
        Type:      "order.created",
        Timestamp: time.Now(),
        Data:      data,
    }, nil
}

Setting up NATS JetStream properly makes all the difference between a system that works and one that works reliably. Here’s how I configure the stream:

func SetupJetStream(nc *nats.Conn) error {
    js, err := nc.JetStream()
    if err != nil {
        return err
    }

    _, err = js.AddStream(&nats.StreamConfig{
        Name:     "ORDERS",
        Subjects: []string{"orders.*"},
        MaxAge:   time.Hour * 24,
    })
    return err
}

What separates good observability from great observability? It’s not just collecting data—it’s making that data meaningful. OpenTelemetry gives us that power:

func InitTracer(serviceName string) (func(), error) {
    exporter, err := jaeger.New(jaeger.WithCollectorEndpoint())
    if err != nil {
        return nil, err
    }

    tp := sdktrace.NewTracerProvider(
        sdktrace.WithBatcher(exporter),
        sdktrace.WithResource(resource.NewWithAttributes(
            semconv.ServiceName(serviceName),
        )),
    )
    
    otel.SetTracerProvider(tp)
    return func() { tp.Shutdown(context.Background()) }, nil
}

Error handling in event-driven systems requires careful thought. It’s not just about catching errors—it’s about designing for resilience:

func ProcessOrder(ctx context.Context, msg *nats.Msg) {
    ctx, span := tracer.Start(ctx, "process_order")
    defer span.End()

    var event Event
    if err := json.Unmarshal(msg.Data, &event); err != nil {
        span.RecordError(err)
        log.Printf("Failed to unmarshal event: %v", err)
        return
    }

    // Business logic here
    if err := handleOrder(event); err != nil {
        span.RecordError(err)
        if shouldRetry(err) {
            msg.Nak() // Request redelivery
        } else {
            msg.Term() // Permanent failure
        }
        return
    }
    
    msg.Ack()
}

Building the actual service endpoints means thinking about both the HTTP layer and the event processing. They need to work together seamlessly:

func main() {
    cfg := config.Load()
    tracerCleanup := initTracing(cfg.ServiceName)
    defer tracerCleanup()

    nc := connectNATS(cfg.NATSURL)
    js := setupJetStream(nc)
    
    // Subscribe to events
    sub, _ := js.Subscribe("orders.*", processOrder)
    defer sub.Unsubscribe()

    // Start HTTP server
    r := gin.New()
    r.Use(otelgin.Middleware(cfg.ServiceName))
    r.POST("/orders", createOrderHandler)
    
    go func() {
        if err := r.Run(fmt.Sprintf(":%d", cfg.Port)); err != nil {
            log.Fatal(err)
        }
    }()

    <-waitForShutdown()
}

The real test comes when we need to handle concurrent message processing. How do we balance performance with reliability?

func StartWorkers(js nats.JetStreamContext, numWorkers int) {
    for i := 0; i < numWorkers; i++ {
        go func(workerID int) {
            sub, err := js.QueueSubscribe("orders.*", "order-workers", processMessage)
            if err != nil {
                log.Printf("Worker %d failed to subscribe: %v", workerID, err)
                return
            }
            <-time.After(time.Hour) // Simple keep-alive
            sub.Unsubscribe()
        }(i)
    }
}

Deployment considerations often get overlooked until it’s too late. Here’s a simple Docker setup that works well:

FROM golang:1.21-alpine
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN go build -o order-service ./cmd/order-service
EXPOSE 8080
CMD ["./order-service"]

Monitoring isn’t an afterthought—it’s part of the design. Prometheus integration gives us the metrics we need:

func initMetrics() {
    exporter, err := prometheus.New()
    if err != nil {
        log.Fatal(err)
    }
    
    meterProvider := metric.NewMeterProvider(metric.WithReader(exporter))
    global.SetMeterProvider(meterProvider)
    
    meter := global.Meter("order-service")
    ordersProcessed, _ := meter.Int64Counter("orders_processed_total")
    // Use counter in your processing logic
}

Testing event-driven systems requires thinking about both unit tests and integration tests. Mocking the NATS connection helps isolate functionality:

func TestOrderProcessing(t *testing.T) {
    mockJS := new(MockJetStream)
    mockJS.On("Publish", mock.Anything, mock.Anything).Return(nil)
    
    event := Event{Type: "order.created"}
    err := processOrder(context.Background(), mockJS, event)
    
    assert.NoError(t, err)
    mockJS.AssertCalled(t, "Publish", "orders.created", mock.Anything)
}

Building production-ready systems means thinking about everything from the code to the deployment. It’s not just about making it work—it’s about making it work well, consistently, and observably.

The patterns I’ve shared here come from real experience building systems that handle significant load while remaining maintainable and observable. They represent approaches that have proven themselves in production environments.

What challenges have you faced when building event-driven systems? I’d love to hear about your experiences and solutions. If you found this useful, please share it with others who might benefit, and feel free to leave comments or questions below.

Share: Facebook Twitter Reddit LinkedIn WhatsApp Telegram Pinterest Email Instagram

golang

Building Production Event-Driven Microservices: Go, NATS JetStream, OpenTelemetry Guide

Our Creations

We are on Medium

Similar Posts

Echo Redis Integration Guide: Boost Web Application Performance with Caching and Session Management

How to Build a Distributed Cache in Go Using Groupcache

Production-Ready Event-Driven Microservices: Go, NATS JetStream, and OpenTelemetry Complete Guide

Production-Ready Event-Driven Microservices: Build with NATS, Go, and Kubernetes for Scale

Building Production-Ready Worker Pools with Graceful Shutdown in Go: A Complete Concurrency Guide

Production-Ready gRPC Microservices with Go: Complete Service Communication, Error Handling and Observability Guide