Building Production-Ready Event-Driven Microservices with Go, NATS JetStream, and OpenTelemetry


I’ve been thinking about microservices a lot lately. While building systems at scale, I noticed how quickly traditional request-response architectures become bottlenecks. This challenge led me to explore event-driven patterns with Go, NATS JetStream, and OpenTelemetry. Why these tools? Go’s concurrency model fits distributed systems perfectly, JetStream ensures reliable messaging, and OpenTelemetry provides the visibility we desperately need in complex environments. Let me share what I’ve learned about building production-ready systems with this stack.

Setting up a solid foundation is critical. I start with a monorepo structure that keeps services decoupled but manageable. The internal directory houses shared packages: configuration handling, event definitions, and observability tools. Have you considered how configuration changes might affect multiple services simultaneously? Our config loader handles environment variables cleanly:

// internal/config/config.go
// Uses github.com/kelseyhightower/envconfig for struct-tag based loading.
type Config struct {
    ServiceName    string        `envconfig:"SERVICE_NAME" required:"true"`
    NATSUrl        string        `envconfig:"NATS_URL" default:"nats://localhost:4222"`
    // ... other fields
}

func Load() (*Config, error) {
    var cfg Config
    err := envconfig.Process("", &cfg)
    return &cfg, err
}

Observability isn’t optional in distributed systems. Implementing OpenTelemetry early pays dividends during debugging. This setup captures traces and metrics simultaneously:

// internal/observability/telemetry.go
func NewTelemetry(serviceName, jaegerEndpoint string) (*Telemetry, error) {
    res := resource.NewWithAttributes(semconv.SchemaURL,
        semconv.ServiceName(serviceName))

    // Swallowing exporter errors hides misconfiguration; surface them instead.
    jaegerExporter, err := jaeger.New(jaeger.WithCollectorEndpoint(
        jaeger.WithEndpoint(jaegerEndpoint)))
    if err != nil {
        return nil, fmt.Errorf("create Jaeger exporter: %w", err)
    }

    tracerProvider := trace.NewTracerProvider(
        trace.WithBatcher(jaegerExporter),
        trace.WithResource(res))

    prometheusExporter, err := prometheus.New()
    if err != nil {
        return nil, fmt.Errorf("create Prometheus exporter: %w", err)
    }
    meterProvider := metric.NewMeterProvider(
        metric.WithReader(prometheusExporter))

    return &Telemetry{
        TracerProvider: tracerProvider,
        MeterProvider:  meterProvider,
    }, nil
}

NATS JetStream became my messaging backbone because it handles both streaming and persistence without complex infrastructure. Initializing the JetStream connection looks like this:

nc, err := nats.Connect(cfg.NATSUrl)
if err != nil {
    log.Fatalf("connect to NATS: %v", err)
}
js, err := jetstream.New(nc)
if err != nil {
    log.Fatalf("create JetStream context: %v", err)
}

// Create the stream that captures all order events
stream, err := js.CreateStream(ctx, jetstream.StreamConfig{
    Name:     "ORDERS",
    Subjects: []string{"orders.*"},
})
if err != nil {
    log.Fatalf("create stream: %v", err)
}

// Create durable consumer
consumer, err := stream.CreateOrUpdateConsumer(ctx, jetstream.ConsumerConfig{
    Durable:   "ORDER_PROCESSOR",
    AckPolicy: jetstream.AckExplicitPolicy,
})
if err != nil {
    log.Fatalf("create consumer: %v", err)
}

For the order service, I used Gin for HTTP handling with validator integration. Notice how we start a span for each request right at the top of the handler:

// cmd/order-service/main.go
r := gin.Default()
r.POST("/orders", func(c *gin.Context) {
    ctx, span := otel.Tracer("order").Start(c.Request.Context(), "CreateOrder")
    defer span.End()

    var order Order
    if err := c.ShouldBindJSON(&order); err != nil {
        c.JSON(400, gin.H{"error": err.Error()})
        return
    }

    // Publish event
    msg, err := json.Marshal(orderCreatedEvent(order))
    if err != nil {
        span.RecordError(err)
        c.JSON(500, gin.H{"error": "failed to encode event"})
        return
    }
    if _, err := js.Publish(ctx, "orders.created", msg); err != nil {
        span.RecordError(err)
        c.JSON(503, gin.H{"error": "failed to publish event"})
        return
    }
    c.JSON(202, gin.H{"status": "accepted"})
})

What happens when dependencies fail? The circuit breaker pattern prevents cascading failures. We use gobreaker with configurable thresholds:

cb := gobreaker.NewCircuitBreaker(gobreaker.Settings{
    Name:        "PaymentService",
    Timeout:     cfg.CircuitBreakerTimeout,   // how long the breaker stays open
    MaxRequests: cfg.CircuitBreakerThreshold, // probe requests allowed while half-open
    Interval:    cfg.CircuitBreakerInterval,  // counter-reset period while closed
    ReadyToTrip: func(counts gobreaker.Counts) bool {
        return counts.ConsecutiveFailures > 5 // trip once failures exceed 5
    },
})

// Usage
result, err := cb.Execute(func() (interface{}, error) {
    return paymentClient.Process(ctx, payment)
})

For deployment, each service gets a Dockerfile with multi-stage builds and health checks. The health endpoint supports Kubernetes liveness and readiness probes:

// pkg/health/handler.go
r.GET("/health", func(c *gin.Context) {
    // natsStatus and dbStatus come from dependency checks,
    // e.g. nc.Status() == nats.CONNECTED and db.PingContext(ctx)
    if natsStatus && dbStatus {
        c.JSON(200, gin.H{"status": "OK"})
    } else {
        c.JSON(503, gin.H{"status": "Unhealthy"})
    }
})

Graceful shutdown ensures we don’t lose in-flight messages during deployments. This pattern handles signals and cleans up resources:

// pkg/shutdown/shutdown.go
func HandleShutdown(signals []os.Signal, cleanup func()) {
    sigCh := make(chan os.Signal, 1)
    signal.Notify(sigCh, signals...)
    <-sigCh
    cleanup()
}

// In main: blocks until a signal arrives, then cleans up
shutdown.HandleShutdown(
    []os.Signal{syscall.SIGINT, syscall.SIGTERM},
    func() {
        nc.Drain() // flush in-flight messages instead of dropping them
        telemetry.Cleanup()
    })

What surprised me most was how well these patterns complement each other. The event-driven approach reduces coupling, JetStream ensures message durability, and OpenTelemetry provides the visibility needed to operate confidently. Each service becomes independently deployable and scalable. For those starting out, focus on observability first - you'll thank yourself during the first production incident.

This journey transformed how I build systems. If you’ve faced similar challenges or have questions about implementation details, share your thoughts below. Found this useful? Like and share to help others discover these patterns. What patterns have worked well in your distributed systems?
