golang

Build Production-Ready Event-Driven Microservices with Go, NATS, and OpenTelemetry: Complete Guide

Learn to build production-ready event-driven microservices using Go, NATS JetStream & OpenTelemetry. Complete tutorial with code examples, deployment & monitoring.

Build Production-Ready Event-Driven Microservices with Go, NATS, and OpenTelemetry: Complete Guide

I’ve been building distributed systems for years, and one question keeps coming up: how do we create microservices that don’t just work in development but thrive in production? This challenge led me to combine Go’s efficiency with NATS JetStream’s reliability and OpenTelemetry’s observability. Today, I want to show you how to build event-driven microservices that handle real-world complexity without falling apart.

Have you ever wondered what happens when your payment service goes down during peak traffic? Event-driven architecture with proper messaging can make your system resilient to such failures. Let’s start with the core event system that ties everything together.

// Event structure with tracing support
type Event struct {
    ID            string                 `json:"id"`
    Type          EventType              `json:"type"`
    Source        string                 `json:"source"`
    Subject       string                 `json:"subject"`
    Time          time.Time              `json:"time"`
    Data          json.RawMessage        `json:"data"`
    TraceID       string                 `json:"trace_id"`
    CorrelationID string                 `json:"correlation_id"`
}

func PublishOrderCreated(ctx context.Context, natsConn *nats.Conn, order *Order) error {
    span := trace.SpanFromContext(ctx)
    eventData := OrderCreatedData{
        OrderID:    order.ID,
        CustomerID: order.CustomerID,
        ProductID:  order.ProductID,
        Quantity:   order.Quantity,
        Amount:     order.Amount,
    }
    
    event, err := NewEvent(OrderCreatedEvent, "order-service", order.ID, eventData)
    if err != nil {
        return fmt.Errorf("failed to create event: %w", err)
    }
    
    event.TraceID = span.SpanContext().TraceID().String()
    eventBytes, _ := json.Marshal(event)
    
    return natsConn.Publish("orders.created", eventBytes)
}

Go’s concurrency model makes it perfect for handling multiple events simultaneously. The language’s built-in support for goroutines and channels aligns beautifully with event-driven patterns. But how do we ensure messages aren’t lost when services restart?

NATS JetStream provides persistent storage and exactly-once delivery semantics. Here’s how to set up a durable consumer that survives service restarts:

func SetupOrderConsumer(nc *nats.Conn) (*nats.Subscription, error) {
    js, err := nc.JetStream()
    if err != nil {
        return nil, err
    }

    // Create stream if it doesn't exist
    _, err = js.AddStream(&nats.StreamConfig{
        Name:     "ORDERS",
        Subjects: []string{"orders.*"},
    })

    // Create durable consumer
    return js.Subscribe("orders.created", handleOrderCreated, 
        nats.Durable("order-processor"),
        nats.DeliverAll(),
        nats.AckExplicit(),
    )
}

Observability isn’t just nice to have—it’s essential for production systems. When something goes wrong at 3 AM, you need to know why. OpenTelemetry gives you distributed tracing across all your services.

What if you could trace a single order through every service it touches? Here’s how to instrument a service:

func ProcessOrder(ctx context.Context, orderID string) error {
    ctx, span := tracer.Start(ctx, "ProcessOrder")
    defer span.End()

    // Add attributes to the span
    span.SetAttributes(
        attribute.String("order.id", orderID),
        attribute.Int64("start_time", time.Now().Unix()),
    )

    // Your business logic here
    err := reserveInventory(ctx, orderID)
    if err != nil {
        span.RecordError(err)
        return err
    }

    return nil
}

Circuit breakers prevent cascading failures when downstream services become unstable. The gobreaker library integrates beautifully with Go’s context pattern:

var cb *gobreaker.CircuitBreaker

func init() {
    cb = gobreaker.NewCircuitBreaker(gobreaker.Settings{
        Name:    "PaymentService",
        Timeout: 10 * time.Second,
    })
}

func ProcessPayment(ctx context.Context, payment *Payment) error {
    result, err := cb.Execute(func() (interface{}, error) {
        return paymentGateway.Charge(ctx, payment)
    })
    
    if err != nil {
        return fmt.Errorf("payment failed: %w", err)
    }
    
    return publishPaymentProcessed(ctx, result.(*PaymentResult))
}

Graceful shutdown ensures your service doesn’t drop messages when scaling down. Here’s a production-ready shutdown handler:

func main() {
    router := gin.Default()
    server := &http.Server{
        Addr:    ":8080",
        Handler: router,
    }

    go func() {
        if err := server.ListenAndServe(); err != nil && err != http.ErrServerClosed {
            log.Fatalf("server failed: %v", err)
        }
    }()

    quit := make(chan os.Signal, 1)
    signal.Notify(quit, syscall.SIGINT, syscall.SIGTERM)
    <-quit

    ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
    defer cancel()

    // Stop accepting new connections
    if err := server.Shutdown(ctx); err != nil {
        log.Fatalf("forced shutdown: %v", err)
    }

    // Close NATS connection
    natsConn.Close()
    log.Println("service stopped gracefully")
}

Dead letter queues handle messages that repeatedly fail processing. This prevents your main queues from getting stuck:

func handleOrderCreated(msg *nats.Msg) {
    ctx := context.Background()
    var event Event
    
    if err := json.Unmarshal(msg.Data, &event); err != nil {
        // Move to DLQ after multiple failures
        msg.NakWithDelay(5 * time.Second) // Retry after 5 seconds
        return
    }

    // Process the event
    if err := processOrderCreation(ctx, event); err != nil {
        metrics.DLQCounter.Add(ctx, 1)
        msg.Term() // Move to DLQ
        return
    }

    msg.Ack()
}

Docker deployment with health checks ensures your services stay healthy. Here’s a sample Dockerfile and health check:

FROM golang:1.21-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o main ./cmd

FROM alpine:latest
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=builder /app/main .
EXPOSE 8080
HEALTHCHECK --interval=30s --timeout=3s \
  CMD wget --no-verbose --tries=1 --spider http://localhost:8080/health || exit 1
CMD ["./main"]

Building production-ready systems requires thinking about failure from day one. Event-driven architecture with proper observability turns potential disasters into manageable incidents. The combination of Go’s performance, NATS’s reliability, and OpenTelemetry’s insights creates a foundation that scales with your business.

I’ve deployed systems using this approach that handle millions of events daily with minimal downtime. The investment in proper architecture pays off when you can sleep through the night knowing your services will handle whatever comes their way. If this approach resonates with you, I’d love to hear about your experiences. Please share this article with your team and leave a comment about how you’re handling microservice complexity in your projects.

Keywords: event-driven microservices Go, NATS JetStream messaging, OpenTelemetry observability, Go microservices architecture, production microservices deployment, NATS event streaming, distributed tracing Go, microservices circuit breaker, Docker microservices container, Go event sourcing patterns



Similar Posts
Blog Image
How to Build a Production-Ready Worker Pool with Graceful Shutdown in Go: Complete Guide

Learn to build a production-ready Go worker pool with graceful shutdown, panic recovery, backpressure handling, and metrics. Master concurrent programming patterns for scalable applications.

Blog Image
Apache Kafka Go Tutorial: Production-Ready Event Streaming Systems with High-Throughput Message Processing

Master Apache Kafka with Go: Build production-ready event streaming systems using Sarama & Confluent clients. Learn high-performance producers, scalable consumers & monitoring.

Blog Image
Build Production-Ready Event-Driven Microservices with Go, NATS JetStream, and OpenTelemetry

Learn to build scalable event-driven microservices with Go, NATS JetStream & OpenTelemetry. Master distributed tracing, resilience patterns & production deployment.

Blog Image
Building Production-Ready gRPC Services in Go: Protocol Buffers, Interceptors, and Observability Complete Guide

Learn to build production-ready gRPC services in Go with Protocol Buffers, interceptors, and observability. Complete guide with code examples and best practices.

Blog Image
How to Integrate Echo with Redis for High-Performance Session Management and Caching in Go

Learn how to integrate Echo with Redis for powerful session management and caching. Build scalable Go web apps with distributed state storage and boost performance.

Blog Image
Master Event-Driven Microservices: Go, NATS JetStream, and Kubernetes Production Guide

Learn to build production-ready event-driven microservices with Go, NATS JetStream & Kubernetes. Master distributed architecture, resilience patterns & observability.