Building Production-Ready Event-Driven Microservices with Go, NATS JetStream, and OpenTelemetry Tutorial

golang

Learn to build scalable event-driven microservices with Go, NATS JetStream & OpenTelemetry. Complete guide with tracing, monitoring & production deployment.

Aug 6, 2025

Have you ever struggled with building distributed systems that remain reliable under pressure? I recently faced this challenge while designing a critical order processing system. The complexities of coordinating microservices, ensuring data consistency, and maintaining visibility across components kept me awake at night. That’s when I decided to create a battle-tested approach using Go, NATS JetStream, and OpenTelemetry – tools that transform theoretical distributed systems concepts into production-ready solutions.

Setting up our project requires thoughtful organization. I prefer this directory structure for clear separation of concerns:

order-processing-system/
├── cmd/
│   ├── order-service/
│   ├── inventory-service/
│   ├── payment-service/
│   └── notification-service/
├── internal/
│   ├── events/
│   ├── tracing/
│   ├── metrics/
│   ├── messaging/
│   └── resilience/
├── pkg/
│   ├── models/
│   └── utils/
├── deployments/
│   ├── docker/
│   └── k8s/
└── go.mod

Initialize dependencies with:

go mod init order-processing-system
go get github.com/nats-io/nats.go
go get go.opentelemetry.io/otel
go get go.opentelemetry.io/otel/sdk
go get go.opentelemetry.io/otel/exporters/jaeger
go get github.com/prometheus/client_golang
go get github.com/sony/gobreaker

Well-defined events form our system’s backbone. Notice how each event captures essential context and tracing metadata:

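// pkg/models/events.go
package models

import (
    "encoding/json"
    "time"
)

// EventType names the kind of event carried in Data.
type EventType string

// Event is the envelope shared by every message in the system.
type Event struct {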
    ID          string                 `json:"id"`
    Type        EventType              `json:"type"`
    AggregateID string                 `json:"aggregate_id"`
    Version     int                    `json:"version"`
    Data        json.RawMessage        `json:"data"`
    Metadata    map[string]interface{} `json:"metadata"`
    Timestamp   time.Time              `json:"timestamp"`
    TraceID     string                 `json:"trace_id"`
    SpanID      string                 `json:"span_id"`
}

type OrderCreatedEvent struct {
    OrderID     string  `json:"order_id"`
    CustomerID  string  `json:"customer_id"`
    ProductID   string  `json:"product_id"`
    Quantity    int     `json:"quantity"`
}

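How do we ensure messages survive failures? NATS JetStream persists messages to disk and redelivers any that are not acknowledged, which gives us at-least-once delivery. This setup creates a durable, file-backed stream if one does not already exist: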

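// internal/messaging/nats_client.go
func (nc *NATSClient) setupStream(streamName string, subjects []string) error {
    _, err := nc.js.StreamInfo(streamName)
    if err == nil {
        return nil // Stream already exists
    }
    if !errors.Is(err, nats.ErrStreamNotFound) {
        return err // Connection or API failure, not a missing stream
    }

    _, err = nc.js.AddStream(&nats.StreamConfig{
        Name:     streamName,
        Subjects: subjects,
        Storage:  nats.FileStorage, // Persist messages to disk
        Replicas: 1,                // Single copy; no redundancy if the node is lost
        MaxAge:   24 * time.Hour,   // Drop messages older than a day
        // Work-queue retention deletes a message once it is acked, so
        // consumers on this stream need non-overlapping subject filters.
        Retention: nats.WorkQueuePolicy,
    })
    return err
}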

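When publishing events, we inject the current trace context into the message headers so consumers can continue the same trace:

func (nc *NATSClient) PublishEvent(ctx context.Context, subject string, event models.Event) error {
    data, err := json.Marshal(event)
    if err != nil {
        return fmt.Errorf("marshal event %s: %w", event.ID, err)
    }

    // NewMsg allocates the header map that the propagator writes into.
    msg := nats.NewMsg(subject)
    msg.Data = data
    nc.propagator.Inject(ctx, propagation.HeaderCarrier(msg.Header))

    // PublishMsg waits for the JetStream ack, so a nil error means the
    // message was stored.
    _, err = nc.js.PublishMsg(msg)
    return err
}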
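What actually travels in those headers? Assuming the propagator is OpenTelemetry's W3C Trace Context implementation (the article does not show how nc.propagator is built), Inject writes a traceparent value of the form version-traceid-parentid-flags. The sketch below parses that value with the standard library only, to show what a consumer receives; ParseTraceParent and the TraceParent type are hypothetical names, and in a real service the propagator's Extract method would do this work.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// TraceParent holds the fields of a version 00 traceparent header.
type TraceParent struct {
	TraceID string // 32 lowercase hex characters
	SpanID  string // 16 lowercase hex characters
	Sampled bool   // lowest bit of the flags byte
}

// isLowerHex reports whether s is exactly n lowercase hex characters.
func isLowerHex(s string, n int) bool {
	if len(s) != n || s != strings.ToLower(s) {
		return false
	}
	_, err := hex.DecodeString(s)
	return err == nil
}

// ParseTraceParent validates and splits a version 00 traceparent value.
func ParseTraceParent(h string) (TraceParent, error) {
	parts := strings.Split(h, "-")
	if len(parts) != 4 || parts[0] != "00" {
		return TraceParent{}, errors.New("unsupported traceparent format")
	}
	traceID, spanID, flags := parts[1], parts[2], parts[3]
	if !isLowerHex(traceID, 32) || !isLowerHex(spanID, 16) || !isLowerHex(flags, 2) {
		return TraceParent{}, errors.New("malformed traceparent field")
	}
	// All-zero identifiers are invalid.
	if strings.Trim(traceID, "0") == "" || strings.Trim(spanID, "0") == "" {
		return TraceParent{}, errors.New("all-zero trace or span id")
	}
	fb, _ := hex.DecodeString(flags) // already validated above
	return TraceParent{TraceID: traceID, SpanID: spanID, Sampled: fb[0]&0x01 == 1}, nil
}

func main() {
	tp, err := ParseTraceParent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	if err != nil {
		panic(err)
	}
	fmt.Printf("trace=%s span=%s sampled=%v\n", tp.TraceID, tp.SpanID, tp.Sampled)
}
```

Because the context rides in message headers rather than in the event body, the TraceID and SpanID fields on Event are optional extras for logging and queries, not what links the spans together.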

What happens when downstream services fail? The circuit breaker pattern prevents cascading failures:

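// internal/resilience/circuit_breaker.go
// NewPaymentClient returns the breaker that wraps calls to the payment service.
func NewPaymentClient() *gobreaker.CircuitBreaker {
    return gobreaker.NewCircuitBreaker(gobreaker.Settings{
        Name:        "PaymentService",
        MaxRequests: 5,                // Probe requests allowed while half-open
        Interval:    30 * time.Second, // How often counts reset while closed
        Timeout:     10 * time.Second, // Time spent open before probing again
        // Open the circuit after more than 10 consecutive failures.
        ReadyToTrip: func(counts gobreaker.Counts) bool {
            return counts.ConsecutiveFailures > 10
        },
    })
}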

OpenTelemetry gives us cross-service visibility. This initializes tracing:

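// internal/tracing/tracing.go
// Here trace is go.opentelemetry.io/otel/sdk/trace and jaeger is
// go.opentelemetry.io/otel/exporters/jaeger. That exporter module is
// deprecated; newer setups usually export OTLP, which Jaeger ingests natively.
func InitTracing(serviceName string) (func(), error) {
    // With no options, the endpoint is read from OTEL_EXPORTER_JAEGER_ENDPOINT.
    exporter, err := jaeger.New(jaeger.WithCollectorEndpoint())
    if err != nil {
        return nil, err
    }

    tp := trace.NewTracerProvider(
        trace.WithBatcher(exporter),
        trace.WithResource(resource.NewWithAttributes(
            semconv.SchemaURL,
            semconv.ServiceNameKey.String(serviceName),
        )),
    )
    otel.SetTracerProvider(tp)
    // The returned function flushes buffered spans; call it on shutdown.
    return func() { _ = tp.Shutdown(context.Background()) }, nil
}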

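For metrics collection, we register the Go runtime and process collectors on a dedicated Prometheus registry:

// internal/metrics/metrics.go
// collectors is github.com/prometheus/client_golang/prometheus/collectors,
// which replaces the deprecated prometheus.NewGoCollector constructors.
func RegisterMetrics() *prometheus.Registry {
    registry := prometheus.NewRegistry()
    registry.MustRegister(collectors.NewGoCollector())
    registry.MustRegister(collectors.NewProcessCollector(
        collectors.ProcessCollectorOpts{}))
    return registry
}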

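Because delivery is at-least-once, a consumer can see the same message more than once. We handle duplicates by treating the event ID as an idempotency key:

// internal/events/processor.go
func (p *EventProcessor) Process(ctx context.Context, msg *nats.Msg) {
    var event models.Event
    if err := json.Unmarshal(msg.Data, &event); err != nil {
        p.logger.Error("Invalid message format")
        _ = msg.Term() // A malformed payload never succeeds, so stop redelivery
        return
    }

    if p.store.IsProcessed(event.ID) {
        _ = msg.Ack()
        return // Skip duplicate
    }

    // Process event logic. On failure, call msg.Nak() and return without
    // marking the event, so JetStream redelivers it.
    p.store.MarkProcessed(event.ID)
    _ = msg.Ack()
}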

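Deploying to Kubernetes? This manifest runs three replicas and rolls out updates without ever dropping below that count:

# deployments/k8s/order-service.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: order-service   # label name is illustrative
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # start one extra pod during a rollout
      maxUnavailable: 0    # never take a pod down before its replacement is ready
  template:
    metadata:
      labels:
        app: order-service
    spec:
      containers:
      - name: order-service
        image: order-service:1.2.0
        env:
        - name: NATS_URL
          value: nats://nats-jetstream:4222
        - name: OTEL_EXPORTER_JAEGER_ENDPOINT
          value: http://jaeger-collector:14268/api/traces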

Building production-grade systems requires addressing failure scenarios upfront. What happens when payment gateways time out? How do we trace requests across four services? By combining Go’s concurrency with JetStream’s persistence and OpenTelemetry’s observability, we create systems that withstand real-world chaos. The patterns shown here – idempotent processing, circuit breakers, and distributed tracing – provide the foundation.

What challenges have you faced with microservices? I’d love to hear your experiences in the comments. If this approach resonates with you, share it with others building resilient systems!