Building High-Performance Event-Driven Microservices with Go NATS JetStream and OpenTelemetry Tracing

Learn to build scalable event-driven microservices with Go, NATS JetStream & distributed tracing. Master event sourcing, observability & production patterns.

I’ve been thinking about distributed systems lately. Why do so many struggle with performance as they scale? The challenges I faced while building a global e-commerce platform inspired me to explore high-performance event-driven architectures. Today I’ll share how to build resilient microservices using Go, NATS JetStream, and distributed tracing - a combination I’ve found particularly effective in production environments.

When designing our architecture, we need to consider how services communicate without creating bottlenecks. NATS JetStream provides the backbone with its persistent messaging capabilities. The core structure connects user, order, and notification services through a central event bus. Each service handles its domain logic while publishing events for others to consume. This separation allows independent scaling and deployment. What happens if a service goes offline temporarily? JetStream’s persistence ensures no events are lost.
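
Throughout the examples that follow I'll assume a small event payload along these lines - the exact fields are illustrative, and the Metadata map is where we'll later carry the trace context:

// Hypothetical event payload used in the examples below; Metadata carries
// cross-cutting concerns such as the OpenTelemetry trace context.
type UserUpdated struct {
  UserID    string            `json:"user_id"`
  NewName   string            `json:"new_name"`
  ChangedAt time.Time         `json:"changed_at"`
  Metadata  map[string]string `json:"metadata,omitempty"`
}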

Setting up our messaging infrastructure requires careful configuration. Here’s a minimal JetStream setup in Docker:

# Start the first node of a NATS cluster with JetStream enabled
# (start nats-2 the same way, with its -routes pointing back at nats-1)
docker run -d --name nats-1 -p 4222:4222 -v /data/jetstream:/data \
  nats:latest -js -sd /data -cluster nats://0.0.0.0:6222 -routes nats://nats-2:6222

Connecting our Go services to JetStream involves establishing reliable connections with automatic reconnects. Notice how we handle disconnections gracefully:

func NewEventBus(urls []string) (*EventBus, error) {
  opts := []nats.Option{
    nats.MaxReconnects(10),
    nats.ReconnectWait(2 * time.Second),
    nats.DisconnectErrHandler(func(_ *nats.Conn, err error) {
      log.Printf("Disconnected: %v", err)
    }),
  }

  nc, err := nats.Connect(strings.Join(urls, ","), opts...)
  if err != nil {
    return nil, fmt.Errorf("connecting to NATS: %w", err)
  }

  js, err := nc.JetStream(nats.PublishAsyncMaxPending(1000))
  if err != nil {
    nc.Close()
    return nil, fmt.Errorf("creating JetStream context: %w", err)
  }

  return &EventBus{nc: nc, js: js}, nil
}
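
The Publish helper the services use below isn't defined yet, so here's one possible shape - a sketch assuming JSON payloads and a file-backed stream per domain (EnsureStream, the method names, and the stream layout are my own, not part of the NATS API):

// EnsureStream creates the stream that will persist our events. AddStream
// fails if a stream with the same name but a different configuration exists.
func (b *EventBus) EnsureStream(name string, subjects []string) error {
  _, err := b.js.AddStream(&nats.StreamConfig{
    Name:     name,
    Subjects: subjects,
    Storage:  nats.FileStorage,
  })
  if err != nil {
    return fmt.Errorf("ensuring stream %s: %w", name, err)
  }
  return nil
}

// Publish JSON-encodes an event and publishes it to a JetStream subject.
func (b *EventBus) Publish(subject string, event any) error {
  data, err := json.Marshal(event)
  if err != nil {
    return fmt.Errorf("encoding event for %s: %w", subject, err)
  }
  _, err = b.js.Publish(subject, data)
  return err
}

Calling something like EnsureStream("USERS", []string{"USER_UPDATED"}) once at startup means publishes never target a subject that no stream captures.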

For our user service, event sourcing becomes crucial. We store state changes as a sequence of events. When a user updates their profile, we publish an event rather than directly updating a database. This approach has interesting implications - how might we rebuild state if needed? By replaying events from the beginning. Here’s how we handle user updates:

func (s *UserService) UpdateProfile(ctx context.Context, update ProfileUpdate) error {
  event := events.UserUpdated{
    UserID:      update.UserID,
    NewName:     update.Name,
    ChangedAt:   time.Now(),
  }
  
  if err := s.eventBus.Publish("USER_UPDATED", event); err != nil {
    return fmt.Errorf("publishing failed: %w", err)
  }
  
  return nil
}
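
How would we rebuild state in practice? One option is an ephemeral consumer that replays the stream from its first sequence. Here's a sketch, assuming the service also holds a JetStream context (s.js) and an applyUpdate projection method - both illustrative, and the fetch-timeout check is only a rough "caught up" signal:

// RebuildUserState replays every stored USER_UPDATED event in order and
// folds it into an in-memory projection.
func (s *UserService) RebuildUserState() error {
  sub, err := s.js.SubscribeSync("USER_UPDATED",
    nats.DeliverAll(), // start from the first event in the stream
    nats.AckNone(),    // ephemeral replay; no acknowledgements needed
  )
  if err != nil {
    return fmt.Errorf("subscribing for replay: %w", err)
  }
  defer sub.Unsubscribe()

  for {
    msg, err := sub.NextMsg(2 * time.Second)
    if errors.Is(err, nats.ErrTimeout) {
      return nil // no more stored events; treat as caught up in this sketch
    }
    if err != nil {
      return fmt.Errorf("replay interrupted: %w", err)
    }

    var event events.UserUpdated
    if err := json.Unmarshal(msg.Data, &event); err != nil {
      continue // skip malformed events in this sketch
    }
    s.applyUpdate(event) // hypothetical projection step
  }
}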

Distributed tracing solves visibility challenges across service boundaries. Using OpenTelemetry, we propagate trace contexts through events:

// Inject tracing context into event metadata
carrier := propagation.MapCarrier{}
otel.GetTextMapPropagator().Inject(ctx, carrier)

event.Metadata = make(map[string]string)
for k, v := range carrier {
  event.Metadata[k] = v
}

// Extract at consumer side
ctx = otel.GetTextMapPropagator().Extract(ctx, propagation.MapCarrier(event.Metadata))
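
With the context extracted, the consumer starts its own span as a child of the producer's, so a single trace covers both services. A minimal sketch - the tracer name, span name, and s.handle are placeholders:

// Start a consumer-side span that continues the producer's trace.
tracer := otel.Tracer("order-service")
ctx, span := tracer.Start(ctx, "consume USER_UPDATED",
  trace.WithSpanKind(trace.SpanKindConsumer))
defer span.End()

if err := s.handle(ctx, event); err != nil { // hypothetical handler
  span.RecordError(err)
  span.SetStatus(codes.Error, "handling event failed")
}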

For high-throughput scenarios, Go’s concurrency patterns shine. We use worker pools to process events efficiently. How many workers should we spawn? The sweet spot depends on your workload, but twice the CPU count (runtime.NumCPU() * 2) is a reasonable starting point:

func (s *OrderService) StartWorkers(ctx context.Context) {
  for i := 0; i < runtime.NumCPU()*2; i++ {
    go s.worker(ctx)
  }
}

func (s *OrderService) worker(ctx context.Context) {
  // Each worker binds to the same durable pull consumer, so JetStream
  // spreads ORDERS.* messages across the pool.
  sub, err := s.js.PullSubscribe("ORDERS.*", "order-workers")
  if err != nil {
    log.Printf("subscribe failed: %v", err)
    return
  }

  for ctx.Err() == nil {
    msgs, err := sub.Fetch(10, nats.MaxWait(5*time.Second))
    if err != nil {
      continue // usually just a fetch timeout when the stream is idle
    }
    for _, msg := range msgs {
      s.processOrder(msg)
      msg.Ack()
    }
  }
}

Health checks are non-negotiable in distributed systems. We implement both liveness and readiness probes:

http.HandleFunc("/health/live", func(w http.ResponseWriter, _ *http.Request) {
  w.WriteHeader(http.StatusOK) // Simple process check
})

http.HandleFunc("/health/ready", func(w http.ResponseWriter, _ *http.Request) {
  if s.eventBus.Ready() { // Check NATS connection
    w.WriteHeader(http.StatusOK)
  } else {
    w.WriteHeader(http.StatusServiceUnavailable)
  }
})
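
Ready here is deliberately thin - in my services it's little more than a check on the NATS connection state. A sketch:

// Ready reports whether the service should receive traffic: for this
// event bus, that means the NATS connection is currently established.
func (b *EventBus) Ready() bool {
  return b.nc != nil && b.nc.IsConnected()
}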

In production, we monitor key metrics like event processing latency and error rates. Prometheus instrumentation gives us real-time insights:

var (
  eventsProcessed = prometheus.NewCounterVec(prometheus.CounterOpts{
    Name: "events_processed_total",
    Help: "Total processed events",
  }, []string{"service", "type"})
  
  processingTime = prometheus.NewHistogramVec(prometheus.HistogramOpts{
    Name: "event_processing_seconds",
    Help: "Event processing time",
  }, []string{"service"})
)

func init() {
  prometheus.MustRegister(eventsProcessed, processingTime)
}
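
Registering collectors only helps if we actually observe them and expose an endpoint for Prometheus to scrape. Here's a sketch of both - the wrapper name, label values, and port are illustrative:

// processOrderInstrumented wraps the real handler with metrics.
func (s *OrderService) processOrderInstrumented(msg *nats.Msg) {
  start := time.Now()
  s.processOrder(msg)

  eventsProcessed.WithLabelValues("order-service", msg.Subject).Inc()
  processingTime.WithLabelValues("order-service").Observe(time.Since(start).Seconds())
}

// Expose the registered metrics for scraping.
func serveMetrics() {
  http.Handle("/metrics", promhttp.Handler())
  log.Fatal(http.ListenAndServe(":9090", nil))
}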

Deployment considerations include resource limits and proper shutdown handling. Always drain JetStream connections during graceful termination:

quit := make(chan os.Signal, 1)
signal.Notify(quit, syscall.SIGTERM, syscall.SIGINT)

<-quit
if err := s.eventBus.Drain(); err != nil {
  log.Printf("Draining failed: %v", err)
}
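
Drain on the event bus simply delegates to the NATS connection: nc.Drain() lets pending messages finish processing, flushes outstanding publishes, and then closes the connection. A sketch:

// Drain processes pending messages on all subscriptions, flushes
// outstanding publishes, and closes the underlying connection.
func (b *EventBus) Drain() error {
  return b.nc.Drain()
}

Since draining completes asynchronously, in production you'd typically also wait on the connection's closed callback (set via nats.ClosedHandler) before letting the process exit.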

This approach has served me well in high-load environments processing millions of events daily. The combination of Go’s efficiency, JetStream’s reliability, and distributed tracing provides a solid foundation. What challenges have you faced with event-driven architectures? Share your experiences below - I’d love to hear different perspectives. If you found this useful, please consider sharing it with others who might benefit.



