
Building Event-Driven Microservices with NATS, Go, and Kubernetes: Complete Production Guide

Master building production-ready event-driven microservices with NATS, Go & Kubernetes. Complete guide with JetStream, error handling, monitoring & scaling.


I’ve been building distributed systems for over a decade, and I still remember the first time I faced a production outage caused by tightly coupled microservices. That moment sparked my journey into event-driven architecture, and today I want to share a battle-tested approach using NATS, Go, and Kubernetes. This isn’t just theoretical—it’s the same system that now handles millions of events daily across our e-commerce platform.

Why did I choose this specific stack? NATS provides incredible performance with a simple design, Go offers the right balance of productivity and control, and Kubernetes gives us the operational maturity needed for production workloads. Together, they create a foundation that scales predictably and handles failures gracefully.

Let me show you how we structure our events. Every message in our system follows a consistent format that includes metadata for tracing and correlation. This consistency pays dividends when debugging distributed workflows.

// EventMetadata travels with every event and carries the fields we
// rely on for tracing and correlation across services.
type EventMetadata struct {
    ID            string    `json:"id"`
    Type          string    `json:"type"`
    Source        string    `json:"source"`
    Timestamp     time.Time `json:"timestamp"`
    CorrelationID string    `json:"correlation_id"`
}

// BaseEvent pairs metadata with a type-specific payload
// (shape inferred from its use below).
type BaseEvent struct {
    Metadata EventMetadata `json:"metadata"`
    Data     any           `json:"data"`
}

// NewEventMetadata (defined elsewhere) assigns a fresh event ID and
// timestamp and records the originating service.
func NewOrderCreatedEvent(orderID string, items []OrderItem) BaseEvent {
    metadata := NewEventMetadata("order.created", "order-service")
    data := OrderCreatedData{
        OrderID:   orderID,
        Items:     items,
        CreatedAt: time.Now().UTC(),
    }
    return BaseEvent{Metadata: metadata, Data: data}
}

Have you ever wondered how to ensure messages aren’t lost during network partitions? That’s where NATS JetStream comes in. Unlike traditional message queues, JetStream provides persistence without sacrificing performance. We run it as a three-node cluster in Kubernetes for high availability.

Our NATS configuration focuses on reliability. We set appropriate memory and disk limits, configure cluster routing for redundancy, and use separate accounts for different service types. This isolation prevents one noisy service from affecting others.

jetstream: {
    store_dir: "/data/jetstream"
    max_mem: 1G
    max_file: 10G
}

cluster: {
    name: "ecommerce-cluster"
    routes: [
        "nats://nats-0.nats.default.svc.cluster.local:6222"
        "nats://nats-1.nats.default.svc.cluster.local:6222"
    ]
}
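The consumer code later in this post assumes a stream named ORDERS. Here's a minimal sketch of how that stream might be declared with the Go JetStream client; the subjects, replica count, and retention window are illustrative assumptions, not our exact production values.

// Sketch: declaring the ORDERS stream used by the consumers below.
// Assumes github.com/nats-io/nats.go/jetstream; limits are illustrative.
func ensureOrdersStream(ctx context.Context, js jetstream.JetStream) error {
    _, err := js.CreateOrUpdateStream(ctx, jetstream.StreamConfig{
        Name:     "ORDERS",
        Subjects: []string{"order.>", "payment.>"},
        Storage:  jetstream.FileStorage, // persisted under store_dir above
        Replicas: 3,                     // one replica per node in the cluster
        MaxAge:   24 * time.Hour,        // retention window; tune per workload
    })
    return err
}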

Building the messaging client was a learning experience. I initially underestimated the importance of proper connection handling. Now, our client includes comprehensive reconnection logic and proper resource cleanup.

func NewNATSClient(config NATSConfig) (*NATSClient, error) {
    opts := []nats.Option{
        nats.MaxReconnects(5),
        nats.ReconnectWait(2 * time.Second),
        nats.DisconnectErrHandler(func(nc *nats.Conn, err error) {
            slog.Error("NATS disconnected", "error", err)
        }),
        nats.ReconnectHandler(func(nc *nats.Conn) {
            slog.Info("NATS reconnected", "url", nc.ConnectedUrl())
        }),
    }

    // Connect to the configured server URL (the in-cluster service
    // address) rather than nats.DefaultURL.
    nc, err := nats.Connect(config.URL, opts...)
    if err != nil {
        return nil, fmt.Errorf("connect failed: %w", err)
    }

    js, err := jetstream.New(nc)
    if err != nil {
        nc.Close() // avoid leaking the connection on partial failure
        return nil, fmt.Errorf("jetstream init failed: %w", err)
    }

    return &NATSClient{conn: nc, js: js}, nil
}

What separates production-ready services from prototypes? Error handling and observability. We instrument every service with OpenTelemetry for distributed tracing and structured logging. When something goes wrong—and it will—we can trace the entire flow across service boundaries.
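As one concrete sketch of that instrumentation, here's a handler wrapped in an OpenTelemetry span; the tracer name and status handling are illustrative assumptions, and in a real system we would also propagate trace context through message headers.

// Sketch: tracing a message handler with OpenTelemetry.
// Assumes go.opentelemetry.io/otel and go.opentelemetry.io/otel/codes.
func (s *OrderService) handleWithTracing(ctx context.Context, msg jetstream.Msg) error {
    tracer := otel.Tracer("order-service")
    // In real code the returned context would be threaded into downstream calls.
    _, span := tracer.Start(ctx, "process "+msg.Subject())
    defer span.End()

    if err := s.handlePaymentMessage(msg); err != nil {
        span.RecordError(err)
        span.SetStatus(codes.Error, "message processing failed")
        return err
    }
    return nil
}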

Our consumer services use durable consumers with explicit acknowledgments. This pattern guarantees at-least-once processing, even when services restart, which means handlers must be idempotent to tolerate the occasional redelivery. We've found that the right acknowledgment mode depends on the delivery guarantees your domain actually needs.

func (s *OrderService) processPaymentEvents(ctx context.Context) error {
    consumer, err := s.js.CreateOrUpdateConsumer(ctx, "ORDERS", jetstream.ConsumerConfig{
        Durable:   "order-payment-processor",
        AckPolicy: jetstream.AckExplicitPolicy,
    })
    if err != nil {
        return fmt.Errorf("create consumer: %w", err)
    }

    // Messages() returns an iterator, not a channel.
    it, err := consumer.Messages()
    if err != nil {
        return fmt.Errorf("open message iterator: %w", err)
    }
    defer it.Stop()

    for {
        msg, err := it.Next()
        if err != nil {
            return err // iterator stopped or connection lost
        }
        if err := s.handlePaymentMessage(msg); err != nil {
            slog.Error("Failed processing message", "error", err)
            continue // no Ack: JetStream will redeliver the message
        }
        msg.Ack()
    }
}

Kubernetes deployment taught us valuable lessons about resource management. We use horizontal pod autoscaling driven by JetStream consumer lag (pending message counts) alongside CPU usage. Each service includes liveness and readiness probes that check both application health and NATS connection status.
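To make that concrete, here's a minimal sketch of a readiness handler that reports healthy only while the NATS connection is up; the endpoint path and wiring are illustrative assumptions.

// Sketch: readiness endpoint that fails when the NATS connection drops.
// Assumes net/http and github.com/nats-io/nats.go; /readyz is illustrative.
func readinessHandler(nc *nats.Conn) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        if nc == nil || nc.Status() != nats.CONNECTED {
            http.Error(w, "nats connection unavailable", http.StatusServiceUnavailable)
            return
        }
        w.WriteHeader(http.StatusOK)
    }
}

The Kubernetes readinessProbe then simply issues an httpGet against that path, and the pod is pulled out of rotation whenever the connection drops.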

Monitoring event-driven systems requires different thinking. Instead of watching request rates, we monitor event throughput, processing latency, and dead letter queues. We built custom Grafana dashboards that show the entire event flow—from producers through to consumers.
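As a hedged example of what we export, here's a sketch using the Prometheus Go client; the metric names and labels are illustrative, not our exact dashboard schema.

// Sketch: event-processing metrics with the Prometheus Go client.
// Assumes github.com/prometheus/client_golang/prometheus; names illustrative.
var (
    eventsProcessed = prometheus.NewCounterVec(
        prometheus.CounterOpts{
            Name: "events_processed_total",
            Help: "Events processed, labeled by subject and outcome.",
        },
        []string{"subject", "outcome"},
    )
    processingSeconds = prometheus.NewHistogramVec(
        prometheus.HistogramOpts{
            Name: "event_processing_seconds",
            Help: "Time spent handling a single event.",
        },
        []string{"subject"},
    )
)

func init() {
    prometheus.MustRegister(eventsProcessed, processingSeconds)
}

func observe(subject string, start time.Time, err error) {
    outcome := "ok"
    if err != nil {
        outcome = "error"
    }
    eventsProcessed.WithLabelValues(subject, outcome).Inc()
    processingSeconds.WithLabelValues(subject).Observe(time.Since(start).Seconds())
}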

How do you test such asynchronous systems? We combine unit tests for business logic with integration tests that spin up real NATS servers. Our test containers include scenarios for network partitions and service failures to ensure our retry mechanisms work as expected.
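For readers who want a starting point, here's a sketch of such a test using an embedded NATS server with JetStream enabled; it uses the nats-server package directly, standing in for our internal test harness.

// Sketch: integration test against an embedded NATS server.
// Assumes github.com/nats-io/nats-server/v2/server; setup is illustrative.
func TestOrderFlow(t *testing.T) {
    srv, err := server.NewServer(&server.Options{
        Port:      -1, // pick a random free port
        JetStream: true,
        StoreDir:  t.TempDir(),
    })
    if err != nil {
        t.Fatal(err)
    }
    go srv.Start()
    defer srv.Shutdown()
    if !srv.ReadyForConnections(5 * time.Second) {
        t.Fatal("embedded NATS server did not start in time")
    }

    nc, err := nats.Connect(srv.ClientURL())
    if err != nil {
        t.Fatal(err)
    }
    defer nc.Close()

    // From here: create the ORDERS stream, publish test events, and
    // assert that the consumer under test acknowledges them.
}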

After several production deployments, I can share some hard-earned wisdom. Always start with simple retry logic before implementing complex patterns. Use correlation IDs religiously—they’re worth their weight in gold during incident investigation. And never underestimate the power of good documentation for your event schemas.
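By "simple retry logic" I mean nothing fancier than a bounded loop with backoff, like this sketch (the attempt count and delay are illustrative):

// Sketch: bounded retry with linear backoff; limits are illustrative.
func withRetry(ctx context.Context, attempts int, fn func() error) error {
    var err error
    for i := 0; i < attempts; i++ {
        if err = fn(); err == nil {
            return nil
        }
        slog.Warn("operation failed, retrying", "attempt", i+1, "error", err)
        select {
        case <-ctx.Done():
            return ctx.Err()
        case <-time.After(time.Duration(i+1) * time.Second):
        }
    }
    return fmt.Errorf("after %d attempts: %w", attempts, err)
}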

The beauty of this architecture shines during peak loads. When holiday traffic hit our platform last year, the system scaled seamlessly because events buffered naturally in NATS, preventing cascading failures. Services could process at their own pace without dropping requests.

I’m passionate about this approach because it creates systems that are both robust and understandable. The clear separation between services, combined with well-defined events, makes the entire system easier to maintain and evolve over time.

If this guide helps you build better systems, I’d love to hear about your experiences. Please share this with colleagues who might benefit, and leave a comment about your own event-driven journey. Your insights could help others in our community navigate similar challenges.
