I’ve been thinking about microservice reliability lately. After seeing too many systems fail under pressure, I set out to build something better - an event-driven architecture that handles real-world chaos. Why? Because today’s users expect flawless transactions, even when services crash or networks fail. That’s why I built a production-ready order processing system using Go, NATS JetStream, and OpenTelemetry. Stick with me to see how it works.
Getting started requires careful project organization. First, I initialize a Go module:
go mod init github.com/yourname/event-driven-microservice
The directory layout separates concerns clearly: cmd for services, internal for shared logic, and proto for our contract definitions. This separation proves invaluable as the system grows.
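Here is roughly how that layout looks on disk; the individual service and package names are illustrative, not prescriptive:
.
├── cmd/
│   └── order-service/      # main package for the order service
├── internal/
│   ├── config/             # environment-driven configuration
│   ├── events/             # JetStream publish/consume helpers
│   └── tracing/            # OpenTelemetry initialization
└── proto/                  # Protocol Buffer contracts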
For communication between services, we need a shared language. Protocol Buffers provide that contract:
syntax = "proto3";

message OrderCreated {
  string order_id = 1;
  string customer_id = 2;
  repeated OrderItem items = 3; // OrderItem is declared elsewhere in the same contract
  double total_amount = 4;
}
Generating Go code from this definition ensures all services speak the same language. Ever tried debugging serialization errors in production? This prevents those nightmares.
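With the standard protoc-gen-go plugin installed, generation is a single command; the file path below is just an example layout:
protoc --go_out=. --go_opt=paths=source_relative proto/order.proto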
Configuration comes next. I use environment variables for flexibility:
type Config struct {
    NATSUrl        string `envconfig:"NATS_URL" default:"nats://localhost:4222"`
    WorkerPoolSize int    `envconfig:"WORKER_POOL_SIZE" default:"10"`
}
Why hardcode when you can adapt? This approach lets the same binary run in development or production seamlessly.
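Loading the struct is one call. I'm assuming the github.com/kelseyhightower/envconfig package here, which is what those struct tags suggest:
// envconfig = github.com/kelseyhightower/envconfig (assumed from the struct tags above)
var cfg Config
if err := envconfig.Process("", &cfg); err != nil {
    log.Fatalf("failed to load configuration: %v", err)
}
log.Printf("NATS at %s, %d workers", cfg.NATSUrl, cfg.WorkerPoolSize)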
Observability is non-negotiable. Without proper tracing, distributed systems become impossible to debug. Here’s how I initialize tracing:
func InitTracing(serviceName, jaegerEndpoint string) (*trace.TracerProvider, error) {
    exporter, err := jaeger.New(jaeger.WithCollectorEndpoint(
        jaeger.WithEndpoint(jaegerEndpoint),
    ))
    if err != nil {
        return nil, err
    }
    // ... setup tracer provider (sketched below)
}
When a payment fails at 3 AM, this trace data becomes your best friend. How else would you find that missing link between services?
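For completeness, the elided provider setup looks something like this; treat it as a sketch, since batcher options and the semconv version depend on your OpenTelemetry SDK release:
// Sketch: trace is go.opentelemetry.io/otel/sdk/trace; resource and semconv ship with the SDK.
tp := trace.NewTracerProvider(
    trace.WithBatcher(exporter),
    trace.WithResource(resource.NewWithAttributes(
        semconv.SchemaURL,
        semconv.ServiceNameKey.String(serviceName),
    )),
)
otel.SetTracerProvider(tp) // spans created via otel.Tracer(...) anywhere in the service now reach Jaeger
return tp, nil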
NATS JetStream handles our messaging with persistence. The setup ensures no order gets lost:
js, err := jetstream.New(nc)
if err != nil {
    return err
}
orderStream, err := js.CreateStream(ctx, jetstream.StreamConfig{
    Name:     "ORDERS",
    Subjects: []string{"ORDER.*"},
})
See how we specify subjects? The ORDER.* wildcard captures every subject under the ORDER prefix in one persisted stream, which keeps routing flexible while message flows stay organized.
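Publishing an order event into that stream is then a single acknowledged call. The ORDER.created subject and the orderCreated variable here are illustrative names, not part of the contract above:
// Marshal the protobuf event and publish it to a subject the ORDERS stream captures.
data, err := proto.Marshal(orderCreated) // google.golang.org/protobuf/proto
if err != nil {
    return err
}
if _, err := js.Publish(ctx, "ORDER.created", data); err != nil {
    return fmt.Errorf("publish order event: %w", err)
}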
For the order service, concurrency is key. I use worker pools for efficient processing:
for i := 0; i < cfg.WorkerPoolSize; i++ {
    go func() {
        for msg := range jobChan {
            processOrder(msg)
        }
    }()
}
This pattern handles load spikes gracefully. What happens when Black Friday traffic hits? The fixed-size pool keeps concurrency bounded and works through the backlog that JetStream buffers, instead of spawning unbounded goroutines and falling over.
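To feed that pool, a durable consumer on the ORDERS stream can push messages onto jobChan. The durable name, ack policy, and channel type below are one reasonable wiring, not the only way to do it:
// jobChan is assumed to be a buffered chan jetstream.Msg shared with the workers.
consumer, err := orderStream.CreateOrUpdateConsumer(ctx, jetstream.ConsumerConfig{
    Durable:   "order-workers",
    AckPolicy: jetstream.AckExplicitPolicy,
})
if err != nil {
    return err
}
_, err = consumer.Consume(func(msg jetstream.Msg) {
    jobChan <- msg // workers call msg.Ack() only after processOrder succeeds
})
if err != nil {
    return err
}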
Error handling requires special attention. My retry logic looks like this:
func withRetry(fn func() error, maxAttempts int, backoff time.Duration) error {
    var lastErr error
    for i := 0; i < maxAttempts; i++ {
        if lastErr = fn(); lastErr == nil {
            return nil
        }
        time.Sleep(backoff)
    }
    return fmt.Errorf("operation failed after %d attempts: %w", maxAttempts, lastErr)
}
Transient errors shouldn’t mean lost orders. This approach recovers from temporary blips in downstream services.
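In practice I wrap calls to flaky downstream dependencies with it; the attempt count and backoff here are just sensible starting points:
err := withRetry(func() error {
    _, err := js.Publish(ctx, "ORDER.created", data)
    return err
}, 5, 500*time.Millisecond)
if err != nil {
    log.Printf("publish still failing after retries: %v", err) // surface it, don't silently drop the order
}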
Health checks keep the system operational:
router.HandleFunc("/health", func(w http.ResponseWriter, _ *http.Request) {
    if nc.Status() != nats.CONNECTED {
        w.WriteHeader(http.StatusServiceUnavailable)
        return
    }
    w.WriteHeader(http.StatusOK)
})
Simple? Yes. Critical? Absolutely. Kubernetes uses these endpoints to make lifecycle decisions.
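The handler needs a server to live on, so I run it in its own goroutine where probes never block order processing; the port is an assumption, match it to your probe configuration:
go func() {
    // Dedicated listener for health probes; 8081 is a placeholder port.
    if err := http.ListenAndServe(":8081", router); err != nil {
        log.Fatalf("health endpoint stopped: %v", err)
    }
}()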
Docker packaging ensures consistent environments:
FROM golang:1.21-alpine
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN go build -o /order-service ./cmd/order-service
CMD ["/order-service"]
No more “works on my machine” excuses. The same container runs everywhere.
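Building and running it locally looks like this; the image tag and the NATS address are placeholders for whatever your environment uses:
docker build -t order-service .
docker run -e NATS_URL=nats://host.docker.internal:4222 -e WORKER_POOL_SIZE=20 order-service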
As I wrap up, consider this: how many orders could your current system lose during a network partition? With this architecture, accepted orders sit safely in a persisted stream and get processed once connectivity returns, instead of vanishing. The combination of Go’s efficiency, JetStream’s persistence, and OpenTelemetry’s visibility creates something genuinely robust. If you found this useful, share it with your team or leave a comment about your experience. Let’s build more resilient systems together.