I’ve been thinking a lot about building robust microservices lately. After seeing too many systems fail under load or become impossible to debug, I wanted to share a battle-tested approach using Go, NATS JetStream, and OpenTelemetry. This combination creates systems that handle real-world chaos while providing clear visibility. Let’s explore how to build production-ready event-driven services together.
Setting up our project correctly matters. I organize services like this:
```
event-driven-microservices/
├── cmd/
│   ├── order-service/
│   ├── inventory-service/
│   └── notification-service/
├── internal/
│   ├── domain/
│   ├── events/
│   └── observability/
└── pkg/
    └── messaging/
```
Our core dependencies include:
```
// go.mod
module github.com/yourorg/event-driven-microservices

go 1.18

require (
    github.com/gin-gonic/gin v1.8.1
    github.com/google/uuid v1.3.0
    github.com/nats-io/nats.go v1.16.0
    github.com/rs/zerolog v1.27.0
    github.com/sony/gobreaker v0.5.0
    go.opentelemetry.io/otel v1.10.0
)
```
Defining clear domain models early prevents headaches. Here’s our order structure:
```go
type Order struct {
    ID          uuid.UUID
    Items       []OrderItem
    TotalAmount float64 // consider integer cents in production to avoid float rounding
    Status      string  // "pending", "confirmed"
}

func NewOrder(items []OrderItem) *Order {
    total := 0.0
    for _, item := range items {
        total += item.Price * float64(item.Quantity)
    }
    return &Order{
        ID:          uuid.New(),
        Items:       items,
        TotalAmount: total,
        Status:      "pending",
    }
}
```
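The totaling logic is easy to verify in isolation. Here’s a minimal, self-contained sketch — `OrderItem` is inlined and `orderTotal` is a hypothetical extraction of `NewOrder`’s loop, so it runs without the rest of the service:

```go
package main

import "fmt"

// OrderItem mirrors the struct above (inlined so this sketch runs standalone).
type OrderItem struct {
	Price    float64
	Quantity int
}

// orderTotal reproduces the summing loop from NewOrder.
func orderTotal(items []OrderItem) float64 {
	total := 0.0
	for _, item := range items {
		total += item.Price * float64(item.Quantity)
	}
	return total
}

func main() {
	items := []OrderItem{
		{Price: 2.50, Quantity: 2}, // 5.00
		{Price: 10, Quantity: 3},   // 30.00
	}
	fmt.Println(orderTotal(items)) // 35
}
```

Pulling the calculation into a pure function like this keeps it trivially table-testable as order rules grow.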
Events become our communication backbone. Notice how we embed tracing context:
```go
type BaseEvent struct {
    ID      uuid.UUID
    Type    string // "order.created"
    TraceID string // OpenTelemetry trace
}

type OrderCreatedEvent struct {
    BaseEvent
    Order Order
}

func PublishOrderCreated(js nats.JetStreamContext, order Order, traceID string) error {
    event := OrderCreatedEvent{
        BaseEvent: BaseEvent{
            ID:      uuid.New(),
            Type:    "order.created",
            TraceID: traceID,
        },
        Order: order,
    }
    data, err := json.Marshal(event)
    if err != nil {
        return fmt.Errorf("marshal event: %w", err)
    }
    // JetStream acknowledges once the message is persisted to the stream.
    _, err = js.Publish("order.created", data)
    return err
}
```
Why do distributed systems need special care for failures? Without proper patterns, small issues cascade. Let’s add resilience with a circuit breaker:
```go
// Inventory service call with protection
cb := gobreaker.NewCircuitBreaker(gobreaker.Settings{
    Name: "InventoryClient",
    ReadyToTrip: func(counts gobreaker.Counts) bool {
        return counts.ConsecutiveFailures > 5
    },
})

response, err := cb.Execute(func() (interface{}, error) {
    return inventoryClient.ReserveStock(order.Items)
})
if err != nil {
    // gobreaker.ErrOpenState means we failed fast without calling the service
    return err
}
```
For efficient message processing, worker pools are essential. Here’s a safe shutdown pattern:
```go
func StartWorkers(ctx context.Context, num int, messages <-chan *nats.Msg) {
    var wg sync.WaitGroup
    for i := 0; i < num; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for {
                select {
                case msg, ok := <-messages:
                    if !ok {
                        return // channel closed: nothing left to process
                    }
                    processMessage(msg)
                case <-ctx.Done():
                    return // shutdown requested
                }
            }
        }()
    }
    wg.Wait() // blocks until every worker has exited
}
```
OpenTelemetry transforms observability. Here’s how we instrument Gin:
```go
// Order service setup
func main() {
    tp := initTracer() // Jaeger/Zipkin exporter setup
    defer tp.Shutdown(context.Background())

    r := gin.New()
    r.Use(otelgin.Middleware("order-service"))
    r.POST("/orders", createOrderHandler)
    r.Run(":8080")
}

func createOrderHandler(c *gin.Context) {
    tracer := otel.Tracer("order-handler")
    ctx, span := tracer.Start(c.Request.Context(), "create-order")
    defer span.End()
    // Business logic takes ctx so downstream spans join this trace
    _ = ctx
}
```
What separates production code from prototypes? Health checks and structured logging complete the picture:
```go
// Health endpoint
r.GET("/health", func(c *gin.Context) {
    c.JSON(200, gin.H{"status": "ok"})
})
```

```go
// Zerolog configuration
log.Logger = zerolog.New(os.Stdout).
    With().
    Timestamp().
    Str("service", "inventory").
    Logger()

log.Info().Str("event", "stock_updated").Msg("Inventory adjusted")
```
Containerizing our services ensures consistency:
```dockerfile
# Dockerfile for Go service
FROM golang:1.18-alpine AS builder
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 go build -o order-service ./cmd/order-service

FROM alpine:latest
COPY --from=builder /app/order-service /order-service
CMD ["/order-service"]
```
The true power emerges when these pieces interact. When an order arrives:
- Order service validates and publishes “order.created”
- Inventory service reserves stock and emits “inventory.updated”
- Notification service sends the confirmation

All the while, OpenTelemetry connects the traces across services.
Building this changed how I view distributed systems. What problems could this solve in your environment? Try implementing one pattern at a time. Share your experiences in the comments - I’d love to hear what works for you. If this helped, pass it along to others facing similar challenges!