OpenTelemetry: Production Architecture & Advanced Patterns
"OpenTelemetry is not a product. It's a standard that prevents vendor lock-in while enabling best-in-class observability."
OTel = API + SDK + Semantic Conventions + Collector. Goal: instrument once, export anywhere.
1. Architecture — Collector as Data Plane
1.1 Deployment Patterns
Pattern 1: Agent Mode (sidecar or DaemonSet)
┌────────────────────────────────────────────┐
│ Kubernetes Node │
│ ┌─────────────┐ ┌─────────────┐ │
│ │ App Pod │ │ App Pod │ │
│ │ (OTLP exp.) │ │ (OTLP exp.) │ │
│ └──────┬──────┘ └──────┬──────┘ │
│ └──────────┬──────────┘ │
│ ▼ │
│ ┌──────────────────────┐ │
│ │ OTel Collector │ │
│ │ (DaemonSet) │ │
│ │ - Local buffering │ │
│ │ - Batch export │ │
│ └──────────┬───────────┘ │
└────────────────────┼────────────────────────┘
│ OTLP/gRPC
▼
┌──────────────────────┐
│ Central Collector │
│ (Gateway Mode) │
│ - Tail sampling │
│ - Aggregation │
│ - Multi-backend exp │
└──────────┬───────────┘
│
┌───────────┼────────────┐
▼ ▼ ▼
Prometheus Tempo Loki
Trade-off:
- Pros: Decouple apps from backend, local buffering
- Cons: Resource overhead (1 collector/node)
Pattern 2: Gateway Mode (centralized)
App Pods → OTLP → Central Collector → Backends
Trade-off:
- Pros: Fewer collectors, easier to manage
- Cons: Network hop, single point of failure
Production choice: Hybrid — Agent for buffering, Gateway for sampling/aggregation.
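The agent half of that hybrid stays deliberately thin — receive locally, batch, forward to the gateway. A minimal sketch (the otel-gateway service name and namespace are assumptions, not a standard):

```yaml
# Agent (DaemonSet) collector: no sampling here, just local buffering
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
processors:
  batch:
    timeout: 5s
exporters:
  otlp:
    endpoint: otel-gateway.observability.svc.cluster.local:4317
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
```

Heavy work (tail sampling, aggregation, multi-backend fan-out) stays in the gateway, as in the config below.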
1.2 Collector Pipeline — Receivers, Processors, Exporters
# otel-collector-config.yaml
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
http:
endpoint: 0.0.0.0:4318
processors:
# Stage 1: Enrich
resource:
attributes:
- key: environment
value: production
action: insert
# Stage 2: Transform
transform:
trace_statements:
- context: span
statements:
# Redact sensitive attributes
- replace_pattern(attributes["http.url"], "password=.*", "password=REDACTED")
# Stage 3: Sample (tail-based)
tail_sampling:
decision_wait: 10s
num_traces: 50000
expected_new_traces_per_sec: 1000
policies:
- name: errors-policy
type: status_code
status_code: {status_codes: [ERROR]}
- name: latency-policy
type: latency
latency: {threshold_ms: 1000}
- name: rate-limiting
type: rate_limiting
rate_limiting: {spans_per_second: 100}
# Stage 4: Batch (network efficiency)
batch:
timeout: 10s
send_batch_size: 1024
exporters:
# Multi-backend export
otlp/tempo:
endpoint: tempo:4317
prometheus:
endpoint: 0.0.0.0:8889
logging:
loglevel: debug
service:
pipelines:
traces:
receivers: [otlp]
processors: [resource, transform, tail_sampling, batch]
exporters: [otlp/tempo, logging]
metrics:
receivers: [otlp]
processors: [resource, batch]
exporters: [prometheus]
Key insight: processor order matters. batch goes last; tail_sampling runs after transform so spans are already redacted by the time sampling decisions are made.
2. Context Propagation — The Distributed Ledger
2.1 W3C Trace Context Deep Dive
Header anatomy:
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
- version: 00
- trace-id: 4bf92f3577b34da6a3ce929d0e0e4736 (16 bytes, 32 hex chars)
- parent-id / span-id: 00f067aa0ba902b7 (8 bytes, 16 hex chars)
- trace-flags: 01 (bit 0 is the sampled flag: 00 = not sampled, 01 = sampled)
tracestate: vendor1=value1,vendor2=value2
Critical: with the default ParentBased sampler, a child defers to its parent's decision — if the parent span is not sampled, the child is not sampled either. Only a non-ParentBased sampler overrides that.
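A quick way to internalize the format is to parse it by hand once. This is a hedged sketch — real services should use the SDK's propagation.TraceContext propagator; parseTraceparent is a hypothetical helper for illustration only:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseTraceparent splits a W3C traceparent header into its fields.
func parseTraceparent(h string) (traceID, spanID string, sampled bool, err error) {
	parts := strings.Split(h, "-")
	if len(parts) != 4 || len(parts[1]) != 32 || len(parts[2]) != 16 {
		return "", "", false, fmt.Errorf("malformed traceparent: %q", h)
	}
	flags, err := strconv.ParseUint(parts[3], 16, 8)
	if err != nil {
		return "", "", false, fmt.Errorf("bad trace-flags: %q", parts[3])
	}
	// Bit 0 of trace-flags is the sampled bit
	return parts[1], parts[2], flags&0x01 == 1, nil
}

func main() {
	tid, sid, sampled, err := parseTraceparent(
		"00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	if err != nil {
		panic(err)
	}
	fmt.Println(tid, sid, sampled) // 4bf92f3577b34da6a3ce929d0e0e4736 00f067aa0ba902b7 true
}
```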
2.2 Context in Asynchronous Flows
Problem: Background jobs, message queues lose context.
Solution 1: Inject context into the message
// Producer
ctx, span := tracer.Start(ctx, "publish_event")
defer span.End()
// Inject trace context into Kafka headers
carrier := propagation.MapCarrier{}
otel.GetTextMapPropagator().Inject(ctx, carrier)
msg := &kafka.Message{
Key: []byte("order_123"),
Value: eventData,
Headers: []kafka.Header{
{Key: "traceparent", Value: []byte(carrier["traceparent"])},
{Key: "tracestate", Value: []byte(carrier["tracestate"])},
},
}
producer.Produce(msg)
// Consumer: rebuild the carrier from Kafka headers by key
// (indexing msg.Headers[0]/[1] is fragile — header order is not guaranteed)
carrier := propagation.MapCarrier{}
for _, h := range msg.Headers {
	carrier[h.Key] = string(h.Value)
}
ctx := otel.GetTextMapPropagator().Extract(context.Background(), carrier)
ctx, span := tracer.Start(ctx, "process_event")
defer span.End()
// The new span is now a child of the producer span
Solution 2: Links (not parent-child)
// When a consumer processes a batch of messages (no single obvious parent)
links := []trace.Link{}
for _, msg := range messages {
extractedCtx := extractContext(msg)
links = append(links, trace.Link{
SpanContext: trace.SpanContextFromContext(extractedCtx),
})
}
ctx, span := tracer.Start(ctx, "process_batch", trace.WithLinks(links...))
// This span has no parent, but is linked to multiple producer spans
3. Instrumentation Patterns — Beyond Auto
3.1 Semantic Conventions — The Language of Telemetry
OTel defines standard attributes for common operations.
HTTP Server (auto-instrumented; note that the stable HTTP semantic conventions later renamed these to http.request.method, http.response.status_code, etc.):
http.method = "POST"
http.route = "/orders"
http.status_code = 200
http.target = "/orders?user_id=123"
Database (auto-instrumented):
db.system = "postgresql"
db.statement = "SELECT * FROM orders WHERE id = $1"
db.name = "production_db"
db.operation = "SELECT"
Custom Business Operations (manual):
span.SetAttributes(
attribute.String("order.id", order.ID),
attribute.String("order.type", "priority"),
attribute.Float64("order.total", order.Total),
attribute.String("payment.method", "credit_card"),
)
Critical: using semantic conventions means tools understand your data automatically (e.g., Grafana can auto-generate a service map).
3.2 Span Events — Log within Trace
span.AddEvent("inventory_checked", trace.WithAttributes(
attribute.Bool("in_stock", true),
attribute.Int("quantity", 5),
))
span.AddEvent("payment_authorized", trace.WithAttributes(
attribute.String("transaction_id", "txn_abc123"),
))
// Events appear on the span's timeline in the trace UI
Use case: Debugging — see exactly what happened at each step.
3.3 Baggage — Propagate Non-Trace Metadata
Problem: you need to propagate user_tier=premium so downstream services can apply business logic, without polluting span attributes.
Solution: Baggage (W3C standard)
// Service A: Set baggage
bag := baggage.FromContext(ctx)
member, _ := baggage.NewMember("user.tier", "premium")
bag, _ = bag.SetMember(member)
ctx = baggage.ContextWithBaggage(ctx, bag)
// Baggage is auto-propagated via the header: baggage: user.tier=premium
// Service B: Read baggage
bag := baggage.FromContext(ctx)
tier := bag.Member("user.tier").Value()
if tier == "premium" {
// Apply premium logic
}
Warning: baggage rides on every outbound request, increasing header size and overhead. Keep the total under 8KB.
4. Performance & Overhead Analysis
4.1 Instrumentation Cost Breakdown
| Component | Overhead | Mitigation |
|---|---|---|
| Span creation | ~1-2µs | Negligible |
| Attribute allocation | ~0.5µs per attr | Limit to < 20 attrs/span |
| Context propagation | ~0.1µs | Negligible |
| Sampling decision | ~10µs (tail) | Head-based for hot path |
| Export (network) | ~1ms per batch | Batch size = 1000, async |
Total overhead at 10K QPS: < 1% CPU impact if configured correctly.
Benchmark (Go):
func BenchmarkSpanCreation(b *testing.B) {
tracer := otel.Tracer("benchmark")
ctx := context.Background()
b.ResetTimer()
for i := 0; i < b.N; i++ {
_, span := tracer.Start(ctx, "test_operation")
span.End()
}
}
// Result: ~800ns/op (Go 1.21, OTel SDK 1.19)
4.2 Reducing Overhead — Head Sampling at SDK
// Probability sampler: 10% sample rate
sampler := trace.ParentBased(trace.TraceIDRatioBased(0.1))
tp := trace.NewTracerProvider(
trace.WithSampler(sampler),
// ...
)
Trade-off: dropping 90% of traces means rare errors may never be captured. Use with caution.
Better: tail sampling at the Collector (keep 100% of errors, sample the rest).
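What makes TraceIDRatioBased deterministic is that the decision is a pure function of the trace ID. A sketch of the general idea (an assumption about the mechanism for illustration, not the SDK's exact algorithm):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// sampledByRatio decides from the trace ID alone, so every service that sees
// the same trace ID makes the same head-sampling decision.
func sampledByRatio(traceID [16]byte, ratio float64) bool {
	if ratio >= 1 {
		return true
	}
	if ratio <= 0 {
		return false
	}
	// Interpret the low 8 bytes as a 63-bit value and compare against
	// the ratio scaled to the same range
	x := binary.BigEndian.Uint64(traceID[8:16]) >> 1
	bound := uint64(ratio * float64(uint64(1)<<63))
	return x < bound
}

func main() {
	var id [16]byte
	id[15] = 0x01 // low bytes near zero → sampled at almost any ratio
	fmt.Println(sampledByRatio(id, 0.1)) // true
}
```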
5. Advanced: Custom Exporters & Processors
5.1 Anonymizing PII — Why Not a Span Processor
Pitfall: sdktrace.SpanProcessor's OnEnd receives a ReadOnlySpan, which has no SetAttributes method — a finished span cannot be rewritten in a processor, so the naive "redact in OnEnd" approach does not even compile. Redact instead in the Collector (the transform processor from section 1.2) or at the point where the attribute is set:
// Redact before the value ever reaches the span
span.SetAttributes(
	attribute.String("user.email", redactEmail(user.Email)), // user@example.com → u***@e***.com
)

// redactEmail masks the local part and domain of an address
func redactEmail(email string) string {
	at := strings.Index(email, "@")
	dot := strings.LastIndex(email, ".")
	if at <= 0 || dot < at {
		return "***"
	}
	return email[:1] + "***@" + email[at+1:at+2] + "***" + email[dot:]
}
5.2 Custom Exporter — Write to Custom Backend
type CustomExporter struct {
client *http.Client
endpoint string
}
func (e *CustomExporter) ExportSpans(ctx context.Context, spans []sdktrace.ReadOnlySpan) error {
	// Note: one request per span keeps the example short; a production
	// exporter should serialize the whole batch into a single request
	for _, span := range spans {
		payload := serializeSpan(span)
		req, err := http.NewRequestWithContext(ctx, "POST", e.endpoint, bytes.NewBuffer(payload))
		if err != nil {
			return err
		}
		resp, err := e.client.Do(req)
		if err != nil {
			return err
		}
		resp.Body.Close()
		if resp.StatusCode >= 400 {
			return fmt.Errorf("export failed: %s", resp.Status)
		}
	}
	return nil
}
func (e *CustomExporter) Shutdown(ctx context.Context) error {
return nil
}
Use case: exporting to a proprietary system or a legacy monitoring tool.
6. Collector Scaling — High Availability
6.1 Load Balancing OTLP Exporters
# App config: Multiple collector endpoints
exporters:
otlp:
endpoint: otel-collector-lb.svc.cluster.local:4317
  # DNS round-robin or Kubernetes service load balancing
6.2 Collector Sharding (Tail Sampling)
Problem: tail sampling needs the full trace → all spans of a trace must reach the same collector.
Solution: Consistent hashing by trace_id
# Load balancer config (e.g., Envoy)
load_assignment:
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: collector-0
port_value: 4317
- endpoint:
address:
socket_address:
address: collector-1
port_value: 4317
# Hash on trace_id header
hash_policy:
- header:
header_name: "traceparent"
regex_rewrite:
pattern:
google_re2: {}
regex: "^00-([a-f0-9]{32})-.*"
substitution: "\\1"
Result: trace abc123 always lands on collector-0, trace def456 always on collector-1.
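Running a separate Envoy tier is one option; the Collector ecosystem also ships a loadbalancing exporter that does trace-ID-aware routing itself, typically in a thin routing tier in front of the sampling collectors. A sketch (the headless-service hostname is a placeholder):

```yaml
# Routing-tier collector: shard spans to sampling collectors by trace ID
exporters:
  loadbalancing:
    routing_key: traceID
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      dns:
        hostname: otel-sampling-headless.observability.svc.cluster.local
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [loadbalancing]
```

The DNS resolver re-discovers collector replicas as they scale, rehashing only the affected traces.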
7. Migration Strategy — Brownfield Services
Phased Rollout
Phase 1: Passive observation (no impact on production backends)
// Export to dev backend only, log errors
tp := trace.NewTracerProvider(
trace.WithBatcher(devExporter, trace.WithBatchTimeout(1*time.Second)),
trace.WithSampler(trace.AlwaysSample()),
)
Phase 2: Shadow traffic (1% sample)
tp := trace.NewTracerProvider(
trace.WithBatcher(prodExporter),
trace.WithSampler(trace.TraceIDRatioBased(0.01)),
)
Phase 3: Gradual increase (1% → 10% → 100%)
Week 1: 1%
Week 2: 5%
Week 3: 10%
Week 4: 50%
Week 5: 100%
Monitoring: Watch for CPU/memory increase, export errors.
8. Troubleshooting — Common Pitfalls
Issue 1: Broken traces (missing spans)
Root cause: context is not being propagated.
Debug:
// Check if context has trace
spanCtx := trace.SpanContextFromContext(ctx)
if !spanCtx.IsValid() {
log.Error("MISSING TRACE CONTEXT")
}
Fix: ensure middleware propagates the incoming context instead of creating a fresh context.Background().
Issue 2: Collector OOM (tail sampling)
Root cause: decision_wait too long × expected_new_traces_per_sec too high.
Memory = decision_wait × traces_per_sec × avg_trace_size
= 30s × 10K/s × 50KB
= 15GB
Fix: reduce decision_wait or add RAM.
Issue 3: Export rate limit
Symptom: Logs show "export failed: 429 Too Many Requests"
Fix: send fewer, larger requests — raise send_batch_size (and keep the timeout high enough that batches actually fill).
processors:
  batch:
    timeout: 10s            # unchanged — lowering it would send more, smaller batches
    send_batch_size: 2048   # from 1024 → 2048
Summary: Production Checklist
- Collector deployed in agent + gateway mode
- Tail sampling configured (keep 100% errors)
- Semantic conventions used for standard operations
- Context propagated via HTTP/gRPC/Kafka headers
- Baggage size < 8KB
- PII redacted via processor
- Collector scaled horizontally (≥3 replicas)
- Export batch size tuned (1000-2000)
- Overhead measured (< 2% CPU impact)
- Migration rollout plan with rollback triggers