📡 Observability · ✍️ Khoa · 📅 19/04/2026 · ☕ 8 min read

OpenTelemetry: Production Architecture & Advanced Patterns

"OpenTelemetry is not a product. It's a standard that prevents vendor lock-in while enabling best-in-class observability."

OTel = API + SDK + Semantic Conventions + Collector. Goal: instrument once, export anywhere.


1. Architecture — Collector as Data Plane

1.1 Deployment Patterns

Pattern 1: Agent Mode (sidecar or DaemonSet)

┌────────────────────────────────────────────┐
│ Kubernetes Node                             │
│  ┌─────────────┐      ┌─────────────┐      │
│  │ App Pod     │      │ App Pod     │      │
│  │ (OTLP exp.) │      │ (OTLP exp.) │      │
│  └──────┬──────┘      └──────┬──────┘      │
│         └──────────┬──────────┘             │
│                    ▼                        │
│         ┌──────────────────────┐            │
│         │ OTel Collector       │            │
│         │ (DaemonSet)          │            │
│         │ - Local buffering    │            │
│         │ - Batch export       │            │
│         └──────────┬───────────┘            │
└────────────────────┼────────────────────────┘
                     │ OTLP/gRPC
                     ▼
          ┌──────────────────────┐
          │ Central Collector    │
          │ (Gateway Mode)       │
          │ - Tail sampling      │
          │ - Aggregation        │
          │ - Multi-backend exp  │
          └──────────┬───────────┘
                     │
         ┌───────────┼────────────┐
         ▼           ▼            ▼
    Prometheus   Tempo       Loki

Trade-off:

  • Pros: Decouples apps from the backend; local buffering absorbs backend outages
  • Cons: Resource overhead (one collector per node)

Pattern 2: Gateway Mode (centralized)

App Pods → OTLP → Central Collector → Backends

Trade-off:

  • Pros: Fewer collectors, easier to manage
  • Cons: Network hop, single point of failure

Production choice: Hybrid — Agent for buffering, Gateway for sampling/aggregation.
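A minimal sketch of the agent-side config for this hybrid setup (the gateway Service name is an assumption for illustration): receive OTLP locally, batch, and forward to the gateway, with a sending queue providing the local buffering.

```yaml
# Agent (DaemonSet) collector: local buffering + batch, forward everything
# to the gateway, which owns tail sampling and aggregation.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch:
    timeout: 5s

exporters:
  otlp:
    # Assumed gateway Service name — adjust to your cluster.
    endpoint: otel-gateway.observability.svc.cluster.local:4317
    sending_queue:
      enabled: true        # buffer locally while the gateway is unreachable
    retry_on_failure:
      enabled: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
```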


1.2 Collector Pipeline — The Three Stages

# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  # Stage 1: Enrich
  resource:
    attributes:
      - key: environment
        value: production
        action: insert
  
  # Stage 2: Transform
  transform:
    trace_statements:
      - context: span
        statements:
          # Redact sensitive attributes
          - replace_pattern(attributes["http.url"], "password=.*", "password=REDACTED")
  
  # Stage 3: Sample (tail-based)
  tail_sampling:
    decision_wait: 10s
    num_traces: 50000
    expected_new_traces_per_sec: 1000
    policies:
      - name: errors-policy
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: latency-policy
        type: latency
        latency: {threshold_ms: 1000}
      - name: rate-limiting
        type: rate_limiting
        rate_limiting: {spans_per_second: 100}
  
  # Stage 4: Batch (network efficiency)
  batch:
    timeout: 10s
    send_batch_size: 1024

exporters:
  # Multi-backend export
  otlp/tempo:
    endpoint: tempo:4317
  prometheus:
    endpoint: 0.0.0.0:8889
  logging:
    loglevel: debug

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [resource, transform, tail_sampling, batch]
      exporters: [otlp/tempo, logging]
    metrics:
      receivers: [otlp]
      processors: [resource, batch]
      exporters: [prometheus]

Key insight: Processor order matters. batch goes last, and tail_sampling runs after transform so sampling policies evaluate the redacted attributes.


2. Context Propagation — The Distributed Ledger

2.1 W3C Trace Context Deep Dive

Header anatomy:

traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
             │  └────── trace-id (16 bytes) ───┘ └── span-id ───┘ │
             └─ version (00)           trace-flags (01: sampled) ─┘

tracestate: vendor1=value1,vendor2=value2

Sampled flag:

  • 00: Not sampled
  • 01: Sampled

Critical: If the parent span is not sampled, the child normally follows suit: a ParentBased sampler defers to the parent's decision, though a custom sampler can override it.
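The anatomy above can be checked with a few lines of parsing. A minimal sketch (stdlib only, no OTel API): split the header into its four fields and test bit 0 of trace-flags.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseTraceparent splits a W3C traceparent header into its fields and
// reports whether the sampled flag (bit 0 of trace-flags) is set.
// Illustrative sketch: real code should also validate the hex characters
// and reject the all-zero trace-id/span-id.
func parseTraceparent(h string) (traceID, spanID string, sampled bool, err error) {
	parts := strings.Split(h, "-")
	if len(parts) != 4 || len(parts[1]) != 32 || len(parts[2]) != 16 {
		return "", "", false, fmt.Errorf("malformed traceparent: %q", h)
	}
	flags, err := strconv.ParseUint(parts[3], 16, 8)
	if err != nil {
		return "", "", false, err
	}
	return parts[1], parts[2], flags&0x01 == 0x01, nil
}

func main() {
	traceID, spanID, sampled, _ := parseTraceparent(
		"00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	fmt.Println(traceID, spanID, sampled)
	// prints: 4bf92f3577b34da6a3ce929d0e0e4736 00f067aa0ba902b7 true
}
```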

2.2 Context in Asynchronous Flows

Problem: Background jobs, message queues lose context.

Solution 1: Inject context into the message

// Producer
ctx, span := tracer.Start(ctx, "publish_event")
defer span.End()

// Inject the trace context into Kafka headers
carrier := propagation.MapCarrier{}
otel.GetTextMapPropagator().Inject(ctx, carrier)

msg := &kafka.Message{
    Key:   []byte("order_123"),
    Value: eventData,
    Headers: []kafka.Header{
        {Key: "traceparent", Value: []byte(carrier["traceparent"])},
        {Key: "tracestate", Value: []byte(carrier["tracestate"])},
    },
}
producer.Produce(msg)

// Consumer: rebuild the carrier from Kafka headers (don't rely on header order)
carrier := propagation.MapCarrier{}
for _, h := range msg.Headers {
    carrier[h.Key] = string(h.Value)
}
ctx := otel.GetTextMapPropagator().Extract(context.Background(), carrier)

ctx, span := tracer.Start(ctx, "process_event")
defer span.End()
// Now span is child of producer span

Solution 2: Links (not parent-child)

// When the consumer processes a batch of messages (no single obvious parent)
links := []trace.Link{}
for _, msg := range messages {
    extractedCtx := extractContext(msg)
    links = append(links, trace.Link{
        SpanContext: trace.SpanContextFromContext(extractedCtx),
    })
}

ctx, span := tracer.Start(ctx, "process_batch", trace.WithLinks(links...))
// This span has no parent, but is linked to multiple producer spans

3. Instrumentation Patterns — Beyond Auto

3.1 Semantic Conventions — The Language of Telemetry

OTel defines standard attributes for common operations.

HTTP Server (auto-instrumented):

http.method = "POST"
http.route = "/orders"
http.status_code = 200
http.target = "/orders?user_id=123"

Database (auto-instrumented):

db.system = "postgresql"
db.statement = "SELECT * FROM orders WHERE id = $1"
db.name = "production_db"
db.operation = "SELECT"

Custom Business Operations (manual):

span.SetAttributes(
    attribute.String("order.id", order.ID),
    attribute.String("order.type", "priority"),
    attribute.Float64("order.total", order.Total),
    attribute.String("payment.method", "credit_card"),
)

Critical: Use semantic conventions so tooling understands your telemetry automatically (e.g., Grafana can auto-generate a service map).

3.2 Span Events — Log within Trace

span.AddEvent("inventory_checked", trace.WithAttributes(
    attribute.Bool("in_stock", true),
    attribute.Int("quantity", 5),
))

span.AddEvent("payment_authorized", trace.WithAttributes(
    attribute.String("transaction_id", "txn_abc123"),
))

// Events show up on the span's timeline in the trace UI

Use case: Debugging — see exactly what happened at each step.

3.3 Baggage — Propagate Non-Trace Metadata

Problem: You need to propagate user_tier=premium so downstream services can apply business logic, but you don't want to pollute span attributes.

Solution: Baggage (W3C standard)

// Service A: Set baggage
bag := baggage.FromContext(ctx)
member, _ := baggage.NewMember("user.tier", "premium")
bag, _ = bag.SetMember(member)
ctx = baggage.ContextWithBaggage(ctx, bag)

// Baggage propagates automatically via the header: baggage: user.tier=premium

// Service B: Read baggage
bag := baggage.FromContext(ctx)
tier := bag.Member("user.tier").Value()
if tier == "premium" {
    // Apply premium logic
}

Warning: Baggage increases header size on every outbound request → overhead. Keep total baggage under 8KB.
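One way to enforce that limit before propagating is to estimate the encoded header size. A simplified sketch (it skips the percent-encoding a real W3C baggage encoder performs, so real headers can be slightly larger):

```go
package main

import (
	"fmt"
	"strings"
)

// maxBaggageBytes mirrors the 8KB ceiling suggested in the text.
const maxBaggageBytes = 8 * 1024

// baggageHeaderSize estimates the wire size of a baggage header built from
// key/value pairs ("k1=v1,k2=v2"), so oversized baggage can be rejected
// before it fans out to every downstream call.
func baggageHeaderSize(members map[string]string) int {
	pairs := make([]string, 0, len(members))
	for k, v := range members {
		pairs = append(pairs, k+"="+v)
	}
	return len(strings.Join(pairs, ","))
}

func main() {
	members := map[string]string{"user.tier": "premium", "tenant.id": "acme"}
	size := baggageHeaderSize(members)
	fmt.Println(size, size < maxBaggageBytes)
}
```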


4. Performance & Overhead Analysis

4.1 Instrumentation Cost Breakdown

Component              Overhead           Mitigation
Span creation          ~1-2µs             Negligible
Attribute allocation   ~0.5µs per attr    Limit to < 20 attrs/span
Context propagation    ~0.1µs             Negligible
Sampling decision      ~10µs (tail)       Head-based for hot path
Export (network)       ~1ms per batch     Batch size = 1000, async

Total overhead at 10K QPS: < 1% CPU impact when configured correctly.

Benchmark (Go):

func BenchmarkSpanCreation(b *testing.B) {
    tracer := otel.Tracer("benchmark")
    ctx := context.Background()
    
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        _, span := tracer.Start(ctx, "test_operation")
        span.End()
    }
}
// Result: ~800ns/op (Go 1.21, OTel SDK 1.19)

4.2 Reducing Overhead — Head Sampling at SDK

// Probability sampler: 10% sample rate
sampler := trace.ParentBased(trace.TraceIDRatioBased(0.1))

tp := trace.NewTracerProvider(
    trace.WithSampler(sampler),
    // ...
)

Trade-off: Dropping 90% of traces means you may never see rare errors. Use with caution.

Better: Tail sampling at the Collector (keep errors).
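A sketch of that policy in the Collector's tail_sampling processor: policies are evaluated independently and a trace is kept if any policy matches, so error traces survive even a low baseline rate. (Policy names and the 5% baseline are illustrative.)

```yaml
processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      # Always keep traces that contain an error
      - name: keep-errors
        type: status_code
        status_code: {status_codes: [ERROR]}
      # Keep a 5% baseline of everything else
      - name: baseline
        type: probabilistic
        probabilistic: {sampling_percentage: 5}
```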


5. Advanced: Custom Exporters & Processors

5.1 Custom Processor — Anonymize PII

// Custom span processor that redacts email addresses.
// Note: in the Go SDK only OnStart receives a mutable ReadWriteSpan;
// OnEnd gets an immutable ReadOnlySpan, so attributes added after Start()
// must be redacted elsewhere (e.g., the Collector's transform processor).
type AnonymizingProcessor struct{}

func (p *AnonymizingProcessor) OnStart(parent context.Context, s sdktrace.ReadWriteSpan) {
    for _, attr := range s.Attributes() {
        if attr.Key == "user.email" {
            // Redact: user@example.com → u***@e***.com
            redacted := redactEmail(attr.Value.AsString())
            s.SetAttributes(attribute.String("user.email", redacted))
        }
    }
}

func (p *AnonymizingProcessor) OnEnd(s sdktrace.ReadOnlySpan) {
    // ReadOnlySpan cannot be mutated here.
}

func (p *AnonymizingProcessor) Shutdown(ctx context.Context) error   { return nil }
func (p *AnonymizingProcessor) ForceFlush(ctx context.Context) error { return nil }

5.2 Custom Exporter — Write to Custom Backend

type CustomExporter struct {
    client   *http.Client
    endpoint string
}

func (e *CustomExporter) ExportSpans(ctx context.Context, spans []sdktrace.ReadOnlySpan) error {
    for _, span := range spans {
        payload := serializeSpan(span)
        req, err := http.NewRequestWithContext(ctx, http.MethodPost, e.endpoint, bytes.NewReader(payload))
        if err != nil {
            return err
        }
        resp, err := e.client.Do(req)
        if err != nil {
            return err
        }
        resp.Body.Close()
        if resp.StatusCode >= 400 {
            return fmt.Errorf("export failed: %s", resp.Status)
        }
    }
    return nil
}

func (e *CustomExporter) Shutdown(ctx context.Context) error {
    return nil
}

Use case: exporting to a proprietary system or a legacy monitoring tool.


6. Collector Scaling — High Availability

6.1 Load Balancing OTLP Exporters

# App config: Multiple collector endpoints
exporters:
  otlp:
    endpoint: otel-collector-lb.svc.cluster.local:4317
    # DNS round-robin or K8s Service load balancing

6.2 Collector Sharding (Tail Sampling)

Problem: Tail sampling needs the full trace, so every span of a given trace must reach the same collector instance.

Solution: Consistent hashing by trace_id

# Load balancer config (e.g., Envoy)
load_assignment:
  endpoints:
    - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: collector-0
                port_value: 4317
        - endpoint:
            address:
              socket_address:
                address: collector-1
                port_value: 4317
  # Hash on trace_id header
  hash_policy:
    - header:
        header_name: "traceparent"
        regex_rewrite:
          pattern:
            google_re2: {}
            regex: "^00-([a-f0-9]{32})-.*"
          substitution: "\\1"

Result: Trace abc123 always lands on collector-0, trace def456 always on collector-1.
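The routing idea fits in a few lines of Go: hash the trace-id and take it modulo the number of collectors. This is a simplification (production setups use consistent hashing so that scaling the pool only remaps a fraction of traces, and the Collector's own loadbalancing exporter can do trace-ID routing for you), but the invariant is the same: one trace, one collector.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// routeByTraceID deterministically maps a trace-id to one of n collectors,
// so every span of a trace lands on the same tail-sampling instance.
// FNV + modulo is illustrative only; it reshuffles almost all traces
// when n changes, which is why real deployments use consistent hashing.
func routeByTraceID(traceID string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(traceID))
	return int(h.Sum32()) % n
}

func main() {
	ids := []string{
		"4bf92f3577b34da6a3ce929d0e0e4736",
		"0af7651916cd43dd8448eb211c80319c",
	}
	for _, id := range ids {
		fmt.Printf("trace %s... -> collector-%d\n", id[:8], routeByTraceID(id, 2))
	}
}
```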


7. Migration Strategy — Brownfield Services

Phased Rollout

Phase 1: Passive observation (0% production impact)

// Export to dev backend only, log errors
tp := trace.NewTracerProvider(
    trace.WithBatcher(devExporter, trace.WithBatchTimeout(1*time.Second)),
    trace.WithSampler(trace.AlwaysSample()),
)

Phase 2: Shadow traffic (1% sample)

tp := trace.NewTracerProvider(
    trace.WithBatcher(prodExporter),
    trace.WithSampler(trace.TraceIDRatioBased(0.01)),
)

Phase 3: Gradual increase (1% → 10% → 100%)

Week 1: 1%
Week 2: 5%
Week 3: 10%
Week 4: 50%
Week 5: 100%

Monitoring: Watch for CPU/memory increase, export errors.


8. Troubleshooting — Common Pitfalls

Issue 1: Broken traces (missing spans)

Root cause: Context is not being propagated.

Debug:

// Check if context has trace
spanCtx := trace.SpanContextFromContext(ctx)
if !spanCtx.IsValid() {
    log.Error("MISSING TRACE CONTEXT")
}

Fix: Make sure middleware propagates the incoming context instead of creating a fresh context.Background().

Issue 2: Collector OOM (tail sampling)

Root cause: decision_wait is too long and/or expected_new_traces_per_sec is too high.

Memory = decision_wait × traces_per_sec × avg_trace_size
        = 30s × 10K/s × 50KB
        = 15GB

Fix: Reduce decision_wait or add RAM.
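The sizing formula above in runnable form, using the worked example's numbers:

```go
package main

import "fmt"

// tailSamplingMemoryGB estimates the RAM the tail_sampling processor needs
// to hold in-flight traces: every trace is buffered for decision_wait
// seconds before its policy decision runs.
func tailSamplingMemoryGB(decisionWaitSec, tracesPerSec, avgTraceKB float64) float64 {
	return decisionWaitSec * tracesPerSec * avgTraceKB * 1000 / 1e9 // KB → bytes → GB
}

func main() {
	// The worked example from the text: 30s × 10K traces/s × 50KB = 15GB.
	fmt.Printf("%.0f GB\n", tailSamplingMemoryGB(30, 10_000, 50))
	// prints: 15 GB
}
```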

Issue 3: Export rate limit

Symptom: Logs show "export failed: 429 Too Many Requests"

Fix: Increase send_batch_size so each export request carries more spans (fewer requests overall), and tune timeout so batches still flush promptly.

processors:
  batch:
    timeout: 5s           # was 10s
    send_batch_size: 2048 # was 1024

Summary: Production Checklist

  • Collector deployed in agent + gateway mode
  • Tail sampling configured (keep 100% errors)
  • Semantic conventions used for standard operations
  • Context propagated via HTTP/gRPC/Kafka headers
  • Baggage size < 8KB
  • PII redacted via processor
  • Collector scaled horizontally (≥3 replicas)
  • Export batch size tuned (1000-2000)
  • Overhead measured (< 2% CPU impact)
  • Migration rollout plan with rollback triggers