Distributed Systems Security
In microservice and distributed-system architectures, security becomes considerably more complex. You can no longer trust the "internal network": every service-to-service call needs authentication, authorization, and encryption.
Zero Trust mindset: "Never trust, always verify, even inside your own network."
Threat Model in Distributed Systems
Distinctive Threats
┌──────────────────────────────────────────────────┐
│              Traditional Monolith                │
│  ┌────────────────────────────────┐              │
│  │         Single Process         │              │
│  │   ┌──────┐ ┌──────┐ ┌──────┐   │              │
│  │   │ Auth │ │Logic │ │  DB  │   │              │
│  │   └──────┘ └──────┘ └──────┘   │              │
│  └────────────────────────────────┘              │
│  Attack surface = 1 entry point                  │
└──────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────┐
│                  Microservices                   │
│  ┌──────┐    ┌──────┐    ┌──────┐                │
│  │Svc A │───►│Svc B │───►│Svc C │                │
│  └──────┘    └──────┘    └──────┘                │
│     │           │           │                    │
│     ▼           ▼           ▼                    │
│  ┌──────┐    ┌──────┐    ┌──────┐                │
│  │ DB1  │    │ DB2  │    │ DB3  │                │
│  └──────┘    └──────┘    └──────┘                │
│  Attack surface = N services * M endpoints       │
└──────────────────────────────────────────────────┘
New threats:
- Service impersonation: Attacker pretends to be Service A
- Man-in-the-middle: Intercept traffic between services
- Data leakage: Sensitive data in service-to-service calls
- Lateral movement: Compromise one service, then pivot to others
- Insider threats: Malicious service in your cluster
Mutual TLS (mTLS)
Normal TLS: Client verifies server certificate.
mTLS: Both client AND server verify each other's certificates.
How mTLS Works
Service A (client)         Service B (server)
│                                           │
│  ① ClientHello                            │
├──────────────────────────────────────────►│
│                                           │
│  ② ServerHello + Server Certificate       │
│     + CertificateRequest                  │
│◄──────────────────────────────────────────┤
│                                           │
│  ③ Verify server cert                     │
│     (signed by trusted CA?)               │
│                                           │
│  ④ Client Certificate + CertificateVerify │
├──────────────────────────────────────────►│
│                                           │
│                  ⑤ Verify client cert     │
│                    (signed by trusted CA?)│
│                                           │
│  ⑥ Encrypted communication (TLS)          │
│◄─────────────────────────────────────────►│
│        Both parties authenticated         │
Implementing mTLS in Go
Step 1: Generate Certificates
# Create CA (Certificate Authority)
openssl genrsa -out ca.key 4096
openssl req -new -x509 -key ca.key -out ca.crt -days 365 -subj "/CN=My CA"
# Create server certificate
# Go's TLS client ignores the CN and matches the hostname against the SAN,
# so add one when signing (the <(...) syntax requires bash)
openssl genrsa -out server.key 2048
openssl req -new -key server.key -out server.csr -subj "/CN=service-b"
openssl x509 -req -in server.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out server.crt -days 365 \
  -extfile <(printf "subjectAltName=DNS:service-b")
# Create client certificate
openssl genrsa -out client.key 2048
openssl req -new -key client.key -out client.csr -subj "/CN=service-a"
openssl x509 -req -in client.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out client.crt -days 365
Step 2: Server with mTLS
import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"log"
	"net/http"
	"os"
)
func startMTLSServer() {
	// Load server certificate
	cert, err := tls.LoadX509KeyPair("server.crt", "server.key")
	if err != nil {
		log.Fatal(err)
	}
	// Load CA certificate (to verify clients)
	caCert, err := os.ReadFile("ca.crt")
	if err != nil {
		log.Fatal(err)
	}
	caCertPool := x509.NewCertPool()
	if !caCertPool.AppendCertsFromPEM(caCert) {
		log.Fatal("ca.crt contains no valid PEM certificate")
	}
	// TLS config: require a client certificate signed by our CA
	tlsConfig := &tls.Config{
		Certificates: []tls.Certificate{cert},
		ClientAuth:   tls.RequireAndVerifyClientCert,
		ClientCAs:    caCertPool,
		MinVersion:   tls.VersionTLS13,
	}
	server := &http.Server{
		Addr:      ":8443",
		TLSConfig: tlsConfig,
		Handler:   http.HandlerFunc(handleRequest),
	}
	// The certificate is already in TLSConfig, so the file arguments stay empty
	log.Fatal(server.ListenAndServeTLS("", ""))
}
func handleRequest(w http.ResponseWriter, r *http.Request) {
	// Extract client identity from certificate
	if r.TLS == nil || len(r.TLS.PeerCertificates) == 0 {
		http.Error(w, "No client certificate", http.StatusUnauthorized)
		return
	}
	clientCert := r.TLS.PeerCertificates[0]
	clientCN := clientCert.Subject.CommonName
	log.Printf("Request from: %s", clientCN)
	fmt.Fprintf(w, "Hello, %s!", clientCN)
}
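The handler above only logs the client's CN. `RequireAndVerifyClientCert` proves that the certificate chains to the CA, not that this particular caller should be let in. A minimal sketch of an allowlist check follows; the helper names (`peerIdentities`, `isAllowedPeer`, `certWithCN`) and the allowlist contents are assumptions made for illustration.

```go
package main

import (
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
)

// peerIdentities lists the names a certificate claims: DNS SANs first,
// then the subject Common Name as a fallback.
func peerIdentities(cert *x509.Certificate) []string {
	ids := append([]string{}, cert.DNSNames...)
	if cn := cert.Subject.CommonName; cn != "" {
		ids = append(ids, cn)
	}
	return ids
}

// isAllowedPeer reports whether any claimed identity is on the allowlist.
// It assumes the TLS handshake already verified the certificate chain.
func isAllowedPeer(cert *x509.Certificate, allowed map[string]bool) bool {
	for _, id := range peerIdentities(cert) {
		if allowed[id] {
			return true
		}
	}
	return false
}

// certWithCN builds a bare certificate for the demo; real code would pass
// r.TLS.PeerCertificates[0] instead.
func certWithCN(cn string) *x509.Certificate {
	return &x509.Certificate{Subject: pkix.Name{CommonName: cn}}
}

func main() {
	allowed := map[string]bool{"service-a": true}
	fmt.Println(isAllowedPeer(certWithCN("service-a"), allowed)) // true
	fmt.Println(isAllowedPeer(certWithCN("service-c"), allowed)) // false
}
```

In a handler, a failed check would typically map to a 403 response, since the caller is authenticated but not authorized.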
Step 3: Client with mTLS
// Uses the imports from Step 2 plus "io"
func callWithMTLS(url string) {
	// Load client certificate
	cert, err := tls.LoadX509KeyPair("client.crt", "client.key")
	if err != nil {
		log.Fatal(err)
	}
	// Load CA certificate (to verify server)
	caCert, err := os.ReadFile("ca.crt")
	if err != nil {
		log.Fatal(err)
	}
	caCertPool := x509.NewCertPool()
	if !caCertPool.AppendCertsFromPEM(caCert) {
		log.Fatal("ca.crt contains no valid PEM certificate")
	}
	tlsConfig := &tls.Config{
		Certificates: []tls.Certificate{cert},
		RootCAs:      caCertPool,
		MinVersion:   tls.VersionTLS13,
	}
	client := &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: tlsConfig,
		},
	}
	resp, err := client.Get(url)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(body))
}
Certificate Rotation
// Needs "crypto/tls", "log", "sync", and "time"
type DynamicCertificateLoader struct {
	certFile string
	keyFile  string
	mu       sync.RWMutex
	cert     *tls.Certificate
}
func NewDynamicCertificateLoader(certFile, keyFile string) (*DynamicCertificateLoader, error) {
	loader := &DynamicCertificateLoader{
		certFile: certFile,
		keyFile:  keyFile,
	}
	if err := loader.loadCertificate(); err != nil {
		return nil, err
	}
	// Auto-reload every hour
	go loader.autoReload(1 * time.Hour)
	return loader, nil
}
func (d *DynamicCertificateLoader) loadCertificate() error {
	cert, err := tls.LoadX509KeyPair(d.certFile, d.keyFile)
	if err != nil {
		return err
	}
	d.mu.Lock()
	d.cert = &cert
	d.mu.Unlock()
	return nil
}
// GetCertificate is called on every handshake, so new connections pick up the latest certificate
func (d *DynamicCertificateLoader) GetCertificate(*tls.ClientHelloInfo) (*tls.Certificate, error) {
	d.mu.RLock()
	defer d.mu.RUnlock()
	return d.cert, nil
}
func (d *DynamicCertificateLoader) autoReload(interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for range ticker.C {
		// On failure the previously loaded certificate stays in use
		if err := d.loadCertificate(); err != nil {
			log.Printf("Failed to reload certificate: %v", err)
		} else {
			log.Println("Certificate reloaded successfully")
		}
	}
}
// Usage
loader, err := NewDynamicCertificateLoader("server.crt", "server.key")
if err != nil {
	log.Fatal(err)
}
tlsConfig := &tls.Config{
	GetCertificate: loader.GetCertificate,
}
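The loader above picks up whatever certificate is on disk. Something else still has to decide when a new one should be issued. A minimal sketch of such a check follows; the two-thirds-of-lifetime threshold and the helper names (`needsRenewal`, `day`) are assumptions for illustration, not a rule taken from the text.

```go
package main

import (
	"fmt"
	"time"
)

// needsRenewal reports whether, at time now, a certificate valid from
// notBefore to notAfter has used up at least the given fraction of its lifetime.
func needsRenewal(notBefore, notAfter, now time.Time, fraction float64) bool {
	lifetime := notAfter.Sub(notBefore)
	if lifetime <= 0 {
		return true // malformed validity window: renew immediately
	}
	elapsed := now.Sub(notBefore)
	return float64(elapsed) >= fraction*float64(lifetime)
}

// day returns a fixed reference date plus n days (demo helper).
func day(n int) time.Time {
	return time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC).AddDate(0, 0, n)
}

func main() {
	// A 365-day certificate, checked on day 100 and on day 300.
	fmt.Println(needsRenewal(day(0), day(365), day(100), 2.0/3)) // false
	fmt.Println(needsRenewal(day(0), day(365), day(300), 2.0/3)) // true
}
```

With a parsed certificate, the inputs would come from its `NotBefore` and `NotAfter` fields.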
Service Mesh Security
A service mesh (Istio, Linkerd, Consul) automates mTLS, authorization, and observability.
Istio Example
Enable mTLS
# Enforce mTLS for all services in namespace
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: production
spec:
  mtls:
    mode: STRICT # Require mTLS for all traffic
Authorization Policy
# Only allow service-a to call service-b
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: service-b-authz
  namespace: production
spec:
  selector:
    matchLabels:
      app: service-b
  action: ALLOW
  rules:
    - from:
        - source:
            principals: ["cluster.local/ns/production/sa/service-a"]
      to:
        - operation:
            methods: ["GET", "POST"]
            paths: ["/api/*"]
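To make the policy's effect concrete, here is a rough Go model of how an ALLOW rule like this one is evaluated: a request passes only if its principal, method, and path all match some rule. This is a simplified sketch, not Istio's implementation; the type and function names are hypothetical, and only exact and trailing-`*` path patterns are handled.

```go
package main

import (
	"fmt"
	"strings"
)

// rule mirrors the from/to parts of the policy above.
type rule struct {
	principals []string
	methods    []string
	paths      []string // exact match, or prefix match when ending in "*"
}

func contains(list []string, v string) bool {
	for _, item := range list {
		if item == v {
			return true
		}
	}
	return false
}

func pathMatches(pattern, path string) bool {
	if strings.HasSuffix(pattern, "*") {
		return strings.HasPrefix(path, strings.TrimSuffix(pattern, "*"))
	}
	return pattern == path
}

// allowed returns true only if some rule matches all three attributes;
// anything that matches no rule is denied.
func allowed(rules []rule, principal, method, path string) bool {
	for _, r := range rules {
		if !contains(r.principals, principal) || !contains(r.methods, method) {
			continue
		}
		for _, p := range r.paths {
			if pathMatches(p, path) {
				return true
			}
		}
	}
	return false
}

// serviceBRules encodes the single rule from the policy above.
func serviceBRules() []rule {
	return []rule{{
		principals: []string{"cluster.local/ns/production/sa/service-a"},
		methods:    []string{"GET", "POST"},
		paths:      []string{"/api/*"},
	}}
}

func main() {
	a := "cluster.local/ns/production/sa/service-a"
	fmt.Println(allowed(serviceBRules(), a, "GET", "/api/users"))    // true
	fmt.Println(allowed(serviceBRules(), a, "DELETE", "/api/users")) // false
}
```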
Request Authentication (JWT)
apiVersion: security.istio.io/v1beta1
kind: RequestAuthentication
metadata:
  name: jwt-auth
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-gateway
  jwtRules:
    - issuer: "https://auth.example.com"
      jwksUri: "https://auth.example.com/.well-known/jwks.json"
      audiences:
        - "api.example.com"
Benefits of Service Mesh
- Automatic mTLS: No code changes, transparent encryption
- Identity-based access control: Policy as code
- Traffic encryption: All service-to-service traffic is encrypted
- Observability: Distributed tracing, metrics
- Certificate management: Auto rotation
Secrets Management
What NOT to do
# NEVER commit secrets to git
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  database_password: "super_secret_123"
  api_key: "sk_live_abc123xyz"
Kubernetes Secrets
apiVersion: v1
kind: Secret
metadata:
  name: app-secrets
type: Opaque
data:
  database_password: c3VwZXJfc2VjcmV0XzEyMw== # base64 encoded
  api_key: c2tfbGl2ZV9hYmMxMjN4eXo=
Note: Kubernetes Secrets are only base64-encoded, NOT encrypted by default! Anyone who can read the Secret object can recover the plaintext, so a Secret manifest is no safer to commit than a ConfigMap.
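A short Go sketch shows why base64 offers no protection: decoding needs no key. The value below is the `database_password` field from the manifest above; the function name `decodeSecretValue` is hypothetical.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"log"
)

// decodeSecretValue reverses the encoding Kubernetes applies to Secret data.
// No key is involved: base64 is an encoding, not encryption.
func decodeSecretValue(encoded string) (string, error) {
	raw, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return "", err
	}
	return string(raw), nil
}

func main() {
	plain, err := decodeSecretValue("c3VwZXJfc2VjcmV0XzEyMw==")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(plain) // super_secret_123
}
```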
Encrypted Secrets (Sealed Secrets)
# Install sealed-secrets controller
kubectl apply -f https://github.com/bitnami-labs/sealed-secrets/releases/download/v0.18.0/controller.yaml
# Create sealed secret
echo -n "super_secret_123" | kubectl create secret generic app-secrets \
  --dry-run=client --from-file=password=/dev/stdin -o yaml | \
  kubeseal -o yaml > sealed-secret.yaml
# Commit sealed-secret.yaml to git (safe!)
External Secrets (Vault, AWS Secrets Manager)
Vault:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  serviceAccountName: app
  containers:
    - name: app
      image: myapp:latest
      env:
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: vault-secret
              key: password
Vault Agent Injector (Sidecar pattern):
apiVersion: v1
kind: Pod
metadata:
  annotations:
    vault.hashicorp.com/agent-inject: "true"
    vault.hashicorp.com/role: "myapp"
    vault.hashicorp.com/agent-inject-secret-db: "secret/data/database"
spec:
  containers:
    - name: app
      image: myapp:latest
      # Vault agent injects secret to /vault/secrets/db
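On the application side, the injected secret is just a file read at startup. The sketch below assumes the file holds only the secret value; in practice the rendered content depends on the template configured for the agent. `readSecretFile` and `writeDemoSecret` are hypothetical helpers, and the demo uses a temporary file in place of `/vault/secrets/db`.

```go
package main

import (
	"fmt"
	"log"
	"os"
	"strings"
)

// readSecretFile loads a secret from a mounted file at runtime and trims
// the trailing newline that rendered files often carry.
func readSecretFile(path string) (string, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return "", fmt.Errorf("read secret %s: %w", path, err)
	}
	return strings.TrimSpace(string(data)), nil
}

// writeDemoSecret creates a temporary file for the demo only; inside the
// pod the path would be /vault/secrets/db.
func writeDemoSecret(content string) (string, error) {
	f, err := os.CreateTemp("", "db-secret-*")
	if err != nil {
		return "", err
	}
	defer f.Close()
	if _, err := f.WriteString(content); err != nil {
		return "", err
	}
	return f.Name(), nil
}

func main() {
	path, err := writeDemoSecret("super_secret_123\n")
	if err != nil {
		log.Fatal(err)
	}
	defer os.Remove(path)
	secret, err := readSecretFile(path)
	if err != nil {
		log.Fatal(err)
	}
	// Log the length, never the value itself.
	fmt.Printf("loaded secret (%d bytes)\n", len(secret))
}
```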
AWS Secrets Manager (External Secrets Operator):
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: aws-secretsmanager
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-east-1
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: app-secrets
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secretsmanager
  target:
    name: app-secrets
  data:
    - secretKey: password
      remoteRef:
        key: prod/db/password
Service-to-Service Authentication
JWT for Service Identity
// Assumes jwt is github.com/golang-jwt/jwt/v5; also needs "context", "fmt", "net/http", "time"
type ServiceClaims struct {
	ServiceName string `json:"service_name"`
	Namespace   string `json:"namespace"`
	jwt.RegisteredClaims
}
// contextKey is a private type so context values cannot collide with other packages
type contextKey string
const serviceNameKey contextKey = "service_name"
// Generate service token
func generateServiceToken(serviceName, namespace string, secret []byte) (string, error) {
	claims := ServiceClaims{
		ServiceName: serviceName,
		Namespace:   namespace,
		RegisteredClaims: jwt.RegisteredClaims{
			ExpiresAt: jwt.NewNumericDate(time.Now().Add(1 * time.Hour)),
			IssuedAt:  jwt.NewNumericDate(time.Now()),
			Issuer:    "service-mesh",
		},
	}
	token := jwt.NewWithClaims(jwt.SigningMethodHS256, claims)
	return token.SignedString(secret)
}
// Middleware to verify service token (jwtSecret is the shared HMAC key, loaded at startup)
func serviceAuthMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		authHeader := r.Header.Get("X-Service-Token")
		if authHeader == "" {
			http.Error(w, "Missing service token", http.StatusUnauthorized)
			return
		}
		claims := &ServiceClaims{}
		token, err := jwt.ParseWithClaims(authHeader, claims, func(token *jwt.Token) (interface{}, error) {
			// Reject anything not signed with HMAC (prevents algorithm-confusion attacks)
			if _, ok := token.Method.(*jwt.SigningMethodHMAC); !ok {
				return nil, fmt.Errorf("unexpected signing method: %v", token.Header["alg"])
			}
			return jwtSecret, nil
		})
		if err != nil || !token.Valid {
			http.Error(w, "Invalid service token", http.StatusUnauthorized)
			return
		}
		// Attach service identity to context
		ctx := context.WithValue(r.Context(), serviceNameKey, claims.ServiceName)
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}
SPIFFE/SPIRE (Universal Service Identity)
SPIFFE = Secure Production Identity Framework For Everyone
Concepts:
- SPIFFE ID: spiffe://trust-domain/path/to/service
- SVID: SPIFFE Verifiable Identity Document (X.509 cert or JWT)
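The SPIFFE ID is an ordinary URI, so its two parts can be pulled apart with the standard library. The sketch below is a simplified illustration with hypothetical names (`spiffeID`, `parseSPIFFEID`). It does not enforce every rule of the SPIFFE specification, and production code would normally rely on the `spiffeid` package from go-spiffe instead.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"net/url"
)

// spiffeID holds the two parts of spiffe://trust-domain/path/to/service.
type spiffeID struct {
	TrustDomain string
	Path        string
}

// parseSPIFFEID performs basic structural checks only.
func parseSPIFFEID(raw string) (spiffeID, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return spiffeID{}, err
	}
	if u.Scheme != "spiffe" {
		return spiffeID{}, errors.New("scheme must be spiffe")
	}
	if u.Host == "" {
		return spiffeID{}, errors.New("missing trust domain")
	}
	if u.User != nil || u.Port() != "" || u.RawQuery != "" || u.Fragment != "" {
		return spiffeID{}, errors.New("userinfo, port, query, and fragment are not allowed")
	}
	return spiffeID{TrustDomain: u.Host, Path: u.Path}, nil
}

func main() {
	id, err := parseSPIFFEID("spiffe://example.com/namespace/production/service-a")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(id.TrustDomain) // example.com
	fmt.Println(id.Path)        // /namespace/production/service-a
}
```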
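# SPIRE registration entry (illustrative; the exact resource kind and fields depend on
# how entries are registered, e.g. the spire-server CLI or the SPIRE Controller Manager)
apiVersion: spire.spiffe.io/v1alpha1
kind: Entry
metadata:
  name: service-a
spec:
  spiffeID: spiffe://example.com/namespace/production/service-a
  parentID: spiffe://example.com/k8s-node
  selector:
    k8s:
      namespace: production
      pod-label: app:service-a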
In code (using SPIFFE Workload API):
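import (
	"context"
	"net/http"

	"github.com/spiffe/go-spiffe/v2/spiffeid"
	"github.com/spiffe/go-spiffe/v2/spiffetls/tlsconfig"
	"github.com/spiffe/go-spiffe/v2/svid/x509svid"
	"github.com/spiffe/go-spiffe/v2/workloadapi"
)
// Get X.509 SVID
func getSVID() (*x509svid.SVID, error) {
	ctx := context.Background()
	return workloadapi.FetchX509SVID(ctx)
}
// Create mTLS client with SPIFFE
func createSPIFFEClient() (*http.Client, error) {
	ctx := context.Background()
	// The source keeps SVIDs and trust bundles up to date; Close it on shutdown
	source, err := workloadapi.NewX509Source(ctx)
	if err != nil {
		return nil, err
	}
	// Only accept a server presenting this SPIFFE ID (service-b's ID is assumed here)
	serverID, err := spiffeid.FromString("spiffe://example.com/namespace/production/service-b")
	if err != nil {
		return nil, err
	}
	// The source supplies both our own SVID and the bundle used to verify the peer
	tlsConfig := tlsconfig.MTLSClientConfig(source, source, tlsconfig.AuthorizeID(serverID))
	return &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: tlsConfig,
		},
	}, nil
}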
Network Segmentation
Kubernetes Network Policies
# Deny all traffic by default
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
---
# Allow service-a to call service-b
# (with default-deny egress in place, service-a also needs a matching egress rule
# to service-b, and pods typically need egress to DNS)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-service-a-to-service-b
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: service-b
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: service-a
      ports:
        - protocol: TCP
          port: 8080
---
# Allow egress to database
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-db-egress
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: service-a
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: postgres
      ports:
        - protocol: TCP
          port: 5432
Supply Chain Security
Container Image Signing
Cosign (Sigstore):
# Generate key pair
cosign generate-key-pair
# Sign image
cosign sign --key cosign.key myregistry.com/myapp:v1.0.0
# Verify image
cosign verify --key cosign.pub myregistry.com/myapp:v1.0.0
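Conceptually, signing and verifying an image comes down to a signature over a digest of the artifact, checked against a public key. The Go sketch below shows only that core idea, assuming an ECDSA P-256 key; it does not reproduce Cosign's actual payload or storage format, and all function names are hypothetical.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/sha256"
	"fmt"
	"log"
)

// newKey generates a fresh ECDSA P-256 key pair.
func newKey() (*ecdsa.PrivateKey, error) {
	return ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
}

// signDigest signs the SHA-256 digest of content with the private key.
func signDigest(priv *ecdsa.PrivateKey, content []byte) ([]byte, error) {
	digest := sha256.Sum256(content)
	return ecdsa.SignASN1(rand.Reader, priv, digest[:])
}

// verifyDigest recomputes the digest and checks the signature against it.
func verifyDigest(pub *ecdsa.PublicKey, content, sig []byte) bool {
	digest := sha256.Sum256(content)
	return ecdsa.VerifyASN1(pub, digest[:], sig)
}

func main() {
	key, err := newKey()
	if err != nil {
		log.Fatal(err)
	}
	manifest := []byte("image manifest bytes (placeholder)")
	sig, err := signDigest(key, manifest)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(verifyDigest(&key.PublicKey, manifest, sig))           // true
	fmt.Println(verifyDigest(&key.PublicKey, []byte("tampered"), sig)) // false
}
```

Any change to the signed bytes changes the digest, which is why a tampered image fails verification even though the signature itself is untouched.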
Enforce signature verification (Kubernetes admission controller):
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-image-signature
spec:
  validationFailureAction: Enforce
  rules:
    - name: verify-signature
      match:
        resources:
          kinds:
            - Pod
      verifyImages:
        - imageReferences:
            - "myregistry.com/*"
          attestors:
            - entries:
                - keys:
                    publicKeys: |
                      -----BEGIN PUBLIC KEY-----
                      ...
                      -----END PUBLIC KEY-----
Software Bill of Materials (SBOM)
# Generate SBOM with syft
syft myregistry.com/myapp:v1.0.0 -o spdx-json > sbom.json
# Scan SBOM for vulnerabilities
grype sbom:sbom.json
Observability & Audit Logging
Distributed Tracing (OpenTelemetry)
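import (
	"context"
	"log"
	"net/http"

	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
)
// tracedClient injects trace context into outgoing request headers.
// Assumes a propagator was registered at startup with otel.SetTextMapPropagator.
var tracedClient = &http.Client{Transport: otelhttp.NewTransport(http.DefaultTransport)}
func handleRequest(w http.ResponseWriter, r *http.Request) {
	ctx := r.Context()
	tracer := otel.Tracer("service-a")
	// Start span
	ctx, span := tracer.Start(ctx, "handleRequest")
	defer span.End()
	// Add attributes
	span.SetAttributes(
		attribute.String("user.id", getUserID(ctx)),
		attribute.String("http.method", r.Method),
	)
	// Call downstream service with context propagation
	if err := callServiceB(ctx); err != nil {
		log.Printf("call to service-b failed: %v", err)
	}
}
func callServiceB(ctx context.Context) error {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, "http://service-b/api", nil)
	if err != nil {
		return err
	}
	// A plain http.Client does not propagate trace context; the otelhttp transport does
	resp, err := tracedClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	return nil
}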
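What actually crosses the wire is a header. With the W3C Trace Context propagator, it is `traceparent`, formatted as `version-traceid-spanid-flags`. The sketch below builds and parses that header by hand to show what is being propagated. The names `traceContext`, `formatTraceparent`, and `parseTraceparent` are hypothetical; the code handles version `00` only and skips some checks the specification requires (for example, rejecting all-zero IDs).

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"strconv"
	"strings"
)

// traceContext holds the fields carried by a version-00 traceparent header.
type traceContext struct {
	TraceID string // 32 lowercase hex characters
	SpanID  string // 16 lowercase hex characters
	Sampled bool
}

func isLowerHex(s string, n int) bool {
	if len(s) != n {
		return false
	}
	for _, c := range s {
		if !(c >= '0' && c <= '9') && !(c >= 'a' && c <= 'f') {
			return false
		}
	}
	return true
}

// formatTraceparent renders the header value sent to the downstream service.
func formatTraceparent(tc traceContext) string {
	flags := "00"
	if tc.Sampled {
		flags = "01"
	}
	return "00-" + tc.TraceID + "-" + tc.SpanID + "-" + flags
}

// parseTraceparent extracts the trace context from an incoming header value.
func parseTraceparent(h string) (traceContext, error) {
	parts := strings.Split(h, "-")
	if len(parts) != 4 || parts[0] != "00" {
		return traceContext{}, errors.New("unsupported traceparent format")
	}
	if !isLowerHex(parts[1], 32) || !isLowerHex(parts[2], 16) || !isLowerHex(parts[3], 2) {
		return traceContext{}, errors.New("malformed traceparent field")
	}
	flags, err := strconv.ParseUint(parts[3], 16, 8)
	if err != nil {
		return traceContext{}, err
	}
	// The lowest bit of the flags byte is the "sampled" flag.
	return traceContext{TraceID: parts[1], SpanID: parts[2], Sampled: flags&1 == 1}, nil
}

func main() {
	tc, err := parseTraceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(tc.TraceID, tc.SpanID, tc.Sampled)
	fmt.Println(formatTraceparent(tc))
}
```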
Security Audit Logs
type AuditLog struct {
	Timestamp   time.Time `json:"timestamp"`
	ServiceName string    `json:"service_name"`
	UserID      string    `json:"user_id"`
	Action      string    `json:"action"`
	Resource    string    `json:"resource"`
	Result      string    `json:"result"`
	IP          string    `json:"ip"`
}
func auditLog(ctx context.Context, action, resource, result string) {
	// Named entry rather than log to avoid shadowing the standard log package
	entry := AuditLog{
		Timestamp:   time.Now(),
		ServiceName: getServiceName(ctx),
		UserID:      getUserID(ctx),
		Action:      action,
		Resource:    resource,
		Result:      result,
		IP:          getClientIP(ctx),
	}
	// Send to centralized logging (ELK, Splunk, etc.)
	logger.Info("audit", "log", entry)
}
// Usage
func deleteUserHandler(w http.ResponseWriter, r *http.Request) {
	ctx := r.Context()
	userID := chi.URLParam(r, "userID")
	err := deleteUser(userID)
	if err != nil {
		// Record failures too: failed attempts are often the more interesting security signal
		auditLog(ctx, "DELETE_USER", userID, "FAILED")
		http.Error(w, "Failed to delete user", http.StatusInternalServerError)
		return
	}
	auditLog(ctx, "DELETE_USER", userID, "SUCCESS")
	w.WriteHeader(http.StatusNoContent)
}
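For a log pipeline to index these events, each entry should serialize to one predictable JSON object. The sketch below reuses the field names and JSON tags from `AuditLog` above, with a fixed timestamp so the output is stable. The sample values and the names `auditEntry`, `encodeAudit`, and `sampleEntry` are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"time"
)

// auditEntry repeats the AuditLog fields so the example is self-contained.
type auditEntry struct {
	Timestamp   time.Time `json:"timestamp"`
	ServiceName string    `json:"service_name"`
	UserID      string    `json:"user_id"`
	Action      string    `json:"action"`
	Resource    string    `json:"resource"`
	Result      string    `json:"result"`
	IP          string    `json:"ip"`
}

// encodeAudit renders one entry as a single JSON line.
func encodeAudit(e auditEntry) (string, error) {
	b, err := json.Marshal(e)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// sampleEntry returns a fixed entry with invented values.
func sampleEntry() auditEntry {
	return auditEntry{
		Timestamp:   time.Date(2024, 1, 2, 3, 4, 5, 0, time.UTC),
		ServiceName: "user-service",
		UserID:      "u-42",
		Action:      "DELETE_USER",
		Resource:    "u-7",
		Result:      "SUCCESS",
		IP:          "10.0.0.5",
	}
}

func main() {
	line, err := encodeAudit(sampleEntry())
	if err != nil {
		log.Fatal(err)
	}
	// Output:
	// {"timestamp":"2024-01-02T03:04:05Z","service_name":"user-service","user_id":"u-42","action":"DELETE_USER","resource":"u-7","result":"SUCCESS","ip":"10.0.0.5"}
	fmt.Println(line)
}
```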
Security Checklist for Distributed Systems
Service-to-Service Communication:
- Enable mTLS for all inter-service traffic
- Use service mesh (Istio/Linkerd) or implement mTLS manually
- Auto-rotate certificates
- Implement timeout & retry with exponential backoff
Authentication & Authorization:
- Verify service identity (JWT, mTLS, SPIFFE)
- Implement fine-grained authorization (not just network-level)
- Use least privilege (service A only calls what it needs)
Secrets Management:
- Never commit secrets to git
- Use Vault, AWS Secrets Manager, or Sealed Secrets
- Rotate secrets regularly
- Inject secrets at runtime, not build time
Network Security:
- Implement network policies (deny by default)
- Segment network by sensitivity (PCI, PII, etc.)
- Monitor & alert on unusual traffic patterns
Supply Chain:
- Sign container images
- Verify signatures before deployment
- Generate & scan SBOMs
- Use minimal base images (distroless, alpine)
Observability:
- Distributed tracing (OpenTelemetry)
- Audit logging for security events
- Monitor authentication failures
- Alert on anomalies
Summary
| Aspect | Solution |
|---|---|
| Service auth | mTLS, JWT, SPIFFE |
| Traffic encryption | mTLS, Service Mesh |
| Secrets | Vault, AWS Secrets Manager |
| Network isolation | Network Policies |
| Image security | Signing (Cosign), SBOM |
| Observability | OpenTelemetry, Audit logs |
Zero Trust principles:
- Never trust, always verify
- Assume breach (minimize blast radius)
- Verify explicitly (mTLS for every call)
- Least privilege access
- Audit everything
Next Steps
- auth/advanced-auth.md: Zero Trust architecture details
- crypto-basics.md: How mTLS certificates work
- interview-and-big-picture.md: Distributed systems security questions