Domain: Gaming Backend
A multiplayer game can have 10 million concurrent players, matchmaking in under 3 seconds, a 64Hz game server tick rate, and latency below 100ms. Behind that smooth experience sits a complex backend: authoritative servers that prevent cheating, real-time state sync with client prediction, matchmaking that handles millions of requests per second, and global leaderboards updated in real time.
This section describes how to design a gaming backend at scale — from MOBAs like DOTA 2 and battle royales like PUBG to competitive shooters like Valorant.
1. Gaming System Architecture – Overview
1.1 Core Components
┌──────────────────────────────────────────────────────────────────────┐
│                            Client (Game)                             │
│                                                                      │
│   Game Engine (Unity/Unreal) ──► Input ──► Network Layer             │
│                                                │                     │
└────────────────────────────────────────────────┼─────────────────────┘
                                                 │
                                  WebSocket/UDP  │
                                                 ▼
┌──────────────────────────────────────────────────────────────────────┐
│                     API Gateway / Load Balancer                      │
│                  (Session routing, TLS termination)                  │
└──────────────────┬──────────────────┬───────────────┬────────────────┘
                   │                  │               │
        ┌──────────▼─────────┐ ┌──────▼─────────┐     │
        │   Matchmaking      │ │ Lobby Service  │     │
        │   Service          │ │                │     │
        │ (Find opponents)   │ │ (Party, chat)  │     │
        └──────────┬─────────┘ └────────────────┘     │
                   │                                  │
                   ▼                                  ▼
        ┌────────────────────┐         ┌───────────────────────┐
        │   Game Server      │         │   Auth / Profile      │
        │  (Authoritative    │         │   Service             │
        │   state, tick)     ├────────►│  (User data, inv)     │
        └──────────┬─────────┘         └───────────────────────┘
                   │
                   │ Game events
                   ▼
        ┌────────────────────┐
        │ Analytics Pipeline │
        │   (Kafka, Flink)   │
        └──────────┬─────────┘
                   │
                   ▼
        ┌────────────────────┐         ┌───────────────────────┐
        │    Leaderboard     │         │      Anti-cheat       │
        │ (Redis sorted set) │         │   (Heuristics, ML)    │
        └────────────────────┘         └───────────────────────┘
1.2 Critical Requirements
Latency:
- Matchmaking: <3 seconds
- Game state sync: 15-33ms per tick (30-64Hz tick rate)
- Leaderboard update: <100ms
- Analytics: near real-time (1-5s)
Scale:
- 10M+ concurrent players (peak)
- 100K+ game servers running
- 1M+ matches/hour
- Billions events/day
Availability:
- 99.95%+ uptime (downtime = lost revenue)
- Regional failover (players can't wait)
- Graceful degradation (match quality > no match)
2. Matchmaking System – ELO & Skill-based
2.1 ELO/MMR Rating System
MMR (Matchmaking Rating) = hidden skill number
- Win → MMR goes up
- Lose → MMR goes down
- The delta depends on the opponent's MMR
Basic ELO formula:
R' = R + K × (S - E)
R  = Current rating
R' = New rating
K  = K-factor (sensitivity, typically 16-32)
S  = Actual score (1 = win, 0 = loss, 0.5 = draw)
E  = Expected score = 1 / (1 + 10^((R_opponent - R) / 400))
Example:
Alice (MMR=1600) vs Bob (MMR=1800):
E_alice = 1 / (1 + 10^((1800-1600)/400)) = 1 / (1 + 10^0.5) ≈ 0.24
E_bob   = 1 - E_alice ≈ 0.76
If Alice wins (an upset!):
R'_alice = 1600 + 32 × (1 - 0.24) = 1624.3
R'_bob   = 1800 + 32 × (0 - 0.76) = 1775.7
If Bob wins (as expected):
R'_bob = 1800 + 32 × (1 - 0.76) = 1807.7 (only +7.7)
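The update above is easy to check with a short script (a minimal sketch with K=32, matching the example):

```python
def expected_score(r: float, r_opponent: float) -> float:
    """Probability of beating the opponent, per the ELO formula."""
    return 1 / (1 + 10 ** ((r_opponent - r) / 400))

def update_rating(r: float, r_opponent: float, score: float, k: float = 32) -> float:
    """New rating after a match; score is 1 (win), 0 (loss) or 0.5 (draw)."""
    return r + k * (score - expected_score(r, r_opponent))

# Alice (1600) upsets Bob (1800)
print(round(update_rating(1600, 1800, 1), 1))  # 1624.3
print(round(update_rating(1800, 1600, 0), 1))  # 1775.7
# Bob wins as expected: small gain
print(round(update_rating(1800, 1600, 1), 1))  # 1807.7
```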
2.2 Schema Design
-- Players
CREATE TABLE players (
id BIGSERIAL PRIMARY KEY,
user_id BIGINT UNIQUE NOT NULL,
username VARCHAR(100) NOT NULL,
mmr INT NOT NULL DEFAULT 1500, -- Hidden skill rating
display_rank VARCHAR(50), -- "Gold II", "Diamond IV"
wins INT DEFAULT 0,
losses INT DEFAULT 0,
win_rate NUMERIC(5,2) DEFAULT 50.0,
peak_mmr INT DEFAULT 1500,
region VARCHAR(10) NOT NULL, -- "us-west", "eu", "asia"
last_match_at TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX idx_players_mmr ON players(mmr DESC, region);
CREATE INDEX idx_players_region_mmr ON players(region, mmr DESC);
CREATE INDEX idx_players_last_match ON players(last_match_at DESC);
-- Match history
CREATE TABLE matches (
id BIGSERIAL PRIMARY KEY,
match_id VARCHAR(50) UNIQUE NOT NULL,
mode VARCHAR(50) NOT NULL, -- "ranked", "casual", "tournament"
region VARCHAR(10) NOT NULL,
avg_mmr INT NOT NULL, -- Average team MMR
duration_seconds INT,
winner_team INT, -- 1 or 2
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
ended_at TIMESTAMPTZ
);
CREATE INDEX idx_matches_created ON matches(created_at DESC);
CREATE INDEX idx_matches_mode_region ON matches(mode, region, created_at DESC);
-- Match participants
CREATE TABLE match_participants (
id BIGSERIAL PRIMARY KEY,
match_id VARCHAR(50) NOT NULL REFERENCES matches(match_id),
player_id BIGINT NOT NULL REFERENCES players(id),
team INT NOT NULL, -- 1 or 2
mmr_before INT NOT NULL,
mmr_after INT NOT NULL,
mmr_delta INT NOT NULL,
kills INT DEFAULT 0,
deaths INT DEFAULT 0,
assists INT DEFAULT 0,
damage_dealt BIGINT DEFAULT 0,
is_mvp BOOLEAN DEFAULT false,
UNIQUE(match_id, player_id)
);
CREATE INDEX idx_participants_player ON match_participants(player_id, id DESC);
CREATE INDEX idx_participants_match ON match_participants(match_id);
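The schema keeps the hidden `mmr` separate from the public `display_rank`. A sketch of how one might derive the display rank from MMR — the tier names and 75-MMR divisions here are invented for illustration, not taken from any specific game:

```python
# (floor MMR, tier name), highest tier first
TIERS = [
    (2400, "Master"), (2100, "Diamond"), (1800, "Platinum"),
    (1500, "Gold"), (1200, "Silver"), (0, "Bronze"),
]

def display_rank(mmr: int) -> str:
    """Map hidden MMR to a public rank like 'Gold II' (IV lowest, I highest)."""
    for floor, tier in TIERS:
        if mmr >= floor:
            if tier == "Master":
                return tier  # top tier has no divisions
            # Hypothetical: 75 MMR per division within a tier
            division = ["IV", "III", "II", "I"][min(3, (mmr - floor) // 75)]
            return f"{tier} {division}"
    return "Bronze IV"

print(display_rank(1650))  # Gold II
```

Computing this at read time (or on MMR update) keeps `display_rank` denormalized but cheap to refresh.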
2.3 Matchmaking Algorithm – Skill + Latency
package matchmaking
import (
"context"
"fmt"
"math"
"time"
)
// MatchmakingRequest from a player
type MatchmakingRequest struct {
PlayerID int64
MMR int
Region string
Latencies map[string]int // region -> latency in ms
QueuedAt time.Time
}
// MatchmakingPool β in-memory queue
type MatchmakingPool struct {
queue []*MatchmakingRequest
}
// Matchmaking parameters
const (
TeamSize = 5
MaxMMRDelta = 100 // Initial tolerance
MaxMMRDeltaMax = 500 // Max tolerance after wait
MMRDeltaIncrement = 50 // Per 10 seconds
MaxLatency = 80 // ms
MaxWaitTime = 120 // seconds
)
// FindMatch attempts to create a balanced match (5v5)
func (p *MatchmakingPool) FindMatch(ctx context.Context) (*Match, error) {
if len(p.queue) < TeamSize*2 {
return nil, fmt.Errorf("not enough players")
}
// Sort queue by MMR for faster matching
sortByMMR(p.queue)
for i := 0; i < len(p.queue)-TeamSize*2+1; i++ {
anchor := p.queue[i]
// Calculate max allowed MMR delta based on wait time
waitSeconds := time.Since(anchor.QueuedAt).Seconds()
maxDelta := calculateMaxMMRDelta(waitSeconds)
// Try to form two teams
team1, team2, err := p.formTeams(anchor, maxDelta)
if err != nil {
continue
}
// Validate latency constraints
if !validateLatency(team1, team2) {
continue
}
// Found a valid match!
return &Match{
MatchID: generateMatchID(),
Team1: team1,
Team2: team2,
AvgMMR: calculateAvgMMR(team1, team2),
Region: selectBestRegion(team1, team2),
}, nil
}
return nil, fmt.Errorf("no valid match found")
}
func calculateMaxMMRDelta(waitSeconds float64) int {
// Expand tolerance over time (graceful degradation)
delta := MaxMMRDelta + int(waitSeconds/10.0)*MMRDeltaIncrement
if delta > MaxMMRDeltaMax {
return MaxMMRDeltaMax
}
return delta
}
func (p *MatchmakingPool) formTeams(anchor *MatchmakingRequest, maxDelta int) (
[]*MatchmakingRequest, []*MatchmakingRequest, error) {
candidates := []*MatchmakingRequest{anchor}
// Find players within MMR range
for _, req := range p.queue {
if req.PlayerID == anchor.PlayerID {
continue
}
if math.Abs(float64(req.MMR-anchor.MMR)) <= float64(maxDelta) {
candidates = append(candidates, req)
if len(candidates) == TeamSize*2 {
break
}
}
}
if len(candidates) < TeamSize*2 {
return nil, nil, fmt.Errorf("not enough candidates")
}
// Greedy team balancing: alternating assignment by MMR
sortByMMR(candidates)
team1, team2 := []*MatchmakingRequest{}, []*MatchmakingRequest{}
for i, c := range candidates {
if i%2 == 0 {
team1 = append(team1, c)
} else {
team2 = append(team2, c)
}
}
// Check team balance (MMR difference should be small)
mmr1 := avgMMR(team1)
mmr2 := avgMMR(team2)
if math.Abs(float64(mmr1-mmr2)) > float64(maxDelta) {
return nil, nil, fmt.Errorf("teams unbalanced")
}
return team1, team2, nil
}
func validateLatency(team1, team2 []*MatchmakingRequest) bool {
// Find a region whose worst-case latency is acceptable for every player.
// Copy into a fresh slice: appending to team1 directly could clobber
// team2's elements if team1's backing array has spare capacity.
allPlayers := append(append([]*MatchmakingRequest{}, team1...), team2...)
for region := range allPlayers[0].Latencies {
maxLatency := 0
for _, p := range allPlayers {
if lat, ok := p.Latencies[region]; ok {
if lat > maxLatency {
maxLatency = lat
}
} else {
// Player cannot connect to this region
maxLatency = 9999
break
}
}
if maxLatency <= MaxLatency {
return true
}
}
return false
}
func selectBestRegion(team1, team2 []*MatchmakingRequest) string {
// Fresh slice for the same reason as validateLatency: append must not
// mutate team1's backing array
allPlayers := append(append([]*MatchmakingRequest{}, team1...), team2...)
regionScores := make(map[string]int)
// Score = sum of latencies (lower is better)
for region := range allPlayers[0].Latencies {
score := 0
for _, p := range allPlayers {
if lat, ok := p.Latencies[region]; ok {
score += lat
} else {
score = 999999
break
}
}
regionScores[region] = score
}
// Return region with lowest total latency
bestRegion := ""
bestScore := 999999
for region, score := range regionScores {
if score < bestScore {
bestScore = score
bestRegion = region
}
}
return bestRegion
}
2.4 Matchmaking Service Architecture
┌──────────────────────────────────────────────────────────────────┐
│                      Client (Game Client)                        │
└────────────────────────────┬─────────────────────────────────────┘
                             │ Queue Request
                             ▼
┌──────────────────────────────────────────────────────────────────┐
│                    API Gateway (gRPC/HTTP)                       │
└────────────────────────────┬─────────────────────────────────────┘
                             │
                             ▼
┌──────────────────────────────────────────────────────────────────┐
│                 Matchmaking Service (Stateful)                   │
│                                                                  │
│  ┌────────────────┐  ┌────────────────┐  ┌────────────────┐      │
│  │  MM Worker 1   │  │  MM Worker 2   │  │  MM Worker N   │      │
│  │  (Region: US)  │  │  (Region: EU)  │  │ (Region: Asia) │      │
│  │                │  │                │  │                │      │
│  │ In-memory pool │  │ In-memory pool │  │ In-memory pool │      │
│  │  + matching    │  │  + matching    │  │  + matching    │      │
│  └────────┬───────┘  └────────┬───────┘  └────────┬───────┘      │
│           │                   │                   │              │
└───────────┼───────────────────┼───────────────────┼──────────────┘
            │                   │                   │
            └───────────────────┴───────────────────┘
                                │
                    Match found │
                                ▼
                      ┌──────────────────┐
                      │ Game Server Pool │
                      │ (Spawn instance) │
                      └──────────────────┘
Sharding strategy: shard matchmaking workers by region.
- US players → US worker
- EU players → EU worker
- Cross-region matching only when there aren't enough players
Scalability: stateless API Gateway + stateful MM workers (no sticky sessions needed)
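The region-sharded queues with a cross-region fallback can be sketched as a small router. Everything here is illustrative: `REGION_NEIGHBORS`, `MIN_POOL`, and the "pad only when the local queue is short" rule are assumptions, not from the source:

```python
from collections import defaultdict

# Hypothetical fallback topology: which regions may be merged in when
# a local queue can't fill a 5v5 match
REGION_NEIGHBORS = {"us": ["eu"], "eu": ["us"], "asia": ["us"]}
MIN_POOL = 10  # need at least 10 players for a 5v5

class RegionRouter:
    def __init__(self):
        self.queues = defaultdict(list)  # region -> queued player ids

    def enqueue(self, player_id: int, region: str) -> None:
        self.queues[region].append(player_id)

    def pool_for(self, region: str) -> list:
        """Players eligible for a match anchored in `region`: the local
        queue, padded with neighbor regions only if it is too small."""
        pool = list(self.queues[region])
        if len(pool) < MIN_POOL:
            for neighbor in REGION_NEIGHBORS.get(region, []):
                pool += self.queues[neighbor]
        return pool
```

A full queue stays region-local (best latency); only starved queues pay the cross-region latency cost.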
3. Real-time Multiplayer – Authoritative Server
3.1 Client-Server Models
Model 1: Peer-to-peer (P2P)
┌─────────┐
│ Client A│───────────────────────┐
└────┬────┘                       │
     │                            │
     │ Direct connection          │
     ▼                            │
┌─────────┐                  ┌────▼────┐
│ Client B│◄────────────────►│ Client C│
└─────────┘                  └─────────┘
Pros: Low latency (no server hop), low cost
Cons: Vulnerable to cheating (client can lie), NAT traversal issues
Model 2: Client-Server (Authoritative)
┌─────────┐      ┌─────────┐      ┌─────────┐
│ Client A│─────►│         │─────►│ Client B│
└─────────┘      │  Game   │      └─────────┘
                 │ Server  │
┌─────────┐      │ (Truth) │      ┌─────────┐
│ Client C│─────►│         │─────►│ Client D│
└─────────┘      └─────────┘      └─────────┘
Pros: Server validates all actions (anti-cheat), consistent state
Cons: Higher latency (client β server β client), server cost
Modern competitive games: Authoritative server (anti-cheat > latency)
3.2 Game Loop – Tick-based Simulation
package gameserver
import (
"time"
)
const (
TickRate = 64 // 64 ticks/second (15.625 ms/tick)
TickDuration = time.Second / TickRate
)
type GameServer struct {
matchID string
players map[int64]*Player
gameState *GameState
tickNumber int64
running bool
}
func (gs *GameServer) Run() {
ticker := time.NewTicker(TickDuration)
defer ticker.Stop()
gs.running = true
for gs.running {
select {
case <-ticker.C:
gs.Tick()
}
}
}
func (gs *GameServer) Tick() {
gs.tickNumber++
// 1. Process player inputs (buffered from network)
for _, player := range gs.players {
gs.processPlayerInput(player)
}
// 2. Update game simulation
gs.updatePhysics()
gs.updateEntities()
gs.detectCollisions()
gs.updateGameLogic()
// 3. Broadcast state snapshot to clients
snapshot := gs.createSnapshot()
gs.broadcastSnapshot(snapshot)
// 4. Cleanup
gs.removeDeadEntities()
// Log tick performance
if gs.tickNumber%64 == 0 {
// Every second
gs.logTickMetrics()
}
}
func (gs *GameServer) processPlayerInput(player *Player) {
// Dequeue input commands from buffer
for player.inputBuffer.Len() > 0 {
input := player.inputBuffer.Dequeue()
// Validate input (anti-cheat)
if !gs.validateInput(player, input) {
// Suspicious input, log and ignore
gs.logSuspiciousInput(player, input)
continue
}
// Apply input to player entity
gs.applyInput(player, input)
}
}
func (gs *GameServer) updatePhysics() {
deltaTime := float64(TickDuration) / float64(time.Second)
for _, entity := range gs.gameState.Entities {
if !entity.IsStatic {
// Update position based on velocity
entity.Position.X += entity.Velocity.X * deltaTime
entity.Position.Y += entity.Velocity.Y * deltaTime
entity.Position.Z += entity.Velocity.Z * deltaTime
// Apply gravity
entity.Velocity.Z -= 9.8 * deltaTime
// Apply friction
entity.Velocity.X *= 0.99
entity.Velocity.Y *= 0.99
}
}
}
func (gs *GameServer) validateInput(player *Player, input *PlayerInput) bool {
// Speed hack detection
distance := input.Position.Distance(player.LastPosition)
timeDelta := input.Timestamp.Sub(player.LastInputTime)
if timeDelta > 0 {
speed := distance / timeDelta.Seconds()
if speed > player.MaxSpeed*1.2 { // 20% tolerance
return false
}
}
// Teleport detection: at 64Hz one tick of movement is ~MaxSpeed/64,
// so MaxSpeed*0.1 leaves headroom for jitter and batched inputs
if distance > player.MaxSpeed*0.1 {
return false
}
return true
}
type Snapshot struct {
TickNumber int64
Timestamp time.Time
Entities []*EntityState
}
func (gs *GameServer) createSnapshot() *Snapshot {
entities := make([]*EntityState, 0, len(gs.gameState.Entities))
for _, entity := range gs.gameState.Entities {
entities = append(entities, &EntityState{
ID: entity.ID,
Position: entity.Position,
Rotation: entity.Rotation,
Velocity: entity.Velocity,
Health: entity.Health,
})
}
return &Snapshot{
TickNumber: gs.tickNumber,
Timestamp: time.Now(),
Entities: entities,
}
}
func (gs *GameServer) broadcastSnapshot(snapshot *Snapshot) {
// Delta compression: only send entities that changed
for playerID, player := range gs.players {
delta := gs.computeDelta(player.LastSnapshot, snapshot)
gs.sendToPlayer(playerID, delta)
player.LastSnapshot = snapshot
}
}
3.3 Client Prediction & Server Reconciliation
// Client-side prediction (JavaScript/Unity C#)
class GameClient {
constructor() {
this.localPlayer = null;
this.lastServerSnapshot = null;
this.pendingInputs = []; // Not yet acknowledged by server
this.localTickNumber = 0;
}
// Client tick (may run faster than server)
tick() {
this.localTickNumber++;
// 1. Capture player input
const input = this.captureInput();
input.tickNumber = this.localTickNumber;
// 2. Apply input locally (prediction)
this.applyInput(this.localPlayer, input);
// 3. Send input to server
this.sendInputToServer(input);
// 4. Store input for reconciliation
this.pendingInputs.push(input);
// 5. Interpolate other entities
this.interpolateEntities();
// 6. Render
this.render();
}
// Receive server snapshot
onServerSnapshot(snapshot) {
// 1. Update last known server state
this.lastServerSnapshot = snapshot;
// 2. Find player entity in snapshot
const serverPlayer = snapshot.entities.find(e => e.id === this.localPlayer.id);
// 3. Server reconciliation
this.reconcileWithServer(serverPlayer, snapshot.tickNumber);
// 4. Update other entities
for (const entity of snapshot.entities) {
if (entity.id !== this.localPlayer.id) {
this.updateEntity(entity);
}
}
}
reconcileWithServer(serverPlayer, serverTickNumber) {
// Server position might differ from client prediction (latency, packet loss)
// 1. Remove acknowledged inputs
this.pendingInputs = this.pendingInputs.filter(
input => input.tickNumber > serverTickNumber
);
// 2. Rewind to server state
this.localPlayer.position = serverPlayer.position.clone();
this.localPlayer.velocity = serverPlayer.velocity.clone();
// 3. Replay pending inputs (not yet acknowledged)
for (const input of this.pendingInputs) {
this.applyInput(this.localPlayer, input);
}
// If position error is too large, snap to server (teleport detection)
const error = this.localPlayer.position.distanceTo(serverPlayer.position);
if (error > 5.0) { // Threshold
console.warn("Large prediction error, snapping to server");
this.localPlayer.position = serverPlayer.position.clone();
}
}
// Entity interpolation (smooth movement of other players)
interpolateEntities() {
const now = Date.now();
const renderTime = now - 100; // Render 100ms in the past
for (const entity of this.entities) {
if (entity.id === this.localPlayer.id) continue;
// Find two snapshots to interpolate between
const [before, after] = this.findSnapshotsForTime(entity, renderTime);
if (before && after) {
const t = (renderTime - before.timestamp) / (after.timestamp - before.timestamp);
entity.position = Vector3.lerp(before.position, after.position, t);
entity.rotation = Quaternion.slerp(before.rotation, after.rotation, t);
}
}
}
}
3.4 Lag Compensation – Rewind Time
// Server-side lag compensation for hit detection
type HistoricalState struct {
Timestamp time.Time
Entities map[int64]*EntityState
}
type GameServer struct {
// ... other fields
stateHistory []*HistoricalState // Ring buffer
historySize int
}
func (gs *GameServer) HandlePlayerShoot(player *Player, shootEvent *ShootEvent) {
// The client says: "I shot at tick X, aiming at position Y"
// 1. Estimate client's view time (account for latency)
clientViewTime := shootEvent.Timestamp.Add(-player.Ping / 2)
// 2. Rewind game state to that time
historicalState := gs.getHistoricalState(clientViewTime)
// 3. Perform hit detection in historical state
target := gs.raycast(shootEvent.Origin, shootEvent.Direction, historicalState)
if target != nil {
// Hit confirmed!
gs.applyDamage(target, shootEvent.Damage)
// Broadcast hit event to all clients
gs.broadcastHitEvent(&HitEvent{
Shooter: player.ID,
Target: target.ID,
Damage: shootEvent.Damage,
Timestamp: time.Now(),
})
}
}
func (gs *GameServer) getHistoricalState(t time.Time) *HistoricalState {
// Linear scan from newest to oldest; entries are time-ordered, so a
// binary search would also work for large history buffers
for i := len(gs.stateHistory) - 1; i >= 0; i-- {
state := gs.stateHistory[i]
if state.Timestamp.Before(t) || state.Timestamp.Equal(t) {
return state
}
}
return gs.stateHistory[0] // Fallback to oldest
}
func (gs *GameServer) Tick() {
// ... existing tick logic
// Save current state to history
snapshot := gs.createSnapshot()
gs.saveToHistory(snapshot)
}
func (gs *GameServer) saveToHistory(snapshot *Snapshot) {
state := &HistoricalState{
Timestamp: snapshot.Timestamp,
Entities: make(map[int64]*EntityState),
}
for _, entity := range snapshot.Entities {
state.Entities[entity.ID] = entity.Clone()
}
// Ring buffer
if len(gs.stateHistory) >= gs.historySize {
gs.stateHistory = gs.stateHistory[1:]
}
gs.stateHistory = append(gs.stateHistory, state)
}
4. Game State Synchronization – Delta Compression
4.1 Snapshot vs Delta
Full Snapshot: send the entire game state every tick
Pros: Simple, reliable
Cons: Bandwidth-heavy (10KB+ per snapshot × 64Hz = 640KB/s per player)
Delta Compression: send only the changes relative to the previous snapshot
Pros: Bandwidth-efficient (100-500 bytes per tick)
Cons: More complex, requires reliable delivery of baseline
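The bandwidth trade-off follows directly from the figures quoted above:

```python
TICK_RATE = 64
FULL_SNAPSHOT_BYTES = 10 * 1024  # ~10 KB per full snapshot, as estimated above
DELTA_BYTES = 300                # mid-range of the 100-500 byte delta estimate

full_bw = FULL_SNAPSHOT_BYTES * TICK_RATE   # bytes/s per player
delta_bw = DELTA_BYTES * TICK_RATE

print(full_bw // 1024, "KB/s")   # 640 KB/s
print(delta_bw // 1024, "KB/s")  # 18 KB/s
```

A ~30x reduction per player, which is what makes 64Hz feasible over consumer connections.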
4.2 Delta Encoding Implementation
# Python example (real games typically use C++/Go for this)
import struct
from dataclasses import dataclass
from typing import Dict, List, Optional
@dataclass
class EntityState:
id: int
position: tuple # (x, y, z)
rotation: float
health: int
velocity: tuple # (vx, vy, vz)
class DeltaEncoder:
def __init__(self):
self.baseline_snapshot: Optional[Dict[int, EntityState]] = None
def encode_snapshot(self, current_snapshot: Dict[int, EntityState]) -> bytes:
"""
Encode snapshot as delta from baseline.
Returns binary delta packet.
"""
if self.baseline_snapshot is None:
# First snapshot β send full
return self._encode_full_snapshot(current_snapshot)
delta_data = bytearray()
# Header: packet type (1 = delta)
delta_data.append(1)
# Changed/new entities
changed_entities = []
for entity_id, current_state in current_snapshot.items():
if entity_id not in self.baseline_snapshot:
# New entity
changed_entities.append((entity_id, current_state, 'new'))
else:
baseline_state = self.baseline_snapshot[entity_id]
if self._state_changed(baseline_state, current_state):
changed_entities.append((entity_id, current_state, 'changed'))
# Deleted entities
deleted_entities = [
entity_id for entity_id in self.baseline_snapshot
if entity_id not in current_snapshot
]
# Write counts
delta_data.extend(struct.pack('!HH', len(changed_entities), len(deleted_entities)))
# Write changed entities
for entity_id, state, change_type in changed_entities:
if change_type == 'new':
delta_data.extend(self._encode_entity_full(entity_id, state))
else:
delta_data.extend(self._encode_entity_delta(
entity_id, self.baseline_snapshot[entity_id], state
))
# Write deleted entities
for entity_id in deleted_entities:
delta_data.extend(struct.pack('!I', entity_id))
return bytes(delta_data)
def _state_changed(self, baseline: EntityState, current: EntityState) -> bool:
"""
Check if entity state has changed significantly.
Use epsilon for floating point comparison.
"""
epsilon = 0.01
# Position changed?
for i in range(3):
if abs(baseline.position[i] - current.position[i]) > epsilon:
return True
# Rotation changed?
if abs(baseline.rotation - current.rotation) > epsilon:
return True
# Health changed?
if baseline.health != current.health:
return True
# Velocity changed?
for i in range(3):
if abs(baseline.velocity[i] - current.velocity[i]) > epsilon:
return True
return False
def _encode_entity_delta(self, entity_id: int, baseline: EntityState, current: EntityState) -> bytes:
"""
Encode only changed fields with bit flags.
"""
data = bytearray()
# Entity ID
data.extend(struct.pack('!I', entity_id))
# Bit flags for changed fields
flags = 0
flag_data = bytearray()
# Position (bit 0)
if any(abs(baseline.position[i] - current.position[i]) > 0.01 for i in range(3)):
flags |= (1 << 0)
# Compress position: use int16 with 0.01 precision
for coord in current.position:
flag_data.extend(struct.pack('!h', int(coord * 100)))
# Rotation (bit 1)
if abs(baseline.rotation - current.rotation) > 0.01:
flags |= (1 << 1)
flag_data.extend(struct.pack('!h', int(current.rotation * 100)))
# Health (bit 2)
if baseline.health != current.health:
flags |= (1 << 2)
flag_data.append(current.health & 0xFF)
# Velocity (bit 3)
if any(abs(baseline.velocity[i] - current.velocity[i]) > 0.01 for i in range(3)):
flags |= (1 << 3)
for v in current.velocity:
flag_data.extend(struct.pack('!h', int(v * 100)))
# Write flags and data
data.append(flags & 0xFF)
data.extend(flag_data)
return bytes(data)
def _encode_full_snapshot(self, snapshot: Dict[int, EntityState]) -> bytes:
"""Full snapshot encoding for baseline."""
data = bytearray()
data.append(0) # packet type: full snapshot
data.extend(struct.pack('!H', len(snapshot)))
for entity_id, state in snapshot.items():
data.extend(self._encode_entity_full(entity_id, state))
return bytes(data)
def _encode_entity_full(self, entity_id: int, state: EntityState) -> bytes:
data = bytearray()
data.extend(struct.pack('!I', entity_id))
# Position (3 floats)
for coord in state.position:
data.extend(struct.pack('!f', coord))
# Rotation
data.extend(struct.pack('!f', state.rotation))
# Health
data.append(state.health & 0xFF)
# Velocity
for v in state.velocity:
data.extend(struct.pack('!f', v))
return bytes(data)
# Usage
encoder = DeltaEncoder()
# Tick 1
snapshot1 = {
1: EntityState(1, (10.0, 5.0, 0.0), 90.0, 100, (1.0, 0.0, 0.0)),
2: EntityState(2, (15.0, 8.0, 0.0), 180.0, 80, (0.0, 1.0, 0.0)),
}
packet1 = encoder.encode_snapshot(snapshot1) # Full snapshot
encoder.baseline_snapshot = snapshot1
# Tick 2: entity 1 moved slightly
snapshot2 = {
1: EntityState(1, (10.5, 5.0, 0.0), 90.0, 100, (1.0, 0.0, 0.0)), # Position changed
2: EntityState(2, (15.0, 8.0, 0.0), 180.0, 80, (0.0, 1.0, 0.0)), # No change
}
packet2 = encoder.encode_snapshot(snapshot2) # Delta: ~10 bytes vs 50+ bytes full
print(f"Full snapshot: {len(packet1)} bytes")
print(f"Delta snapshot: {len(packet2)} bytes")
print(f"Compression ratio: {len(packet1) / len(packet2):.1f}x")
4.3 Network Protocol – UDP vs TCP
TCP:
Pros: Reliable, ordered delivery
Cons: Head-of-line blocking (1 packet loss β all subsequent packets wait)
Higher latency
UDP:
Pros: Low latency, no head-of-line blocking
Cons: Unreliable (packet loss, out-of-order)
Best practice: UDP + custom reliability layer
ENet (used by many games):
- UDP with selective reliability
- Channel-based (reliable channel for chat, unreliable for positions)
- Built-in congestion control
QUIC (modern):
- UDP-based, multiple streams (no head-of-line blocking)
- TLS 1.3 built-in
- Faster connection establishment
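To make "UDP + custom reliability layer" concrete, here is a minimal sketch of per-packet ack tracking in the spirit of ENet's reliable channels. The class and its API are invented for illustration; a real implementation also needs sequence-number wrapping, ack bitfields, and congestion control:

```python
class ReliableChannel:
    """Toy selective-reliability layer to run on top of a UDP socket."""

    def __init__(self, resend_after: float = 0.2):
        self.next_seq = 0
        self.unacked = {}  # seq -> (payload, last_send_time)
        self.resend_after = resend_after  # seconds before retransmit

    def send(self, payload: bytes, now: float):
        """Assign a sequence number and start tracking until acked."""
        seq = self.next_seq
        self.next_seq += 1
        self.unacked[seq] = (payload, now)
        return seq, payload  # would be written to the UDP socket here

    def on_ack(self, seq: int) -> None:
        """Peer confirmed delivery; stop tracking this packet."""
        self.unacked.pop(seq, None)

    def due_for_resend(self, now: float):
        """Packets whose ack hasn't arrived within the resend window."""
        out = []
        for seq, (payload, sent_at) in self.unacked.items():
            if now - sent_at >= self.resend_after:
                out.append((seq, payload))
                self.unacked[seq] = (payload, now)  # reset the timer
        return out
```

Position updates would bypass this class entirely (unreliable channel); chat and game events would go through it.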
5. Leaderboard at Scale – Redis Sorted Sets
5.1 Global Leaderboard
# Redis Sorted Set: key → member + score
# Score = MMR, member = player_id
ZADD leaderboard:global 2400 "player:12345"
ZADD leaderboard:global 2150 "player:67890"
ZADD leaderboard:global 2800 "player:11111"
# Get top 10
ZREVRANGE leaderboard:global 0 9 WITHSCORES
# Returns:
# 1) "player:11111"
# 2) "2800"
# 3) "player:12345"
# 4) "2400"
# 5) "player:67890"
# 6) "2150"
# Get player rank (O(log N))
ZREVRANK leaderboard:global "player:12345"
# Returns: 1 (0-indexed, rank #2)
# Get players around a player (context)
ZREVRANGE leaderboard:global <rank-5> <rank+5> WITHSCORES
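The rank semantics of these commands can be emulated with a tiny in-memory stand-in, useful for tests and intuition. `MiniSortedSet` and its method names are made up here; this is not a Redis client, and real sorted sets use a skip list for O(log N) rank lookups rather than this linear scan:

```python
import bisect

class MiniSortedSet:
    """Toy stand-in for a Redis sorted set, just enough for rank queries."""

    def __init__(self):
        self.entries = []  # sorted list of (score, member), ascending

    def zadd(self, score: float, member: str) -> None:
        # Re-adding a member replaces its old score, as ZADD does
        self.entries = [(s, m) for s, m in self.entries if m != member]
        bisect.insort(self.entries, (score, member))

    def zrevrank(self, member: str):
        """0-indexed rank with the highest score first, like ZREVRANK."""
        for i, (_, m) in enumerate(reversed(self.entries)):
            if m == member:
                return i
        return None

    def zrevrange(self, start: int, stop: int):
        """Like ZREVRANGE ... WITHSCORES: (member, score), stop inclusive."""
        desc = list(reversed(self.entries))
        return [(m, s) for s, m in desc[start:stop + 1]]

lb = MiniSortedSet()
lb.zadd(2400, "player:12345")
lb.zadd(2150, "player:67890")
lb.zadd(2800, "player:11111")
print(lb.zrevrank("player:12345"))  # 1 (0-indexed, rank #2)
```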
5.2 Sharded Leaderboard
Problem: a single Redis instance can't handle 100M players
- ZREVRANGE can be slow at large cardinality
- Memory limits
Solution: shard by MMR buckets
Bucket 0-999:     leaderboard:bucket:0
Bucket 1000-1999: leaderboard:bucket:1
Bucket 2000-2999: leaderboard:bucket:2
...
To query the top 100:
1. Query the top 100 from the highest non-empty bucket
2. If that isn't enough, continue with the next bucket down
package leaderboard
import (
"context"
"fmt"
"github.com/redis/go-redis/v9"
)
const BucketSize = 1000
type LeaderboardService struct {
rdb *redis.Client
}
func (ls *LeaderboardService) getBucketKey(mmr int) string {
bucket := mmr / BucketSize
return fmt.Sprintf("leaderboard:bucket:%d", bucket)
}
func (ls *LeaderboardService) UpdatePlayerMMR(ctx context.Context, playerID int64, mmr int) error {
member := fmt.Sprintf("player:%d", playerID)
bucketKey := ls.getBucketKey(mmr)
// Remove from all buckets (player might have moved buckets)
// In practice, track old MMR to only remove from 1 bucket
pipe := ls.rdb.Pipeline()
for i := 0; i < 10; i++ { // Max 10 buckets (0-9999 MMR)
pipe.ZRem(ctx, fmt.Sprintf("leaderboard:bucket:%d", i), member)
}
// Add to new bucket
pipe.ZAdd(ctx, bucketKey, redis.Z{
Score: float64(mmr),
Member: member,
})
_, err := pipe.Exec(ctx)
return err
}
func (ls *LeaderboardService) GetTopPlayers(ctx context.Context, limit int) ([]LeaderboardEntry, error) {
entries := []LeaderboardEntry{}
// Query from highest bucket downwards
for bucket := 9; bucket >= 0; bucket-- {
bucketKey := fmt.Sprintf("leaderboard:bucket:%d", bucket)
// Get top players from this bucket
results, err := ls.rdb.ZRevRangeWithScores(ctx, bucketKey, 0, int64(limit-1)).Result()
if err != nil {
return nil, err
}
for _, z := range results {
entries = append(entries, LeaderboardEntry{
PlayerID: z.Member.(string),
MMR: int(z.Score),
})
}
if len(entries) >= limit {
break
}
}
// Guard: fewer total players than `limit` would make entries[:limit] panic
if len(entries) > limit {
entries = entries[:limit]
}
return entries, nil
}
func (ls *LeaderboardService) GetPlayerRank(ctx context.Context, playerID int64, mmr int) (int, error) {
bucketKey := ls.getBucketKey(mmr)
// Global rank = players in higher buckets
//             + players in this bucket with strictly higher MMR, plus 1
// Count players in higher buckets
countInHigherBuckets := 0
currentBucket := mmr / BucketSize
for bucket := 9; bucket > currentBucket; bucket-- {
count, err := ls.rdb.ZCard(ctx, fmt.Sprintf("leaderboard:bucket:%d", bucket)).Result()
if err != nil {
return 0, err
}
countInHigherBuckets += int(count)
}
// Also count players in same bucket with higher MMR
countInSameBucket, err := ls.rdb.ZCount(ctx, bucketKey,
fmt.Sprintf("%d", mmr+1), "+inf").Result()
if err != nil {
return 0, err
}
globalRank := countInHigherBuckets + int(countInSameBucket) + 1
return globalRank, nil
}
type LeaderboardEntry struct {
PlayerID string
MMR int
}
5.3 Regional Leaderboards
leaderboard:global:us-west
leaderboard:global:us-east
leaderboard:global:eu
leaderboard:global:asia
leaderboard:season:2024-Q1:us-west
leaderboard:season:2024-Q1:eu
leaderboard:daily:2024-04-17:asia
5.4 Real-time Updates – Pub/Sub
# Publisher (game server)
import json
import time

import redis

r = redis.Redis(host='localhost', port=6379)
def on_match_end(match):
for player in match.players:
# Update leaderboard
r.zadd('leaderboard:global', {f'player:{player.id}': player.new_mmr})
# Publish rank change event
old_rank = get_old_rank(player.id)
new_rank = get_new_rank(player.id)
r.publish('leaderboard:updates', json.dumps({
'player_id': player.id,
'old_mmr': player.old_mmr,
'new_mmr': player.new_mmr,
'old_rank': old_rank,
'new_rank': new_rank,
'timestamp': time.time()
}))
# Subscriber (WebSocket service β push to client)
def subscribe_leaderboard_updates():
pubsub = r.pubsub()
pubsub.subscribe('leaderboard:updates')
for message in pubsub.listen():
if message['type'] == 'message':
update = json.loads(message['data'])
# Push to connected clients via WebSocket
websocket_broadcast(update)
6. Anti-cheat Mechanisms
6.1 Server-side Validation
// Validate player actions on server
func (gs *GameServer) validatePlayerAction(player *Player, action *PlayerAction) error {
switch action.Type {
case ActionTypeMove:
return gs.validateMovement(player, action)
case ActionTypeShoot:
return gs.validateShoot(player, action)
case ActionTypeUseItem:
return gs.validateItemUse(player, action)
default:
return fmt.Errorf("unknown action type")
}
}
func (gs *GameServer) validateMovement(player *Player, action *PlayerAction) error {
// 1. Speed check
distance := action.NewPosition.Distance(player.Position)
timeElapsed := action.Timestamp.Sub(player.LastActionTime).Seconds()
if timeElapsed > 0 {
speed := distance / timeElapsed
maxSpeed := player.GetMaxSpeed() * 1.2 // 20% tolerance for network jitter
if speed > maxSpeed {
return fmt.Errorf("movement too fast: %.2f > %.2f", speed, maxSpeed)
}
}
// 2. Teleport check
if distance > 10.0 { // Arbitrary threshold
return fmt.Errorf("teleport detected: distance %.2f", distance)
}
// 3. Collision check (expensive, do sampling)
if gs.tickNumber%4 == 0 { // Check every 4 ticks
if gs.isPositionInsideWall(action.NewPosition) {
return fmt.Errorf("position inside wall")
}
}
// 4. Bounds check
if !gs.gameMap.IsInBounds(action.NewPosition) {
return fmt.Errorf("position out of bounds")
}
return nil
}
func (gs *GameServer) validateShoot(player *Player, action *PlayerAction) error {
// 1. Rate of fire check
if time.Since(player.LastShotTime) < player.Weapon.MinFireInterval {
return fmt.Errorf("shooting too fast")
}
// 2. Ammo check
if player.Ammo <= 0 {
return fmt.Errorf("no ammo")
}
// 3. Weapon cooldown
if player.Weapon.IsOnCooldown() {
return fmt.Errorf("weapon on cooldown")
}
// 4. Line of sight check (prevent shooting through walls)
if !gs.hasLineOfSight(player.Position, action.TargetPosition) {
return fmt.Errorf("no line of sight")
}
return nil
}
6.2 Heuristic-based Detection
# Detect aimbots and wallhacks using statistics
from dataclasses import dataclass
from typing import List

import numpy as np


@dataclass
class ShotMetrics:
    accuracy: float                # Hit rate
    headshot_rate: float
    reaction_time_ms: float
    target_switch_time_ms: float
    avg_crosshair_distance: float  # Distance from target when shooting


class CheatDetector:
    def __init__(self):
        self.thresholds = {
            'accuracy': 0.90,             # >90% accuracy is suspicious
            'headshot_rate': 0.70,        # >70% headshots
            'reaction_time_ms': 100,      # <100ms reaction time
            'target_switch_time_ms': 50,  # <50ms to switch targets
        }

    def analyze_player(self, player_id: int, shots: List[ShotMetrics]) -> dict:
        """
        Analyze a player's shots for cheating indicators.
        Returns suspicion scores.
        """
        if len(shots) < 20:  # Need enough data
            return {'suspicious': False, 'reason': 'insufficient_data'}

        # Calculate aggregated metrics
        accuracy = np.mean([s.accuracy for s in shots])
        headshot_rate = np.mean([s.headshot_rate for s in shots])
        avg_reaction_time = np.mean([s.reaction_time_ms for s in shots])
        avg_target_switch = np.mean([s.target_switch_time_ms for s in shots])

        suspicion_score = 0
        reasons = []

        # Check accuracy (aimbot indicator)
        if accuracy > self.thresholds['accuracy']:
            suspicion_score += 30
            reasons.append(f'accuracy_too_high:{accuracy:.2f}')

        # Check headshot rate (aimbot indicator)
        if headshot_rate > self.thresholds['headshot_rate']:
            suspicion_score += 25
            reasons.append(f'headshot_rate_too_high:{headshot_rate:.2f}')

        # Check reaction time (aimbot indicator)
        if avg_reaction_time < self.thresholds['reaction_time_ms']:
            suspicion_score += 20
            reasons.append(f'reaction_too_fast:{avg_reaction_time:.0f}ms')

        # Check target switching (aimbot indicator)
        if avg_target_switch < self.thresholds['target_switch_time_ms']:
            suspicion_score += 15
            reasons.append(f'target_switch_too_fast:{avg_target_switch:.0f}ms')

        # Check consistency (bots are too consistent)
        accuracy_std = np.std([s.accuracy for s in shots])
        if accuracy_std < 0.05:  # Too consistent
            suspicion_score += 10
            reasons.append(f'too_consistent:std={accuracy_std:.3f}')

        return {
            'player_id': player_id,
            'suspicious': suspicion_score >= 50,
            'suspicion_score': suspicion_score,
            'reasons': reasons,
            'metrics': {
                'accuracy': accuracy,
                'headshot_rate': headshot_rate,
                'avg_reaction_time': avg_reaction_time,
            },
        }


# Store metrics for analysis
class AntiCheatSystem:
    def __init__(self):
        self.player_shots = {}  # player_id -> List[ShotMetrics]
        self.detector = CheatDetector()

    def record_shot(self, player_id: int, shot: ShotMetrics):
        if player_id not in self.player_shots:
            self.player_shots[player_id] = []
        self.player_shots[player_id].append(shot)

        # Analyze after every 50 shots
        if len(self.player_shots[player_id]) % 50 == 0:
            result = self.detector.analyze_player(player_id, self.player_shots[player_id])
            if result['suspicious']:
                self.flag_player_for_review(player_id, result)

    def flag_player_for_review(self, player_id: int, analysis: dict):
        # Send to admin review queue
        print(f"⚠️ Player {player_id} flagged for cheating")
        print(f"   Suspicion score: {analysis['suspicion_score']}")
        print(f"   Reasons: {', '.join(analysis['reasons'])}")
        # Shadow ban (put in cheat pool)
        self.shadow_ban_player(player_id)

    def shadow_ban_player(self, player_id: int):
        """
        Shadow ban: the player can still play, but only gets matched with
        other cheaters. They don't know they're banned, so they don't
        immediately create a new account.
        """
        pass
6.3 Client Integrity – Anti-tamper
1. Code signing:
- Executable and DLLs must have a valid signature
- Detect modified/injected DLLs
2. Memory protection:
- Encrypt critical data in memory
- Detect memory editors (Cheat Engine)
- Use anti-debugging techniques
3. Heartbeat system:
- Client sends heartbeats with checksums
- Server verifies checksums match expected values
4. Kernel-level anti-cheat (controversial):
- Vanguard (Valorant), Easy Anti-Cheat
- Kernel driver blocks cheats at the OS level
- Trade-off: invasive, raises security concerns
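The heartbeat idea (point 3) can be sketched in a few lines. This is a minimal sketch: the payload shape, the `EXPECTED_CHECKSUMS` table, and the shared `secret` are all assumptions for illustration; a real system would also require the client to report the full expected file set and would rotate keys.

```python
import hashlib
import hmac

# Hypothetical expected file checksums for one client build (illustrative values).
EXPECTED_CHECKSUMS = {
    "game.exe": "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08",
    "engine.dll": "60303ae22b998861bce3b28f33eec1be758a213c86c93c076dbe9f558c11c752",
}

def verify_heartbeat(payload: dict, secret: bytes) -> bool:
    """Verify a client heartbeat: check the HMAC over the reported
    checksums (tamper detection on the message itself), then compare
    every reported checksum against the expected value."""
    body = repr(sorted(payload["checksums"].items())).encode()
    mac = hmac.new(secret, body, hashlib.sha256)
    if not hmac.compare_digest(mac.hexdigest(), payload["signature"]):
        return False  # heartbeat message was tampered with
    # Any file whose digest differs from the known-good build fails the check.
    return all(
        EXPECTED_CHECKSUMS.get(name) == digest
        for name, digest in payload["checksums"].items()
    )
```

Note the double check: the HMAC stops a proxy from rewriting the heartbeat in flight, while the checksum comparison catches modified binaries.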
7. Analytics & Telemetry Pipeline
7.1 Event Collection
// Protocol Buffers schema for game events
syntax = "proto3";

message GameEvent {
  string event_id = 1;
  string event_type = 2; // "match_start", "player_kill", "match_end"
  int64 timestamp = 3;
  string match_id = 4;
  string player_id = 5;
  map<string, string> properties = 6;
}

message PlayerKillEvent {
  string killer_id = 1;
  string victim_id = 2;
  string weapon = 3;
  bool is_headshot = 4;
  float distance = 5;
  Position killer_position = 6;
  Position victim_position = 7;
}

message MatchEndEvent {
  string match_id = 1;
  int32 duration_seconds = 2;
  repeated PlayerStats player_stats = 3;
  string winning_team = 4;
}
// Event producer (game server)
package analytics

import (
    "context"
    "encoding/json"
    "fmt"
    "log"
    "time"

    "github.com/segmentio/kafka-go"
)

type EventProducer struct {
    writer *kafka.Writer
}

func NewEventProducer(brokers []string) *EventProducer {
    return &EventProducer{
        writer: &kafka.Writer{
            Addr:     kafka.TCP(brokers...),
            Topic:    "game-events",
            Balancer: &kafka.Hash{}, // Hash by message key (player_id) for per-player ordering
        },
    }
}

func (p *EventProducer) PublishEvent(ctx context.Context, event *GameEvent) error {
    eventJSON, err := json.Marshal(event)
    if err != nil {
        return err
    }
    return p.writer.WriteMessages(ctx, kafka.Message{
        Key:   []byte(event.PlayerID), // Partition by player
        Value: eventJSON,
    })
}

// Usage in game server
func (gs *GameServer) onPlayerKill(killer, victim *Player, weapon string, isHeadshot bool) {
    // Update game state
    killer.Kills++
    victim.Deaths++

    // Publish event to Kafka (log failures; never block the tick loop on analytics)
    event := &GameEvent{
        EventID:   generateUUID(),
        EventType: "player_kill",
        Timestamp: time.Now().Unix(),
        MatchID:   gs.matchID,
        PlayerID:  killer.ID,
        Properties: map[string]string{
            "victim_id":   victim.ID,
            "weapon":      weapon,
            "is_headshot": fmt.Sprintf("%t", isHeadshot),
            "killer_pos": fmt.Sprintf("%.2f,%.2f,%.2f",
                killer.Position.X, killer.Position.Y, killer.Position.Z),
        },
    }
    if err := gs.eventProducer.PublishEvent(context.Background(), event); err != nil {
        log.Printf("failed to publish kill event: %v", err)
    }
}
7.2 Stream Processing – Apache Flink
// Flink job: Real-time KDA calculation
import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class KDAProcessingJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Kafka source (properties holds bootstrap.servers, group.id, ...)
        FlinkKafkaConsumer<GameEvent> consumer = new FlinkKafkaConsumer<>(
            "game-events",
            new GameEventDeserializationSchema(),
            properties
        );
        DataStream<GameEvent> events = env.addSource(consumer);

        // Filter kill events
        DataStream<GameEvent> killEvents = events
            .filter(event -> "player_kill".equals(event.getEventType()));

        // Calculate KDA per player over 5-minute tumbling windows
        DataStream<PlayerKDA> kda = killEvents
            .keyBy(event -> event.getPlayerId())
            .window(TumblingEventTimeWindows.of(Time.minutes(5)))
            .aggregate(new KDAAggregator());

        // Sink to Redis for real-time leaderboard
        kda.addSink(new RedisKDASink());

        env.execute("KDA Processing Job");
    }
}

public class KDAAggregator implements AggregateFunction<GameEvent, KDAAccumulator, PlayerKDA> {
    @Override
    public KDAAccumulator createAccumulator() {
        return new KDAAccumulator();
    }

    @Override
    public KDAAccumulator add(GameEvent event, KDAAccumulator acc) {
        // Assumes the producer emits one event per participant, with
        // killer_id/victim_id in properties, so each keyed stream sees
        // both its kills and its deaths.
        if (event.getPlayerId().equals(event.getProperties().get("killer_id"))) {
            acc.kills++;
        }
        if (event.getPlayerId().equals(event.getProperties().get("victim_id"))) {
            acc.deaths++;
        }
        // Assists logic...
        return acc;
    }

    @Override
    public PlayerKDA getResult(KDAAccumulator acc) {
        double kda = acc.deaths == 0 ? acc.kills + acc.assists :
            (double) (acc.kills + acc.assists) / acc.deaths;
        return new PlayerKDA(acc.playerId, acc.kills, acc.deaths, acc.assists, kda);
    }

    @Override
    public KDAAccumulator merge(KDAAccumulator a, KDAAccumulator b) {
        a.kills += b.kills;
        a.deaths += b.deaths;
        a.assists += b.assists;
        return a;
    }
}
7.3 Metrics Dashboard
# Grafana dashboard queries (PromQL)
# Game server health
up{job="game-server"}
# Active matches
sum(game_matches_active) by (region)
# Match duration p95
histogram_quantile(0.95, sum(rate(game_match_duration_seconds_bucket[5m])) by (le))
# Player latency p99, per region
histogram_quantile(0.99, sum(rate(game_player_latency_ms_bucket[5m])) by (le, region))
# Matchmaking queue time p99
histogram_quantile(0.99, sum(rate(matchmaking_queue_time_seconds_bucket[5m])) by (le))
# Cheat detection rate
rate(anticheat_players_flagged_total[1h])
# Server tick rate (should be stable at 64Hz; rate() already returns per-second)
rate(game_server_ticks_total[1m])
8. Game Server Scalability – Dynamic Allocation
8.1 Game Server Architecture
Game Server Modes:
1. Dedicated Server:
- One server process = one match
- Isolated, easy to scale
- Used by: Valorant, CS:GO, PUBG
2. Server Pool (Shard-based):
- One process handles multiple matches
- More efficient resource usage
- Used by: MOBA games (League, DOTA)
3. Hybrid (Lobby + Game):
- Lightweight lobby server (chat, party)
- Heavyweight game server (match simulation)
8.2 Dynamic Server Allocation
package serverpool

import (
    "context"
    "fmt"
    "sync"
    "time"
)

type GameServerPool struct {
    servers    map[string]*GameServerInstance
    mu         sync.RWMutex
    maxServers int
}

type GameServerInstance struct {
    ID             string
    Region         string
    Status         string // "idle", "starting", "running", "shutting_down"
    CurrentMatches int
    MaxMatches     int
    CreatedAt      time.Time
    LastHeartbeat  time.Time
}

func (pool *GameServerPool) AllocateServer(ctx context.Context, region string) (*GameServerInstance, error) {
    pool.mu.Lock()
    defer pool.mu.Unlock()

    // 1. Find a server in the region with spare capacity
    for _, server := range pool.servers {
        if server.Region == region &&
            (server.Status == "idle" || server.Status == "running") &&
            server.CurrentMatches < server.MaxMatches {
            server.CurrentMatches++
            server.Status = "running"
            return server, nil
        }
    }

    // 2. No available server, spawn a new one
    if len(pool.servers) < pool.maxServers {
        newServer, err := pool.spawnServer(ctx, region)
        if err != nil {
            return nil, err
        }
        newServer.CurrentMatches++ // account for the match being allocated
        pool.servers[newServer.ID] = newServer
        return newServer, nil
    }

    // 3. Pool exhausted, wait or error
    return nil, fmt.Errorf("no available servers in region %s", region)
}

func (pool *GameServerPool) spawnServer(ctx context.Context, region string) (*GameServerInstance, error) {
    // Integration with Kubernetes or AWS ECS
    serverID := fmt.Sprintf("game-server-%s-%d", region, time.Now().Unix())

    // Deploy container/pod
    if err := pool.deployServerContainer(ctx, serverID, region); err != nil {
        return nil, err
    }

    return &GameServerInstance{
        ID:             serverID,
        Region:         region,
        Status:         "starting",
        CurrentMatches: 0,
        MaxMatches:     10, // One process can handle 10 matches
        CreatedAt:      time.Now(),
    }, nil
}

func (pool *GameServerPool) deployServerContainer(ctx context.Context, serverID, region string) error {
    // Kubernetes API call
    //   kubectl apply -f game-server-pod.yaml
    // Or AWS ECS
    //   ecs.RunTask(...)
    return nil
}

func (pool *GameServerPool) ReleaseServer(serverID string) {
    pool.mu.Lock()
    defer pool.mu.Unlock()

    server, exists := pool.servers[serverID]
    if !exists {
        return
    }
    server.CurrentMatches--

    // Auto-scale down: shut down idle servers after 5 minutes
    if server.CurrentMatches == 0 {
        server.Status = "idle"
        go pool.scheduleServerShutdown(server, 5*time.Minute)
    }
}

func (pool *GameServerPool) scheduleServerShutdown(server *GameServerInstance, delay time.Duration) {
    time.Sleep(delay)
    pool.mu.Lock()
    defer pool.mu.Unlock()

    // Re-check if still idle
    if server.CurrentMatches == 0 {
        server.Status = "shutting_down"
        pool.shutdownServer(server)
        delete(pool.servers, server.ID)
    }
}
8.3 Kubernetes Deployment
# game-server-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: game-server
  labels:
    app: game-server
spec:
  replicas: 50  # Base capacity
  selector:
    matchLabels:
      app: game-server
  template:
    metadata:
      labels:
        app: game-server
    spec:
      containers:
        - name: game-server
          image: myregistry/game-server:v1.2.3
          resources:
            requests:
              cpu: "2"
              memory: "4Gi"
            limits:
              cpu: "4"
              memory: "8Gi"
          env:
            - name: REGION
              value: "us-west"
            - name: MAX_MATCHES_PER_INSTANCE
              value: "10"
            - name: TICK_RATE
              value: "64"
          ports:
            - containerPort: 7777
              protocol: UDP
              name: game-udp
            - containerPort: 8080
              protocol: TCP
              name: health-http
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
---
# Horizontal Pod Autoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: game-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: game-server
  minReplicas: 50
  maxReplicas: 500
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods
      pods:
        metric:
          name: game_server_active_matches
        target:
          type: AverageValue
          averageValue: "8"  # Scale when avg > 8 matches per pod
---
# Service (headless for per-pod addressing)
apiVersion: v1
kind: Service
metadata:
  name: game-server
spec:
  clusterIP: None  # Headless
  selector:
    app: game-server
  ports:
    - port: 7777
      protocol: UDP
      name: game-udp
8.4 AWS Fargate Deployment (ECS + Auto Scaling)
# terraform configuration
resource "aws_ecs_cluster" "game_servers" {
  name = "game-servers-cluster"
}

resource "aws_ecs_task_definition" "game_server" {
  family                   = "game-server"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = "2048" # 2 vCPU
  memory                   = "4096" # 4 GB

  container_definitions = jsonencode([{
    name  = "game-server"
    image = "myregistry/game-server:v1.2.3"
    portMappings = [{
      containerPort = 7777
      protocol      = "udp"
    }]
    environment = [
      { name = "REGION", value = "us-west-2" },
      { name = "TICK_RATE", value = "64" }
    ]
    logConfiguration = {
      logDriver = "awslogs"
      options = {
        "awslogs-group"         = "/ecs/game-server"
        "awslogs-region"        = "us-west-2"
        "awslogs-stream-prefix" = "ecs"
      }
    }
  }])
}

resource "aws_ecs_service" "game_server" {
  name            = "game-server-service"
  cluster         = aws_ecs_cluster.game_servers.id
  task_definition = aws_ecs_task_definition.game_server.arn
  desired_count   = 100
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = aws_subnet.private[*].id
    security_groups  = [aws_security_group.game_server.id]
    assign_public_ip = false
  }

  # Note: game traffic is UDP, so clients either connect directly to the
  # allocated task, or sit behind a Network Load Balancer; an Application
  # Load Balancer is HTTP-only and cannot carry this traffic.
}

# Auto Scaling Target
resource "aws_appautoscaling_target" "game_server" {
  max_capacity       = 500
  min_capacity       = 50
  resource_id        = "service/${aws_ecs_cluster.game_servers.name}/${aws_ecs_service.game_server.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}

# Auto Scaling Policy
resource "aws_appautoscaling_policy" "game_server_cpu" {
  name               = "game-server-cpu-autoscaling"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.game_server.resource_id
  scalable_dimension = aws_appautoscaling_target.game_server.scalable_dimension
  service_namespace  = aws_appautoscaling_target.game_server.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
    target_value = 70.0
  }
}
9. Data Model & Sharding Strategy
9.1 Player Data Sharding
Players table: 100M+ rows
→ Shard by hashing player_id (modulo shown below for simplicity; consistent hashing avoids mass resharding when the shard count changes)
Shard 1: player_id % 10 = 0
Shard 2: player_id % 10 = 1
...
Shard 10: player_id % 10 = 9
Pros:
- Even distribution
- Read/write traffic spread
Cons:
- Cross-shard queries expensive (e.g., friends list)
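The routing logic above, plus the standard mitigation for the cross-shard friends-list problem (group ids by shard and batch one query per shard), can be sketched as follows. The 10-shard count mirrors the example; function names are illustrative.

```python
def shard_for(player_id: int, num_shards: int = 10) -> int:
    """Route a player to a shard (modulo of the id, as in the example above)."""
    return player_id % num_shards

def fan_out(friend_ids, num_shards: int = 10):
    """Cross-shard query plan: group friend ids by shard so each shard
    receives one batched lookup instead of N point queries."""
    batches = {}
    for fid in friend_ids:
        batches.setdefault(shard_for(fid, num_shards), []).append(fid)
    return batches
```

Each batch then maps naturally onto the `WHERE id = ANY(...)` query shown later in section 10.1.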
9.2 Match Data – Time-based Sharding
-- Partition matches by month
CREATE TABLE matches_2024_01 PARTITION OF matches
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE matches_2024_02 PARTITION OF matches
FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
-- Auto-create partitions via cron or tool
-- Query recent matches (hits 1 partition)
SELECT * FROM matches
WHERE created_at >= '2024-04-01' AND created_at < '2024-05-01'
ORDER BY created_at DESC
LIMIT 100;
9.3 Hot/Cold Data Separation
Hot data (recent, frequently accessed):
- Last 7 days matches → PostgreSQL (SSD)
- Last 30 days player stats → Redis cache
Cold data (old, rarely accessed):
- Matches >30 days → S3 (Parquet)
- Historical stats → Data warehouse (BigQuery, Redshift)
Migration job:
- Daily cron: move matches older than 30 days to S3
- Keep match_id → S3 key mapping in DB for retrieval
# Cold storage migration
import boto3
import psycopg2
from datetime import datetime, timedelta

s3 = boto3.client('s3')
conn = psycopg2.connect("dbname=gamedb user=postgres")


def migrate_old_matches():
    cutoff_date = datetime.now() - timedelta(days=30)
    cur = conn.cursor()
    cur.execute("""
        SELECT * FROM matches
        WHERE created_at < %s AND archived = false
        LIMIT 10000
    """, (cutoff_date,))
    matches = cur.fetchall()

    for match in matches:
        match_id = match[1]  # column position of match_id in the row

        # Serialize to Parquet and upload to S3
        key = f"matches/{cutoff_date.year}/{cutoff_date.month}/{match_id}.parquet"
        s3.put_object(
            Bucket='game-cold-storage',
            Key=key,
            Body=serialize_to_parquet(match)
        )

        # Mark as archived
        cur.execute("""
            UPDATE matches SET archived = true, archive_key = %s
            WHERE match_id = %s
        """, (key, match_id))

    conn.commit()
    cur.close()


def retrieve_archived_match(match_id):
    cur = conn.cursor()
    cur.execute("SELECT archive_key FROM matches WHERE match_id = %s", (match_id,))
    row = cur.fetchone()
    cur.close()
    if row and row[0]:
        # Fetch from S3
        obj = s3.get_object(Bucket='game-cold-storage', Key=row[0])
        return deserialize_from_parquet(obj['Body'].read())
    return None
10. Performance Optimization
10.1 Database Query Optimization
-- Bad: N+1 query
SELECT * FROM players WHERE id = ?; -- Repeated for each player
-- Good: Batch query
SELECT * FROM players WHERE id = ANY($1::bigint[]);
-- Index covering query (avoid table lookup)
CREATE INDEX idx_players_leaderboard ON players(mmr DESC, id)
INCLUDE (username, region);
-- Query uses index-only scan
EXPLAIN ANALYZE
SELECT id, username, mmr, region FROM players
ORDER BY mmr DESC LIMIT 100;
-- Result: Index Only Scan (fast!)
10.2 Redis Optimization
# Bad: Multiple round-trips
def get_player_data(player_id):
    profile = redis.hgetall(f"player:{player_id}:profile")
    inventory = redis.smembers(f"player:{player_id}:inventory")
    stats = redis.hgetall(f"player:{player_id}:stats")
    return profile, inventory, stats


# Good: Pipeline (1 round-trip)
def get_player_data_optimized(player_id):
    pipe = redis.pipeline()
    pipe.hgetall(f"player:{player_id}:profile")
    pipe.smembers(f"player:{player_id}:inventory")
    pipe.hgetall(f"player:{player_id}:stats")
    results = pipe.execute()
    return results[0], results[1], results[2]


# Lua script (atomic + server-side execution)
lua_script = """
local player_id = KEYS[1]
local profile = redis.call('HGETALL', 'player:' .. player_id .. ':profile')
local inventory = redis.call('SMEMBERS', 'player:' .. player_id .. ':inventory')
local stats = redis.call('HGETALL', 'player:' .. player_id .. ':stats')
return {profile, inventory, stats}
"""
script_sha = redis.script_load(lua_script)
result = redis.evalsha(script_sha, 1, player_id)
10.3 Network Bandwidth – Compression
# Compress snapshots before sending
import zlib

import msgpack


def compress_snapshot(snapshot):
    # 1. Serialize with MessagePack (efficient binary format)
    packed = msgpack.packb(snapshot, use_bin_type=True)
    # 2. Compress with zlib
    compressed = zlib.compress(packed, level=6)
    return compressed


def decompress_snapshot(data):
    decompressed = zlib.decompress(data)
    snapshot = msgpack.unpackb(decompressed, raw=False)
    return snapshot


# Before: 5KB per snapshot × 64Hz = 320KB/s
# After: 800 bytes per snapshot × 64Hz = 51KB/s
# Compression ratio: 6.25x
10.4 CDN for Static Assets
Game assets (textures, models, sounds) served via CDN:
- CloudFront, Cloudflare
- Edge caching β low latency globally
- Version manifest for updates
manifest.json:
{
  "version": "1.2.3",
  "assets": {
    "textures/player.png": {
      "url": "https://cdn.game.com/assets/v1.2.3/textures/player.png",
      "hash": "sha256:abc123...",
      "size": 524288
    },
    "models/weapon.obj": {
      "url": "https://cdn.game.com/assets/v1.2.3/models/weapon.obj",
      "hash": "sha256:def456...",
      "size": 1048576
    }
  }
}
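On the client side, each downloaded asset is checked against its manifest entry before use. A minimal sketch, assuming the entry shape shown above (`"hash"` as `"<algo>:<hexdigest>"` plus `"size"`):

```python
import hashlib

def verify_asset(data: bytes, manifest_entry: dict) -> bool:
    """Check a downloaded asset against its manifest hash and size.
    The algorithm name is parsed from the 'sha256:...' prefix."""
    algo, _, expected = manifest_entry["hash"].partition(":")
    digest = hashlib.new(algo, data).hexdigest()
    return digest == expected and len(data) == manifest_entry["size"]
```

The size check is cheap and catches truncated downloads before hashing; the hash catches corruption and tampering.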
11. Interview Questions & Real-world Scenarios
11.1 System Design Questions
Q1: Design a matchmaking system for 10M concurrent players.
Requirements:
- <3s matchmaking time
- Balanced skill (MMR ± 100)
- Regional (minimize latency)
- Party support (5-man stack)
Solution:
1. Shard by region (US, EU, Asia independent queues)
2. In-memory pools per region (fast matching)
3. Expand MMR tolerance over time (graceful degradation)
4. Priority queue for long-waiting players
5. Validate latency before match creation
6. Fallback to cross-region if queue too long
Components:
- API Gateway → Route to regional MM workers
- MM Worker (stateful, in-memory queue)
- Redis (track party sessions)
- Game Server Pool (Kubernetes HPA)
Tradeoffs:
- Fast matching vs perfect balance
- Regional isolation vs queue depth
- Skill matching vs wait time
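Point 3 of the solution (expand MMR tolerance over time) is the knob behind the "skill matching vs wait time" trade-off. A minimal sketch; the base/step/cap values are illustrative, not tuned:

```python
def mmr_tolerance(wait_seconds: float, base: int = 100, step: int = 50,
                  interval: float = 10.0, cap: int = 500) -> int:
    """Start at ±base MMR and widen the acceptable gap by `step` every
    `interval` seconds of waiting, capped so matches never become
    completely lopsided."""
    widened = base + step * int(wait_seconds // interval)
    return min(widened, cap)
```

The matcher re-evaluates each queued player with their current tolerance every pass, so long-waiting players gradually become compatible with more of the pool.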
Q2: How to prevent cheating in a competitive FPS game?
Anti-cheat layers:
1. Server Authority:
- All game logic on server (clients send inputs only)
- Validate inputs (speed, fire rate, ammo)
- Lag compensation (rewind time for hit detection)
2. Heuristic Detection:
- Track accuracy, headshot rate, reaction time
- Flag outliers (>90% accuracy, <100ms reactions)
- Shadow ban (match cheaters with cheaters)
3. Client Integrity:
- Code signing (detect modified executables)
- Memory protection (encrypt critical data)
- Kernel-level anti-cheat (Vanguard, Easy Anti-Cheat)
4. Replay Analysis:
- Store full match replays
- ML model for behavior analysis
- Manual review by moderators
5. Community Reports:
- In-game report system
- Crowd-sourced detection (Overwatch system in CS:GO)
Tradeoffs:
- Security vs performance (server-side validation adds latency)
- Privacy vs anti-cheat (kernel drivers are invasive)
- False positives vs false negatives
Q3: Design a global leaderboard that updates in real-time.
Scale: 100M players, update every second
Naive approach: Single DB table, ORDER BY mmr DESC
→ Won't scale (query too slow)
Solution:
1. Redis Sorted Sets:
- ZADD leaderboard:global {mmr} {player_id}
- ZREVRANGE (top N) in O(log N + N)
- ZREVRANK (player rank) in O(log N)
2. Sharding by MMR buckets:
- Bucket 0-999, 1000-1999, etc.
- Query top 100: fetch from highest bucket first
- Player rank: count higher buckets + rank in bucket
3. Regional leaderboards:
- leaderboard:us, leaderboard:eu
- Less contention, faster queries
4. Time-based leaderboards:
- leaderboard:daily:2024-04-17
- leaderboard:season:2024-Q1
- Reset at intervals
5. Cache & pre-compute:
- Cache top 100 (refresh every 5s)
- Pre-compute ranks for top 10K only
- On-demand compute for others
6. Pub/Sub for real-time updates:
- Redis pub/sub: leaderboard:updates
- Push to WebSocket clients
Write throughput:
- 1M match ends/hour = 278 updates/s
- Redis can handle 100K+ writes/s
→ Easily scalable with sharding
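The MMR-bucket idea (point 2) can be sketched in pure Python, standing in for one sorted set per bucket: a player's global rank is the number of players in higher buckets plus their rank inside their own bucket.

```python
from bisect import insort

class BucketedLeaderboard:
    """Pure-Python stand-in for the bucketed design above: one sorted
    list per 1000-MMR bucket (a Redis sorted set per bucket in practice)."""

    def __init__(self, bucket_size: int = 1000):
        self.bucket_size = bucket_size
        self.buckets = {}  # bucket index -> sorted list of (mmr, player_id)

    def add(self, player_id: str, mmr: int):
        b = mmr // self.bucket_size
        insort(self.buckets.setdefault(b, []), (mmr, player_id))

    def rank(self, player_id: str, mmr: int) -> int:
        """1-based global rank (higher MMR is better)."""
        b = mmr // self.bucket_size
        higher = sum(len(v) for k, v in self.buckets.items() if k > b)
        in_bucket = sum(1 for m, _ in self.buckets.get(b, []) if m > mmr)
        return higher + in_bucket + 1
```

With Redis, `higher` would come from cached `ZCARD` counts per bucket and `in_bucket` from `ZREVRANK`, so no single structure has to hold all 100M members.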
11.2 Debugging Scenarios
Scenario 1: Players complaining about high latency spikes.
Investigation:
1. Check server metrics:
- CPU usage (>80% → slow ticks)
- Memory (swap → latency)
- Network (packet loss, bandwidth saturation)
2. Check game server tick rate:
- Should be stable 64Hz
- If dropping to 30Hz → performance issue
3. Check client logs:
- Network RTT (ping command)
- Packet loss rate
- Jitter (variance in latency)
4. Regional analysis:
- Specific region affected? (routing issue)
- ISP provider pattern? (peering problem)
5. Time-based pattern:
- Peak hours only? (need more capacity)
- Random spikes? (DDoS, network congestion)
Resolution:
- Scale up game servers (increase CPU/memory)
- Add more servers in affected region
- Optimize tick loop (profiling)
- CDN for asset delivery (reduce bandwidth)
- Contact ISP if routing issue
Scenario 2: Leaderboard showing stale data for some players.
Investigation:
1. Check Redis:
- Keys exist? (cache miss → query DB)
- TTL correct? (premature expiration)
- Replication lag? (read from stale replica)
2. Check update pipeline:
- Kafka consumer lag? (event backlog)
- Flink job running? (check job status)
- Match result events published? (producer issue)
3. Check clock skew:
- Server clocks synchronized? (NTP)
- Timestamp ordering issues
4. Race conditions:
- Concurrent updates to same player
- Last-write-wins conflict
Resolution:
- Restart Kafka consumers (clear lag)
- Fix Redis replication (force sync)
- Add idempotency keys (prevent duplicate updates)
- Use Redis transactions (MULTI/EXEC)
Scenario 3: Matchmaking queue time suddenly 10x slower.
Investigation:
1. Check queue depth:
- Redis: LLEN matchmaking:queue:{region}
- Sudden spike? (viral event, streamer)
2. Check MM worker health:
- All workers running?
- CPU/memory usage?
- Deadlock or infinite loop?
3. Check game server availability:
- Enough idle servers?
- K8s HPA triggered?
- Deployment in progress? (reduced capacity)
4. Check match creation rate:
- Throughput dropped?
- Database slow? (query timeout)
- External service down? (auth, profile)
Resolution:
- Scale MM workers horizontally
- Increase game server pool (trigger HPA manually)
- Loosen matching constraints (temp: expand MMR delta)
- Disable non-critical features (analytics)
- Communicate to players (expected wait time)
11.3 Trade-off Questions
Q: Client prediction vs server authority?
Client Prediction:
Pros: Instant feedback, smooth gameplay
Cons: Misprediction → rubber-banding, more complex
Server Authority:
Pros: Hack-proof, simpler logic
Cons: Input lag, feels sluggish
Best practice: Hybrid
- Client predicts own movement (instant)
- Server validates and reconciles
- Lag compensation for hit detection
Q: UDP vs TCP for game networking?
TCP:
Pros: Reliable, ordered
Cons: Head-of-line blocking (1 lost packet → all wait)
UDP:
Pros: Low latency, no head-of-line blocking
Cons: Packet loss, out-of-order delivery
Best practice: UDP + custom reliability
- Use UDP for game state (tolerate loss)
- Reliable channel for critical events (kills, match end)
- Redundancy (send critical data multiple times)
- Modern: QUIC (UDP + streams, no HOL blocking)
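The "UDP + custom reliability" best practice above can be sketched as a tiny reliable channel: critical events get sequence numbers and are resent every tick until acknowledged. This is a toy model (names hypothetical); real stacks add retransmit timeouts, backoff, retry caps, and ack bitfields.

```python
class ReliableChannel:
    """Minimal reliability layer over a lossy transport: critical events
    are resent each tick until the peer acknowledges them."""

    def __init__(self, send_fn):
        self.send_fn = send_fn  # callable(seq, payload): hands a datagram to UDP
        self.next_seq = 0
        self.unacked = {}       # seq -> payload still awaiting an ack

    def send_critical(self, payload):
        seq = self.next_seq
        self.next_seq += 1
        self.unacked[seq] = payload
        self.send_fn(seq, payload)

    def on_ack(self, seq):
        # Peer confirmed receipt; stop resending this event.
        self.unacked.pop(seq, None)

    def tick(self):
        # Resend everything still unacked (real code would back off).
        for seq, payload in self.unacked.items():
            self.send_fn(seq, payload)
```

Ordinary snapshot traffic bypasses this path entirely; only events that must not be lost (kills, match end) pay the retransmission cost.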
Q: Persistent game server vs serverless functions?
Persistent Server:
Pros: Stateful (game state in memory), low latency
Cons: Cost (idle servers), complex orchestration
Serverless (Lambda, Cloud Run):
Pros: No idle cost, auto-scale
Cons: Cold start latency, stateless (need external state store)
Best practice: Persistent for real-time matches
- Serverless for stateless APIs (profile, inventory)
- Persistent game servers (ECS, K8s)
- Hybrid: serverless matchmaking + persistent game servers
12. Production War Stories
12.1 The Great Matchmaking Meltdown
Context: Season 2 launch day, 5M players logging in at once.
Problem:
- Matchmaking queue timing out after 30s
- Database deadlocks (concurrent MMR updates)
- Redis OOM (queue not draining)
Root cause:
- MM workers did not scale fast enough
- DB connection pool exhausted
- Redis hit its memory limit (wrong eviction policy)
Resolution:
1. Emergency scale: 10x MM workers
2. Increase DB connection pool (100 → 500)
3. Redis maxmemory-policy: allkeys-lru (was noeviction)
4. Disable analytics pipeline (save DB connections)
5. Communicate: "High queue times, we're scaling"
Lessons learned:
- Load test at 2x expected peak, not 1x
- Circuit breakers (degrade features under load)
- Observability (know WHEN to scale)
- Runbook (predefined responses to incidents)
12.2 The Invisible Cheater
Context: Top player with a 99% win rate; the community complained.
Problem:
- Anti-cheat never flagged this player
- Manual review: gameplay looked legitimate
- Replays showed no evidence of cheating
Investigation:
- Deep dive into server logs
- Player kept winning rounds through "luck" (enemy DC)
- Pattern: opponent disconnects in 10% of their matches
Discovery:
- DDoS attack: player sniffed opponent IPs and DDoSed them until they dropped offline
- The game server could not detect this (network layer attack)
Resolution:
- Hide player IPs (use relay servers)
- Ban the player (ToS violation)
- Implement IP obfuscation (WebRTC TURN servers)
Lessons learned:
- Anti-cheat is not just game logic
- Network-level attacks need network-level defenses
- Privacy = security (hide sensitive info)
12.3 The Leaderboard Apocalypse
Context: End of season, computing final ranks for 100M players.
Problem:
- Batch job ran for 36 hours (SLA: 2 hours)
- Database locked (millions of UPDATE queries)
- Players could not see their rewards
Root cause:
- Sequential processing (1 player at a time)
- N+1 query pattern (fetch stats → update rank)
- No indexing on critical columns
Resolution:
1. Parallel batch processing (Spark job, 1000 workers)
2. Bulk updates (batch 10K players per query)
3. Add index: CREATE INDEX ON players(mmr, season_id)
4. Pre-aggregate stats (no join during rank compute)
Improvement:
- 36 hours → 1.5 hours
- No downtime (read replicas for queries)
Lessons learned:
- Batch operations scale differently than online queries
- Pre-compute where possible (trade storage for speed)
- Test with production data size, not toy data
13. Advanced Topics
13.1 Cross-region Matchmaking
Problem: A US player waits 5 minutes because there are not enough players → frustration
Solution: Cross-region matching (US ↔ EU)
Constraints:
- Max latency: 150ms (US-West ↔ EU-West ≈ 140ms)
- Only if queue time >2 minutes
- Prefer same-region (better experience)
Implementation:
1. Regional priority queue (US players search US first)
2. After 2 min: expand to adjacent regions (US → EU)
3. Select a server in a middle region (US-East, closer to EU)
4. Weight teams by latency (balance ping advantage)
Trade-off: Match quality (latency) vs queue time
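Steps 1 and 2 of the implementation reduce to a small policy function. A sketch with a hypothetical adjacency map; real deployments would derive adjacency from measured inter-region latency rather than hardcoding it:

```python
# Hypothetical adjacency map (illustrative region names).
ADJACENT_REGIONS = {
    "us-west": ["us-east"],
    "us-east": ["us-west", "eu-west"],
    "eu-west": ["us-east"],
}

def eligible_regions(home: str, wait_seconds: float, expand_after: float = 120.0):
    """Search only the home region first; after `expand_after` seconds
    in queue, also consider adjacent regions (home stays first, so
    same-region matches are still preferred)."""
    regions = [home]
    if wait_seconds >= expand_after:
        regions += ADJACENT_REGIONS.get(home, [])
    return regions
```

The matchmaker calls this per queued player on each pass, so expansion happens automatically as wait time grows.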
13.2 Skill-based Rating Systems
Elo: Simple, 1v1 games (chess)
Glicko-2: Accounts for rating volatility (periods of inactivity)
TrueSkill: Team-based, accounts for individual contribution
TrueSkill (Microsoft):
- μ (mu): Skill estimate
- σ (sigma): Uncertainty
- New player: μ=25, σ≈8.33 (very uncertain; TrueSkill's defaults)
- Veteran: μ=30, σ=1 (confident estimate)
After match: Bayesian update
- Win against lower-rated: small μ increase, σ decreases
- Upset win: large μ increase
Benefits:
- Better for teams (each player has individual skill)
- Handles uncertainty (new players matched conservatively)
13.3 Server Tick Optimization
Tick rate vs CPU usage:
- 16 Hz (60ms): Mobile games, low-action
- 32 Hz (31ms): Casual games
- 64 Hz (15.6ms): Competitive FPS (CS:GO, Valorant)
- 128 Hz (7.8ms): Pro-level CS:GO servers
Optimization techniques:
1. Spatial partitioning (only update nearby entities)
- Grid-based: divide map into cells
- Only check collisions within same/adjacent cells
2. Interest management (send updates only for visible entities)
- Client FOV culling
- Don't send entities behind walls
3. Delta compression (previous section)
4. Multi-threading (physics, AI on separate threads)
- Game loop on main thread (deterministic)
- Physics simulation on worker threads
- Merge results in main thread
5. SIMD (vectorized calculations)
- Process 4 entities at once (SSE, AVX)
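Technique 1 (grid-based spatial partitioning) can be sketched directly: entities are bucketed by cell, and a neighbor query only scans the 3×3 block of surrounding cells instead of every entity on the map.

```python
from collections import defaultdict

class Grid:
    """Grid-based spatial partition: entities live in cells, and nearby()
    only inspects the 3x3 block of cells around the query point."""

    def __init__(self, cell_size: float = 10.0):
        self.cell_size = cell_size
        self.cells = defaultdict(list)  # (cx, cy) -> [(entity_id, x, y)]

    def insert(self, eid, x, y):
        cx, cy = int(x // self.cell_size), int(y // self.cell_size)
        self.cells[(cx, cy)].append((eid, x, y))

    def nearby(self, x, y):
        """Entities in the query cell and its 8 neighbors: the only
        candidates that need a real distance/collision check."""
        cx, cy = int(x // self.cell_size), int(y // self.cell_size)
        out = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                out.extend(e for e, _, _ in self.cells.get((cx + dx, cy + dy), []))
        return out
```

With a cell size at least as large as the maximum interaction radius, a correct collision pass never needs to look outside this 3×3 neighborhood, turning an O(n²) all-pairs check into roughly O(n) per tick.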
Summary
Gaming backends are among the most complex systems in software engineering:
- Real-time: <100ms latency, 64Hz tick rate
- Scale: 10M concurrent players, billions of events/day
- Consistency: Authoritative server, anti-cheat
- Availability: 99.95% uptime (downtime = lost revenue)
Key takeaways:
- Authoritative server - Server is the source of truth; validate all client actions
- Client prediction + reconciliation - Balance responsive gameplay vs server authority
- Matchmaking - Skill + latency, graceful degradation over time
- Leaderboard - Redis sorted sets, shard by MMR buckets
- Anti-cheat - Multi-layered (server validation, heuristics, client integrity)
- Scalability - Dynamic game server allocation (K8s HPA), regional sharding
- Analytics - Kafka + Flink for real-time event processing
- Optimization - Delta compression, tick optimization, spatial partitioning
Real-world trade-offs:
- Fast matching vs balanced teams
- Client prediction vs server authority
- Security vs performance
- Skill matching vs queue time
Gaming backends demand a blend of distributed systems, real-time systems, networking, and game design. Every technical decision directly affects the player experience: a single frame drop or lag spike can make a player rage quit.
Remember: At the end of the day, make games fun, not just technically impressive. Performance is a feature, but gameplay is king.