Domain: Gaming Backend
A multiplayer game can have 10 million concurrent players, matchmaking in under 3 seconds, a 64Hz game server tick rate, and latency below 100ms. Behind that smooth experience sits a complex backend: authoritative servers that prevent cheating, real-time state sync with client prediction, matchmaking that handles millions of requests per second, and global leaderboards updated in real time.
This section describes how to design a gaming backend at scale — from MOBAs like DOTA 2 and battle royales like PUBG to competitive shooters like Valorant.
1. Gaming System Architecture – Overview
1.1 Core Components
┌──────────────────────────────────────────────────────────────────────┐
│                            Client (Game)                             │
│                                                                      │
│   Game Engine (Unity/Unreal) ──► Input ──► Network Layer             │
│                                                │                     │
└────────────────────────────────────────────────┼─────────────────────┘
                                                 │
                                  WebSocket/UDP  │
                                                 ▼
┌──────────────────────────────────────────────────────────────────────┐
│                     API Gateway / Load Balancer                      │
│                  (Session routing, TLS termination)                  │
└──────────────────┬──────────────────┬───────────────┬────────────────┘
                   │                  │               │
        ┌──────────▼─────────┐ ┌──────▼─────────┐     │
        │   Matchmaking      │ │ Lobby Service  │     │
        │   Service          │ │                │     │
        │ (Find opponents)   │ │ (Party, chat)  │     │
        └──────────┬─────────┘ └────────────────┘     │
                   │                                  │
                   ▼                                  ▼
        ┌────────────────────┐         ┌───────────────────────┐
        │   Game Server      │         │   Auth / Profile      │
        │  (Authoritative    │         │   Service             │
        │   state, tick)     ├────────►│  (User data, inv)     │
        └──────────┬─────────┘         └───────────────────────┘
                   │
                   │ Game events
                   ▼
        ┌────────────────────┐
        │ Analytics Pipeline │
        │   (Kafka, Flink)   │
        └──────────┬─────────┘
                   │
                   ▼
        ┌────────────────────┐         ┌───────────────────────┐
        │    Leaderboard     │         │      Anti-cheat       │
        │ (Redis sorted set) │         │   (Heuristics, ML)    │
        └────────────────────┘         └───────────────────────┘
1.2 Critical Requirements
Latency:
- Matchmaking: <3 seconds
- Game state sync: 15-33ms per tick (30-64Hz tick rate)
- Leaderboard update: <100ms
- Analytics: near real-time (1-5s)
Scale:
- 10M+ concurrent players (peak)
- 100K+ game servers running
- 1M+ matches/hour
- Billions events/day
Availability:
- 99.95%+ uptime (downtime = lost revenue)
- Regional failover (players can't wait)
- Graceful degradation (match quality > no match)
2. Matchmaking System – ELO & Skill-based
2.1 ELO/MMR Rating System
MMR (Matchmaking Rating) = hidden skill number
- Win → MMR goes up
- Lose → MMR goes down
- The delta depends on the opponent's MMR
Basic ELO formula:
R' = R + K × (S - E)
R  = Current rating
R' = New rating
K  = K-factor (sensitivity, typically 16-32)
S  = Actual score (1 = win, 0 = loss, 0.5 = draw)
E  = Expected score = 1 / (1 + 10^((R_opponent - R) / 400))
Example:
Alice (MMR=1600) vs Bob (MMR=1800):
E_alice = 1 / (1 + 10^((1800-1600)/400)) = 1 / (1 + 10^0.5) ≈ 0.24
E_bob   = 1 - E_alice ≈ 0.76
If Alice wins (an upset!):
R'_alice = 1600 + 32 × (1 - 0.24) = 1624.3
R'_bob   = 1800 + 32 × (0 - 0.76) = 1775.7
If Bob wins (as expected):
R'_bob = 1800 + 32 × (1 - 0.76) = 1807.7 (only +7.7)
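The update above is easy to check with a short script (a minimal sketch with K=32, matching the example):

```python
def expected_score(r: float, r_opponent: float) -> float:
    """Probability of beating the opponent, per the ELO formula."""
    return 1 / (1 + 10 ** ((r_opponent - r) / 400))

def update_rating(r: float, r_opponent: float, score: float, k: float = 32) -> float:
    """New rating after a match; score is 1 (win), 0 (loss) or 0.5 (draw)."""
    return r + k * (score - expected_score(r, r_opponent))

# Alice (1600) upsets Bob (1800)
print(round(update_rating(1600, 1800, 1), 1))  # 1624.3
print(round(update_rating(1800, 1600, 0), 1))  # 1775.7
# Bob wins as expected: small gain
print(round(update_rating(1800, 1600, 1), 1))  # 1807.7
```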
2.2 Schema Design
-- Players
CREATE TABLE players (
id BIGSERIAL PRIMARY KEY,
user_id BIGINT UNIQUE NOT NULL,
username VARCHAR(100) NOT NULL,
mmr INT NOT NULL DEFAULT 1500, -- Hidden skill rating
display_rank VARCHAR(50), -- "Gold II", "Diamond IV"
wins INT DEFAULT 0,
losses INT DEFAULT 0,
win_rate NUMERIC(5,2) DEFAULT 50.0,
peak_mmr INT DEFAULT 1500,
region VARCHAR(10) NOT NULL, -- "us-west", "eu", "asia"
last_match_at TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX idx_players_mmr ON players(mmr DESC, region);
CREATE INDEX idx_players_region_mmr ON players(region, mmr DESC);
CREATE INDEX idx_players_last_match ON players(last_match_at DESC);
-- Match history
CREATE TABLE matches (
id BIGSERIAL PRIMARY KEY,
match_id VARCHAR(50) UNIQUE NOT NULL,
mode VARCHAR(50) NOT NULL, -- "ranked", "casual", "tournament"
region VARCHAR(10) NOT NULL,
avg_mmr INT NOT NULL, -- Average team MMR
duration_seconds INT,
winner_team INT, -- 1 or 2
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
ended_at TIMESTAMPTZ
);
CREATE INDEX idx_matches_created ON matches(created_at DESC);
CREATE INDEX idx_matches_mode_region ON matches(mode, region, created_at DESC);
-- Match participants
CREATE TABLE match_participants (
id BIGSERIAL PRIMARY KEY,
match_id VARCHAR(50) NOT NULL REFERENCES matches(match_id),
player_id BIGINT NOT NULL REFERENCES players(id),
team INT NOT NULL, -- 1 or 2
mmr_before INT NOT NULL,
mmr_after INT NOT NULL,
mmr_delta INT NOT NULL,
kills INT DEFAULT 0,
deaths INT DEFAULT 0,
assists INT DEFAULT 0,
damage_dealt BIGINT DEFAULT 0,
is_mvp BOOLEAN DEFAULT false,
UNIQUE(match_id, player_id)
);
CREATE INDEX idx_participants_player ON match_participants(player_id, id DESC);
CREATE INDEX idx_participants_match ON match_participants(match_id);
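The schema keeps the hidden `mmr` separate from the public `display_rank`. A sketch of how one might derive the display rank from MMR — the tier names and 75-MMR divisions here are invented for illustration, not taken from any specific game:

```python
# (floor MMR, tier name), highest tier first
TIERS = [
    (2400, "Master"), (2100, "Diamond"), (1800, "Platinum"),
    (1500, "Gold"), (1200, "Silver"), (0, "Bronze"),
]

def display_rank(mmr: int) -> str:
    """Map hidden MMR to a public rank like 'Gold II' (IV lowest, I highest)."""
    for floor, tier in TIERS:
        if mmr >= floor:
            if tier == "Master":
                return tier  # top tier has no divisions
            # Hypothetical: 75 MMR per division within a tier
            division = ["IV", "III", "II", "I"][min(3, (mmr - floor) // 75)]
            return f"{tier} {division}"
    return "Bronze IV"

print(display_rank(1650))  # Gold II
```

Computing this at read time (or on MMR update) keeps `display_rank` denormalized but cheap to refresh.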
2.3 Matchmaking Algorithm – Skill + Latency
package matchmaking
import (
"context"
"fmt"
"math"
"time"
)
// MatchmakingRequest from a player
type MatchmakingRequest struct {
PlayerID int64
MMR int
Region string
Latencies map[string]int // region -> latency in ms
QueuedAt time.Time
}
// MatchmakingPool β in-memory queue
type MatchmakingPool struct {
queue []*MatchmakingRequest
}
// Matchmaking parameters
const (
TeamSize = 5
MaxMMRDelta = 100 // Initial tolerance
MaxMMRDeltaMax = 500 // Max tolerance after wait
MMRDeltaIncrement = 50 // Per 10 seconds
MaxLatency = 80 // ms
MaxWaitTime = 120 // seconds
)
// FindMatch attempts to create a balanced match (5v5)
func (p *MatchmakingPool) FindMatch(ctx context.Context) (*Match, error) {
if len(p.queue) < TeamSize*2 {
return nil, fmt.Errorf("not enough players")
}
// Sort queue by MMR for faster matching
sortByMMR(p.queue)
for i := 0; i < len(p.queue)-TeamSize*2+1; i++ {
anchor := p.queue[i]
// Calculate max allowed MMR delta based on wait time
waitSeconds := time.Since(anchor.QueuedAt).Seconds()
maxDelta := calculateMaxMMRDelta(waitSeconds)
// Try to form two teams
team1, team2, err := p.formTeams(anchor, maxDelta)
if err != nil {
continue
}
// Validate latency constraints
if !validateLatency(team1, team2) {
continue
}
// Found a valid match!
return &Match{
MatchID: generateMatchID(),
Team1: team1,
Team2: team2,
AvgMMR: calculateAvgMMR(team1, team2),
Region: selectBestRegion(team1, team2),
}, nil
}
return nil, fmt.Errorf("no valid match found")
}
func calculateMaxMMRDelta(waitSeconds float64) int {
// Expand tolerance over time (graceful degradation)
delta := MaxMMRDelta + int(waitSeconds/10.0)*MMRDeltaIncrement
if delta > MaxMMRDeltaMax {
return MaxMMRDeltaMax
}
return delta
}
func (p *MatchmakingPool) formTeams(anchor *MatchmakingRequest, maxDelta int) (
[]*MatchmakingRequest, []*MatchmakingRequest, error) {
candidates := []*MatchmakingRequest{anchor}
// Find players within MMR range
for _, req := range p.queue {
if req.PlayerID == anchor.PlayerID {
continue
}
if math.Abs(float64(req.MMR-anchor.MMR)) <= float64(maxDelta) {
candidates = append(candidates, req)
if len(candidates) == TeamSize*2 {
break
}
}
}
if len(candidates) < TeamSize*2 {
return nil, nil, fmt.Errorf("not enough candidates")
}
// Greedy team balancing: alternating assignment by MMR
sortByMMR(candidates)
team1, team2 := []*MatchmakingRequest{}, []*MatchmakingRequest{}
for i, c := range candidates {
if i%2 == 0 {
team1 = append(team1, c)
} else {
team2 = append(team2, c)
}
}
// Check team balance (MMR difference should be small)
mmr1 := avgMMR(team1)
mmr2 := avgMMR(team2)
if math.Abs(float64(mmr1-mmr2)) > float64(maxDelta) {
return nil, nil, fmt.Errorf("teams unbalanced")
}
return team1, team2, nil
}
func validateLatency(team1, team2 []*MatchmakingRequest) bool {
// Find a region whose worst-case latency is acceptable for every player.
// Copy into a fresh slice: appending to team1 directly could clobber
// team2's elements if team1's backing array has spare capacity.
allPlayers := append(append([]*MatchmakingRequest{}, team1...), team2...)
for region := range allPlayers[0].Latencies {
maxLatency := 0
for _, p := range allPlayers {
if lat, ok := p.Latencies[region]; ok {
if lat > maxLatency {
maxLatency = lat
}
} else {
// Player cannot connect to this region
maxLatency = 9999
break
}
}
if maxLatency <= MaxLatency {
return true
}
}
return false
}
func selectBestRegion(team1, team2 []*MatchmakingRequest) string {
// Fresh slice for the same reason as validateLatency: append must not
// mutate team1's backing array
allPlayers := append(append([]*MatchmakingRequest{}, team1...), team2...)
regionScores := make(map[string]int)
// Score = sum of latencies (lower is better)
for region := range allPlayers[0].Latencies {
score := 0
for _, p := range allPlayers {
if lat, ok := p.Latencies[region]; ok {
score += lat
} else {
score = 999999
break
}
}
regionScores[region] = score
}
// Return region with lowest total latency
bestRegion := ""
bestScore := 999999
for region, score := range regionScores {
if score < bestScore {
bestScore = score
bestRegion = region
}
}
return bestRegion
}
2.4 Matchmaking Service Architecture
┌──────────────────────────────────────────────────────────────────┐
│                      Client (Game Client)                        │
└────────────────────────────┬─────────────────────────────────────┘
                             │ Queue Request
                             ▼
┌──────────────────────────────────────────────────────────────────┐
│                    API Gateway (gRPC/HTTP)                       │
└────────────────────────────┬─────────────────────────────────────┘
                             │
                             ▼
┌──────────────────────────────────────────────────────────────────┐
│                 Matchmaking Service (Stateful)                   │
│                                                                  │
│  ┌────────────────┐  ┌────────────────┐  ┌────────────────┐      │
│  │  MM Worker 1   │  │  MM Worker 2   │  │  MM Worker N   │      │
│  │  (Region: US)  │  │  (Region: EU)  │  │ (Region: Asia) │      │
│  │                │  │                │  │                │      │
│  │ In-memory pool │  │ In-memory pool │  │ In-memory pool │      │
│  │  + matching    │  │  + matching    │  │  + matching    │      │
│  └────────┬───────┘  └────────┬───────┘  └────────┬───────┘      │
│           │                   │                   │              │
└───────────┼───────────────────┼───────────────────┼──────────────┘
            │                   │                   │
            └───────────────────┴───────────────────┘
                                │
                    Match found │
                                ▼
                      ┌──────────────────┐
                      │ Game Server Pool │
                      │ (Spawn instance) │
                      └──────────────────┘
Sharding strategy: shard matchmaking workers by region.
- US players → US worker
- EU players → EU worker
- Cross-region matching only when there aren't enough players
Scalability: stateless API Gateway + stateful MM workers (no sticky sessions needed)
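The region-sharded queues with a cross-region fallback can be sketched as a small router. Everything here is illustrative: `REGION_NEIGHBORS`, `MIN_POOL`, and the "pad only when the local queue is short" rule are assumptions, not from the source:

```python
from collections import defaultdict

# Hypothetical fallback topology: which regions may be merged in when
# a local queue can't fill a 5v5 match
REGION_NEIGHBORS = {"us": ["eu"], "eu": ["us"], "asia": ["us"]}
MIN_POOL = 10  # need at least 10 players for a 5v5

class RegionRouter:
    def __init__(self):
        self.queues = defaultdict(list)  # region -> queued player ids

    def enqueue(self, player_id: int, region: str) -> None:
        self.queues[region].append(player_id)

    def pool_for(self, region: str) -> list:
        """Players eligible for a match anchored in `region`: the local
        queue, padded with neighbor regions only if it is too small."""
        pool = list(self.queues[region])
        if len(pool) < MIN_POOL:
            for neighbor in REGION_NEIGHBORS.get(region, []):
                pool += self.queues[neighbor]
        return pool
```

A full queue stays region-local (best latency); only starved queues pay the cross-region latency cost.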
3. Real-time Multiplayer – Authoritative Server
3.1 Client-Server Models
Model 1: Peer-to-peer (P2P)
┌─────────┐
│ Client A│───────────────────────┐
└────┬────┘                       │
     │                            │
     │ Direct connection          │
     ▼                            │
┌─────────┐                  ┌────▼────┐
│ Client B│◄────────────────►│ Client C│
└─────────┘                  └─────────┘
Pros: Low latency (no server hop), low cost
Cons: Vulnerable to cheating (client can lie), NAT traversal issues
Model 2: Client-Server (Authoritative)
┌─────────┐      ┌─────────┐      ┌─────────┐
│ Client A│─────►│         │─────►│ Client B│
└─────────┘      │  Game   │      └─────────┘
                 │ Server  │
┌─────────┐      │ (Truth) │      ┌─────────┐
│ Client C│─────►│         │─────►│ Client D│
└─────────┘      └─────────┘      └─────────┘
Pros: Server validates all actions (anti-cheat), consistent state
Cons: Higher latency (client β server β client), server cost
Modern competitive games: Authoritative server (anti-cheat > latency)
3.2 Game Loop – Tick-based Simulation
package gameserver
import (
"time"
)
const (
TickRate = 64 // 64 ticks/second (15.625 ms/tick)
TickDuration = time.Second / TickRate
)
type GameServer struct {
matchID string
players map[int64]*Player
gameState *GameState
tickNumber int64
running bool
}
func (gs *GameServer) Run() {
ticker := time.NewTicker(TickDuration)
defer ticker.Stop()
gs.running = true
for gs.running {
select {
case <-ticker.C:
gs.Tick()
}
}
}
func (gs *GameServer) Tick() {
gs.tickNumber++
// 1. Process player inputs (buffered from network)
for _, player := range gs.players {
gs.processPlayerInput(player)
}
// 2. Update game simulation
gs.updatePhysics()
gs.updateEntities()
gs.detectCollisions()
gs.updateGameLogic()
// 3. Broadcast state snapshot to clients
snapshot := gs.createSnapshot()
gs.broadcastSnapshot(snapshot)
// 4. Cleanup
gs.removeDeadEntities()
// Log tick performance
if gs.tickNumber%64 == 0 {
// Every second
gs.logTickMetrics()
}
}
func (gs *GameServer) processPlayerInput(player *Player) {
// Dequeue input commands from buffer
for player.inputBuffer.Len() > 0 {
input := player.inputBuffer.Dequeue()
// Validate input (anti-cheat)
if !gs.validateInput(player, input) {
// Suspicious input, log and ignore
gs.logSuspiciousInput(player, input)
continue
}
// Apply input to player entity
gs.applyInput(player, input)
}
}
func (gs *GameServer) updatePhysics() {
deltaTime := float64(TickDuration) / float64(time.Second)
for _, entity := range gs.gameState.Entities {
if !entity.IsStatic {
// Update position based on velocity
entity.Position.X += entity.Velocity.X * deltaTime
entity.Position.Y += entity.Velocity.Y * deltaTime
entity.Position.Z += entity.Velocity.Z * deltaTime
// Apply gravity
entity.Velocity.Z -= 9.8 * deltaTime
// Apply friction
entity.Velocity.X *= 0.99
entity.Velocity.Y *= 0.99
}
}
}
func (gs *GameServer) validateInput(player *Player, input *PlayerInput) bool {
// Speed hack detection
distance := input.Position.Distance(player.LastPosition)
timeDelta := input.Timestamp.Sub(player.LastInputTime)
if timeDelta > 0 {
speed := distance / timeDelta.Seconds()
if speed > player.MaxSpeed*1.2 { // 20% tolerance
return false
}
}
// Teleport detection: at 64Hz one tick of movement is ~MaxSpeed/64,
// so MaxSpeed*0.1 leaves headroom for jitter and batched inputs
if distance > player.MaxSpeed*0.1 {
return false
}
return true
}
type Snapshot struct {
TickNumber int64
Timestamp time.Time
Entities []*EntityState
}
func (gs *GameServer) createSnapshot() *Snapshot {
entities := make([]*EntityState, 0, len(gs.gameState.Entities))
for _, entity := range gs.gameState.Entities {
entities = append(entities, &EntityState{
ID: entity.ID,
Position: entity.Position,
Rotation: entity.Rotation,
Velocity: entity.Velocity,
Health: entity.Health,
})
}
return &Snapshot{
TickNumber: gs.tickNumber,
Timestamp: time.Now(),
Entities: entities,
}
}
func (gs *GameServer) broadcastSnapshot(snapshot *Snapshot) {
// Delta compression: only send entities that changed
for playerID, player := range gs.players {
delta := gs.computeDelta(player.LastSnapshot, snapshot)
gs.sendToPlayer(playerID, delta)
player.LastSnapshot = snapshot
}
}
3.3 Client Prediction & Server Reconciliation
// Client-side prediction (JavaScript/Unity C#)
class GameClient {
constructor() {
this.localPlayer = null;
this.lastServerSnapshot = null;
this.pendingInputs = []; // Not yet acknowledged by server
this.localTickNumber = 0;
}
// Client tick (may run faster than server)
tick() {
this.localTickNumber++;
// 1. Capture player input
const input = this.captureInput();
input.tickNumber = this.localTickNumber;
// 2. Apply input locally (prediction)
this.applyInput(this.localPlayer, input);
// 3. Send input to server
this.sendInputToServer(input);
// 4. Store input for reconciliation
this.pendingInputs.push(input);
// 5. Interpolate other entities
this.interpolateEntities();
// 6. Render
this.render();
}
// Receive server snapshot
onServerSnapshot(snapshot) {
// 1. Update last known server state
this.lastServerSnapshot = snapshot;
// 2. Find player entity in snapshot
const serverPlayer = snapshot.entities.find(e => e.id === this.localPlayer.id);
// 3. Server reconciliation
this.reconcileWithServer(serverPlayer, snapshot.tickNumber);
// 4. Update other entities
for (const entity of snapshot.entities) {
if (entity.id !== this.localPlayer.id) {
this.updateEntity(entity);
}
}
}
reconcileWithServer(serverPlayer, serverTickNumber) {
// Server position might differ from client prediction (latency, packet loss)
// 1. Remove acknowledged inputs
this.pendingInputs = this.pendingInputs.filter(
input => input.tickNumber > serverTickNumber
);
// 2. Rewind to server state
this.localPlayer.position = serverPlayer.position.clone();
this.localPlayer.velocity = serverPlayer.velocity.clone();
// 3. Replay pending inputs (not yet acknowledged)
for (const input of this.pendingInputs) {
this.applyInput(this.localPlayer, input);
}
// If position error is too large, snap to server (teleport detection)
const error = this.localPlayer.position.distanceTo(serverPlayer.position);
if (error > 5.0) { // Threshold
console.warn("Large prediction error, snapping to server");
this.localPlayer.position = serverPlayer.position.clone();
}
}
// Entity interpolation (smooth movement of other players)
interpolateEntities() {
const now = Date.now();
const renderTime = now - 100; // Render 100ms in the past
for (const entity of this.entities) {
if (entity.id === this.localPlayer.id) continue;
// Find two snapshots to interpolate between
const [before, after] = this.findSnapshotsForTime(entity, renderTime);
if (before && after) {
const t = (renderTime - before.timestamp) / (after.timestamp - before.timestamp);
entity.position = Vector3.lerp(before.position, after.position, t);
entity.rotation = Quaternion.slerp(before.rotation, after.rotation, t);
}
}
}
}
3.4 Lag Compensation – Rewind Time
// Server-side lag compensation for hit detection
type HistoricalState struct {
Timestamp time.Time
Entities map[int64]*EntityState
}
type GameServer struct {
// ... other fields
stateHistory []*HistoricalState // Ring buffer
historySize int
}
func (gs *GameServer) HandlePlayerShoot(player *Player, shootEvent *ShootEvent) {
// The client says: "I shot at tick X, aiming at position Y"
// 1. Estimate client's view time (account for latency)
clientViewTime := shootEvent.Timestamp.Add(-player.Ping / 2)
// 2. Rewind game state to that time
historicalState := gs.getHistoricalState(clientViewTime)
// 3. Perform hit detection in historical state
target := gs.raycast(shootEvent.Origin, shootEvent.Direction, historicalState)
if target != nil {
// Hit confirmed!
gs.applyDamage(target, shootEvent.Damage)
// Broadcast hit event to all clients
gs.broadcastHitEvent(&HitEvent{
Shooter: player.ID,
Target: target.ID,
Damage: shootEvent.Damage,
Timestamp: time.Now(),
})
}
}
func (gs *GameServer) getHistoricalState(t time.Time) *HistoricalState {
// Linear scan from newest to oldest; entries are time-ordered, so a
// binary search would also work for large history buffers
for i := len(gs.stateHistory) - 1; i >= 0; i-- {
state := gs.stateHistory[i]
if state.Timestamp.Before(t) || state.Timestamp.Equal(t) {
return state
}
}
return gs.stateHistory[0] // Fallback to oldest
}
func (gs *GameServer) Tick() {
// ... existing tick logic
// Save current state to history
snapshot := gs.createSnapshot()
gs.saveToHistory(snapshot)
}
func (gs *GameServer) saveToHistory(snapshot *Snapshot) {
state := &HistoricalState{
Timestamp: snapshot.Timestamp,
Entities: make(map[int64]*EntityState),
}
for _, entity := range snapshot.Entities {
state.Entities[entity.ID] = entity.Clone()
}
// Ring buffer
if len(gs.stateHistory) >= gs.historySize {
gs.stateHistory = gs.stateHistory[1:]
}
gs.stateHistory = append(gs.stateHistory, state)
}
4. Game State Synchronization – Delta Compression
4.1 Snapshot vs Delta
Full Snapshot: send the entire game state every tick
Pros: Simple, reliable
Cons: Bandwidth-heavy (10KB+ per snapshot × 64Hz = 640KB/s per player)
Delta Compression: send only the changes relative to the previous snapshot
Pros: Bandwidth-efficient (100-500 bytes per tick)
Cons: More complex, requires reliable delivery of baseline
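The bandwidth trade-off follows directly from the figures quoted above:

```python
TICK_RATE = 64
FULL_SNAPSHOT_BYTES = 10 * 1024  # ~10 KB per full snapshot, as estimated above
DELTA_BYTES = 300                # mid-range of the 100-500 byte delta estimate

full_bw = FULL_SNAPSHOT_BYTES * TICK_RATE   # bytes/s per player
delta_bw = DELTA_BYTES * TICK_RATE

print(full_bw // 1024, "KB/s")   # 640 KB/s
print(delta_bw // 1024, "KB/s")  # 18 KB/s
```

A ~30x reduction per player, which is what makes 64Hz feasible over consumer connections.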
4.2 Delta Encoding Implementation
# Python example (real games typically use C++/Go for this)
import struct
from dataclasses import dataclass
from typing import Dict, List, Optional
@dataclass
class EntityState:
id: int
position: tuple # (x, y, z)
rotation: float
health: int
velocity: tuple # (vx, vy, vz)
class DeltaEncoder:
def __init__(self):
self.baseline_snapshot: Optional[Dict[int, EntityState]] = None
def encode_snapshot(self, current_snapshot: Dict[int, EntityState]) -> bytes:
"""
Encode snapshot as delta from baseline.
Returns binary delta packet.
"""
if self.baseline_snapshot is None:
# First snapshot β send full
return self._encode_full_snapshot(current_snapshot)
delta_data = bytearray()
# Header: packet type (1 = delta)
delta_data.append(1)
# Changed/new entities
changed_entities = []
for entity_id, current_state in current_snapshot.items():
if entity_id not in self.baseline_snapshot:
# New entity
changed_entities.append((entity_id, current_state, 'new'))
else:
baseline_state = self.baseline_snapshot[entity_id]
if self._state_changed(baseline_state, current_state):
changed_entities.append((entity_id, current_state, 'changed'))
# Deleted entities
deleted_entities = [
entity_id for entity_id in self.baseline_snapshot
if entity_id not in current_snapshot
]
# Write counts
delta_data.extend(struct.pack('!HH', len(changed_entities), len(deleted_entities)))
# Write changed entities
for entity_id, state, change_type in changed_entities:
if change_type == 'new':
delta_data.extend(self._encode_entity_full(entity_id, state))
else:
delta_data.extend(self._encode_entity_delta(
entity_id, self.baseline_snapshot[entity_id], state
))
# Write deleted entities
for entity_id in deleted_entities:
delta_data.extend(struct.pack('!I', entity_id))
return bytes(delta_data)
def _state_changed(self, baseline: EntityState, current: EntityState) -> bool:
"""
Check if entity state has changed significantly.
Use epsilon for floating point comparison.
"""
epsilon = 0.01
# Position changed?
for i in range(3):
if abs(baseline.position[i] - current.position[i]) > epsilon:
return True
# Rotation changed?
if abs(baseline.rotation - current.rotation) > epsilon:
return True
# Health changed?
if baseline.health != current.health:
return True
# Velocity changed?
for i in range(3):
if abs(baseline.velocity[i] - current.velocity[i]) > epsilon:
return True
return False
def _encode_entity_delta(self, entity_id: int, baseline: EntityState, current: EntityState) -> bytes:
"""
Encode only changed fields with bit flags.
"""
data = bytearray()
# Entity ID
data.extend(struct.pack('!I', entity_id))
# Bit flags for changed fields
flags = 0
flag_data = bytearray()
# Position (bit 0)
if any(abs(baseline.position[i] - current.position[i]) > 0.01 for i in range(3)):
flags |= (1 << 0)
# Compress position: use int16 with 0.01 precision
for coord in current.position:
flag_data.extend(struct.pack('!h', int(coord * 100)))
# Rotation (bit 1)
if abs(baseline.rotation - current.rotation) > 0.01:
flags |= (1 << 1)
flag_data.extend(struct.pack('!h', int(current.rotation * 100)))
# Health (bit 2)
if baseline.health != current.health:
flags |= (1 << 2)
flag_data.append(current.health & 0xFF)
# Velocity (bit 3)
if any(abs(baseline.velocity[i] - current.velocity[i]) > 0.01 for i in range(3)):
flags |= (1 << 3)
for v in current.velocity:
flag_data.extend(struct.pack('!h', int(v * 100)))
# Write flags and data
data.append(flags & 0xFF)
data.extend(flag_data)
return bytes(data)
def _encode_full_snapshot(self, snapshot: Dict[int, EntityState]) -> bytes:
"""Full snapshot encoding for baseline."""
data = bytearray()
data.append(0) # packet type: full snapshot
data.extend(struct.pack('!H', len(snapshot)))
for entity_id, state in snapshot.items():
data.extend(self._encode_entity_full(entity_id, state))
return bytes(data)
def _encode_entity_full(self, entity_id: int, state: EntityState) -> bytes:
data = bytearray()
data.extend(struct.pack('!I', entity_id))
# Position (3 floats)
for coord in state.position:
data.extend(struct.pack('!f', coord))
# Rotation
data.extend(struct.pack('!f', state.rotation))
# Health
data.append(state.health & 0xFF)
# Velocity
for v in state.velocity:
data.extend(struct.pack('!f', v))
return bytes(data)
# Usage
encoder = DeltaEncoder()
# Tick 1
snapshot1 = {
1: EntityState(1, (10.0, 5.0, 0.0), 90.0, 100, (1.0, 0.0, 0.0)),
2: EntityState(2, (15.0, 8.0, 0.0), 180.0, 80, (0.0, 1.0, 0.0)),
}
packet1 = encoder.encode_snapshot(snapshot1) # Full snapshot
encoder.baseline_snapshot = snapshot1
# Tick 2: entity 1 moved slightly
snapshot2 = {
1: EntityState(1, (10.5, 5.0, 0.0), 90.0, 100, (1.0, 0.0, 0.0)), # Position changed
2: EntityState(2, (15.0, 8.0, 0.0), 180.0, 80, (0.0, 1.0, 0.0)), # No change
}
packet2 = encoder.encode_snapshot(snapshot2) # Delta: ~10 bytes vs 50+ bytes full
print(f"Full snapshot: {len(packet1)} bytes")
print(f"Delta snapshot: {len(packet2)} bytes")
print(f"Compression ratio: {len(packet1) / len(packet2):.1f}x")
4.3 Network Protocol – UDP vs TCP
TCP:
Pros: Reliable, ordered delivery
Cons: Head-of-line blocking (1 packet loss β all subsequent packets wait)
Higher latency
UDP:
Pros: Low latency, no head-of-line blocking
Cons: Unreliable (packet loss, out-of-order)
Best practice: UDP + custom reliability layer
ENet (used by many games):
- UDP with selective reliability
- Channel-based (reliable channel for chat, unreliable for positions)
- Built-in congestion control
QUIC (modern):
- UDP-based, multiple streams (no head-of-line blocking)
- TLS 1.3 built-in
- Faster connection establishment
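To make "UDP + custom reliability layer" concrete, here is a minimal sketch of per-packet ack tracking in the spirit of ENet's reliable channels. The class and its API are invented for illustration; a real implementation also needs sequence-number wrapping, ack bitfields, and congestion control:

```python
class ReliableChannel:
    """Toy selective-reliability layer to run on top of a UDP socket."""

    def __init__(self, resend_after: float = 0.2):
        self.next_seq = 0
        self.unacked = {}  # seq -> (payload, last_send_time)
        self.resend_after = resend_after  # seconds before retransmit

    def send(self, payload: bytes, now: float):
        """Assign a sequence number and start tracking until acked."""
        seq = self.next_seq
        self.next_seq += 1
        self.unacked[seq] = (payload, now)
        return seq, payload  # would be written to the UDP socket here

    def on_ack(self, seq: int) -> None:
        """Peer confirmed delivery; stop tracking this packet."""
        self.unacked.pop(seq, None)

    def due_for_resend(self, now: float):
        """Packets whose ack hasn't arrived within the resend window."""
        out = []
        for seq, (payload, sent_at) in self.unacked.items():
            if now - sent_at >= self.resend_after:
                out.append((seq, payload))
                self.unacked[seq] = (payload, now)  # reset the timer
        return out
```

Position updates would bypass this class entirely (unreliable channel); chat and game events would go through it.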
5. Leaderboard at Scale – Redis Sorted Sets
5.1 Global Leaderboard
# Redis Sorted Set: key → member + score
# Score = MMR, member = player_id
ZADD leaderboard:global 2400 "player:12345"
ZADD leaderboard:global 2150 "player:67890"
ZADD leaderboard:global 2800 "player:11111"
# Get top 10
ZREVRANGE leaderboard:global 0 9 WITHSCORES
# Returns:
# 1) "player:11111"
# 2) "2800"
# 3) "player:12345"
# 4) "2400"
# 5) "player:67890"
# 6) "2150"
# Get player rank (O(log N))
ZREVRANK leaderboard:global "player:12345"
# Returns: 1 (0-indexed, rank #2)
# Get players around a player (context)
ZREVRANGE leaderboard:global <rank-5> <rank+5> WITHSCORES
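The rank semantics of these commands can be emulated with a tiny in-memory stand-in, useful for tests and intuition. `MiniSortedSet` and its method names are made up here; this is not a Redis client, and real sorted sets use a skip list for O(log N) rank lookups rather than this linear scan:

```python
import bisect

class MiniSortedSet:
    """Toy stand-in for a Redis sorted set, just enough for rank queries."""

    def __init__(self):
        self.entries = []  # sorted list of (score, member), ascending

    def zadd(self, score: float, member: str) -> None:
        # Re-adding a member replaces its old score, as ZADD does
        self.entries = [(s, m) for s, m in self.entries if m != member]
        bisect.insort(self.entries, (score, member))

    def zrevrank(self, member: str):
        """0-indexed rank with the highest score first, like ZREVRANK."""
        for i, (_, m) in enumerate(reversed(self.entries)):
            if m == member:
                return i
        return None

    def zrevrange(self, start: int, stop: int):
        """Like ZREVRANGE ... WITHSCORES: (member, score), stop inclusive."""
        desc = list(reversed(self.entries))
        return [(m, s) for s, m in desc[start:stop + 1]]

lb = MiniSortedSet()
lb.zadd(2400, "player:12345")
lb.zadd(2150, "player:67890")
lb.zadd(2800, "player:11111")
print(lb.zrevrank("player:12345"))  # 1 (0-indexed, rank #2)
```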
5.2 Sharded Leaderboard
Problem: a single Redis instance can't handle 100M players
- ZREVRANGE can be slow at large cardinality
- Memory limits
Solution: shard by MMR buckets
Bucket 0-999:     leaderboard:bucket:0
Bucket 1000-1999: leaderboard:bucket:1
Bucket 2000-2999: leaderboard:bucket:2
...
To query the top 100:
1. Query the top 100 from the highest non-empty bucket
2. If that isn't enough, continue with the next bucket down
package leaderboard
import (
"context"
"fmt"
"github.com/redis/go-redis/v9"
)
const BucketSize = 1000
type LeaderboardService struct {
rdb *redis.Client
}
func (ls *LeaderboardService) getBucketKey(mmr int) string {
bucket := mmr / BucketSize
return fmt.Sprintf("leaderboard:bucket:%d", bucket)
}
func (ls *LeaderboardService) UpdatePlayerMMR(ctx context.Context, playerID int64, mmr int) error {
member := fmt.Sprintf("player:%d", playerID)
bucketKey := ls.getBucketKey(mmr)
// Remove from all buckets (player might have moved buckets)
// In practice, track old MMR to only remove from 1 bucket
pipe := ls.rdb.Pipeline()
for i := 0; i < 10; i++ { // Max 10 buckets (0-9999 MMR)
pipe.ZRem(ctx, fmt.Sprintf("leaderboard:bucket:%d", i), member)
}
// Add to new bucket
pipe.ZAdd(ctx, bucketKey, redis.Z{
Score: float64(mmr),
Member: member,
})
_, err := pipe.Exec(ctx)
return err
}
func (ls *LeaderboardService) GetTopPlayers(ctx context.Context, limit int) ([]LeaderboardEntry, error) {
entries := []LeaderboardEntry{}
// Query from highest bucket downwards
for bucket := 9; bucket >= 0; bucket-- {
bucketKey := fmt.Sprintf("leaderboard:bucket:%d", bucket)
// Get top players from this bucket
results, err := ls.rdb.ZRevRangeWithScores(ctx, bucketKey, 0, int64(limit-1)).Result()
if err != nil {
return nil, err
}
for _, z := range results {
entries = append(entries, LeaderboardEntry{
PlayerID: z.Member.(string),
MMR: int(z.Score),
})
}
if len(entries) >= limit {
break
}
}
// Guard: fewer total players than `limit` would make entries[:limit] panic
if len(entries) > limit {
entries = entries[:limit]
}
return entries, nil
}
func (ls *LeaderboardService) GetPlayerRank(ctx context.Context, playerID int64, mmr int) (int, error) {
bucketKey := ls.getBucketKey(mmr)
// Global rank = players in higher buckets
//             + players in this bucket with strictly higher MMR, plus 1
// Count players in higher buckets
countInHigherBuckets := 0
currentBucket := mmr / BucketSize
for bucket := 9; bucket > currentBucket; bucket-- {
count, err := ls.rdb.ZCard(ctx, fmt.Sprintf("leaderboard:bucket:%d", bucket)).Result()
if err != nil {
return 0, err
}
countInHigherBuckets += int(count)
}
// Also count players in same bucket with higher MMR
countInSameBucket, err := ls.rdb.ZCount(ctx, bucketKey,
fmt.Sprintf("%d", mmr+1), "+inf").Result()
if err != nil {
return 0, err
}
globalRank := countInHigherBuckets + int(countInSameBucket) + 1
return globalRank, nil
}
type LeaderboardEntry struct {
PlayerID string
MMR int
}
5.3 Regional Leaderboards
leaderboard:global:us-west
leaderboard:global:us-east
leaderboard:global:eu
leaderboard:global:asia
leaderboard:season:2024-Q1:us-west
leaderboard:season:2024-Q1:eu
leaderboard:daily:2024-04-17:asia
5.4 Real-time Updates – Pub/Sub
# Publisher (game server)
import json
import time

import redis

r = redis.Redis(host='localhost', port=6379)
def on_match_end(match):
for player in match.players:
# Update leaderboard
r.zadd('leaderboard:global', {f'player:{player.id}': player.new_mmr})
# Publish rank change event
old_rank = get_old_rank(player.id)
new_rank = get_new_rank(player.id)
r.publish('leaderboard:updates', json.dumps({
'player_id': player.id,
'old_mmr': player.old_mmr,
'new_mmr': player.new_mmr,
'old_rank': old_rank,
'new_rank': new_rank,
'timestamp': time.time()
}))
# Subscriber (WebSocket service β push to client)
def subscribe_leaderboard_updates():
pubsub = r.pubsub()
pubsub.subscribe('leaderboard:updates')
for message in pubsub.listen():
if message['type'] == 'message':
update = json.loads(message['data'])
# Push to connected clients via WebSocket
websocket_broadcast(update)
6. Anti-cheat Mechanisms
6.1 Server-side Validation
// Validate player actions on server
func (gs *GameServer) validatePlayerAction(player *Player, action *PlayerAction) error {
switch action.Type {
case ActionTypeMove:
return gs.validateMovement(player, action)
case ActionTypeShoot:
return gs.validateShoot(player, action)
case ActionTypeUseItem:
return gs.validateItemUse(player, action)
default:
return fmt.Errorf("unknown action type")
}
}
func (gs *GameServer) validateMovement(player *Player, action *PlayerAction) error {
// 1. Speed check
distance := action.NewPosition.Distance(player.Position)
timeElapsed := action.Timestamp.Sub(player.LastActionTime).Seconds()
if timeElapsed > 0 {
speed := distance / timeElapsed
maxSpeed := player.GetMaxSpeed() * 1.2 // 20% tolerance for network jitter
if speed > maxSpeed {
return fmt.Errorf("movement too fast: %.2f > %.2f", speed, maxSpeed)
}
}
// 2. Teleport check
if distance > 10.0 { // Arbitrary threshold
return fmt.Errorf("teleport detected: distance %.2f", distance)
}
// 3. Collision check (expensive, do sampling)
if gs.tickNumber%4 == 0 { // Check every 4 ticks
if gs.isPositionInsideWall(action.NewPosition) {
return fmt.Errorf("position inside wall")
}
}
// 4. Bounds check
if !gs.gameMap.IsInBounds(action.NewPosition) {
return fmt.Errorf("position out of bounds")
}
return nil
}
func (gs *GameServer) validateShoot(player *Player, action *PlayerAction) error {
// 1. Rate of fire check
if time.Since(player.LastShotTime) < player.Weapon.MinFireInterval {
return fmt.Errorf("shooting too fast")
}
// 2. Ammo check
if player.Ammo <= 0 {
return fmt.Errorf("no ammo")
}
// 3. Weapon cooldown
if player.Weapon.IsOnCooldown() {
return fmt.Errorf("weapon on cooldown")
}
// 4. Line of sight check (prevent shooting through walls)
if !gs.hasLineOfSight(player.Position, action.TargetPosition) {
return fmt.Errorf("no line of sight")
}
return nil
}
6.2 Heuristic-based Detection
# Detect aimbots and wallhacks using statistics
from dataclasses import dataclass
from typing import List

import numpy as np


@dataclass
class ShotMetrics:
    accuracy: float                # Hit rate
    headshot_rate: float
    reaction_time_ms: float
    target_switch_time_ms: float
    avg_crosshair_distance: float  # Distance from target when shooting


class CheatDetector:
    def __init__(self):
        self.thresholds = {
            'accuracy': 0.90,             # >90% accuracy is suspicious
            'headshot_rate': 0.70,        # >70% headshots
            'reaction_time_ms': 100,      # <100ms reaction time
            'target_switch_time_ms': 50,  # <50ms to switch targets
        }

    def analyze_player(self, player_id: int, shots: List[ShotMetrics]) -> dict:
        """
        Analyze a player's shots for cheating indicators.
        Returns suspicion scores.
        """
        if len(shots) < 20:  # Need enough data
            return {'suspicious': False, 'reason': 'insufficient_data'}

        # Calculate aggregated metrics
        accuracy = np.mean([s.accuracy for s in shots])
        headshot_rate = np.mean([s.headshot_rate for s in shots])
        avg_reaction_time = np.mean([s.reaction_time_ms for s in shots])
        avg_target_switch = np.mean([s.target_switch_time_ms for s in shots])

        suspicion_score = 0
        reasons = []

        # Check accuracy (aimbot indicator)
        if accuracy > self.thresholds['accuracy']:
            suspicion_score += 30
            reasons.append(f'accuracy_too_high:{accuracy:.2f}')

        # Check headshot rate (aimbot indicator)
        if headshot_rate > self.thresholds['headshot_rate']:
            suspicion_score += 25
            reasons.append(f'headshot_rate_too_high:{headshot_rate:.2f}')

        # Check reaction time (aimbot indicator)
        if avg_reaction_time < self.thresholds['reaction_time_ms']:
            suspicion_score += 20
            reasons.append(f'reaction_too_fast:{avg_reaction_time:.0f}ms')

        # Check target switching (aimbot indicator)
        if avg_target_switch < self.thresholds['target_switch_time_ms']:
            suspicion_score += 15
            reasons.append(f'target_switch_too_fast:{avg_target_switch:.0f}ms')

        # Check consistency (bots are too consistent)
        accuracy_std = np.std([s.accuracy for s in shots])
        if accuracy_std < 0.05:  # Too consistent
            suspicion_score += 10
            reasons.append(f'too_consistent:std={accuracy_std:.3f}')

        return {
            'player_id': player_id,
            'suspicious': suspicion_score >= 50,
            'suspicion_score': suspicion_score,
            'reasons': reasons,
            'metrics': {
                'accuracy': accuracy,
                'headshot_rate': headshot_rate,
                'avg_reaction_time': avg_reaction_time,
            },
        }


# Store metrics for analysis
class AntiCheatSystem:
    def __init__(self):
        self.player_shots = {}  # player_id -> List[ShotMetrics]
        self.detector = CheatDetector()

    def record_shot(self, player_id: int, shot: ShotMetrics):
        if player_id not in self.player_shots:
            self.player_shots[player_id] = []
        self.player_shots[player_id].append(shot)

        # Analyze after every 50 shots
        if len(self.player_shots[player_id]) % 50 == 0:
            result = self.detector.analyze_player(player_id, self.player_shots[player_id])
            if result['suspicious']:
                self.flag_player_for_review(player_id, result)

    def flag_player_for_review(self, player_id: int, analysis: dict):
        # Send to admin review queue
        print(f"⚠️ Player {player_id} flagged for cheating")
        print(f"   Suspicion score: {analysis['suspicion_score']}")
        print(f"   Reasons: {', '.join(analysis['reasons'])}")
        # Shadow ban (put in cheat pool)
        self.shadow_ban_player(player_id)

    def shadow_ban_player(self, player_id: int):
        """
        Shadow ban: the player can still play, but only gets matched with
        other cheaters. They don't know they're banned, so they don't
        immediately create a new account.
        """
        pass
6.3 Client Integrity – Anti-tamper
1. Code signing:
- Executable and DLLs must have a valid signature
- Detect modified/injected DLLs
2. Memory protection:
- Encrypt critical data in memory
- Detect memory editors (Cheat Engine)
- Use anti-debugging techniques
3. Heartbeat system:
- Client sends heartbeats with checksums
- Server verifies checksums match expected values
4. Kernel-level anti-cheat (controversial):
- Vanguard (Valorant), Easy Anti-Cheat
- Kernel driver blocks cheats at the OS level
- Trade-off: invasive, raises security concerns
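The heartbeat idea (point 3) can be sketched in a few lines. This is a minimal sketch: the payload shape, the `EXPECTED_CHECKSUMS` table, and the shared `secret` are all assumptions for illustration; a real system would also require the client to report the full expected file set and would rotate keys.

```python
import hashlib
import hmac

# Hypothetical expected file checksums for one client build (illustrative values).
EXPECTED_CHECKSUMS = {
    "game.exe": "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08",
    "engine.dll": "60303ae22b998861bce3b28f33eec1be758a213c86c93c076dbe9f558c11c752",
}

def verify_heartbeat(payload: dict, secret: bytes) -> bool:
    """Verify a client heartbeat: check the HMAC over the reported
    checksums (tamper detection on the message itself), then compare
    every reported checksum against the expected value."""
    body = repr(sorted(payload["checksums"].items())).encode()
    mac = hmac.new(secret, body, hashlib.sha256)
    if not hmac.compare_digest(mac.hexdigest(), payload["signature"]):
        return False  # heartbeat message was tampered with
    # Any file whose digest differs from the known-good build fails the check.
    return all(
        EXPECTED_CHECKSUMS.get(name) == digest
        for name, digest in payload["checksums"].items()
    )
```

Note the double check: the HMAC stops a proxy from rewriting the heartbeat in flight, while the checksum comparison catches modified binaries.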
7. Analytics & Telemetry Pipeline
7.1 Event Collection
// Protocol Buffers schema for game events
syntax = "proto3";

message GameEvent {
  string event_id = 1;
  string event_type = 2; // "match_start", "player_kill", "match_end"
  int64 timestamp = 3;
  string match_id = 4;
  string player_id = 5;
  map<string, string> properties = 6;
}

message PlayerKillEvent {
  string killer_id = 1;
  string victim_id = 2;
  string weapon = 3;
  bool is_headshot = 4;
  float distance = 5;
  Position killer_position = 6;
  Position victim_position = 7;
}

message MatchEndEvent {
  string match_id = 1;
  int32 duration_seconds = 2;
  repeated PlayerStats player_stats = 3;
  string winning_team = 4;
}
// Event producer (game server)
package analytics

import (
    "context"
    "encoding/json"
    "fmt"
    "log"
    "time"

    "github.com/segmentio/kafka-go"
)

type EventProducer struct {
    writer *kafka.Writer
}

func NewEventProducer(brokers []string) *EventProducer {
    return &EventProducer{
        writer: &kafka.Writer{
            Addr:     kafka.TCP(brokers...),
            Topic:    "game-events",
            Balancer: &kafka.Hash{}, // Hash by message key (player_id) for per-player ordering
        },
    }
}

func (p *EventProducer) PublishEvent(ctx context.Context, event *GameEvent) error {
    eventJSON, err := json.Marshal(event)
    if err != nil {
        return err
    }
    return p.writer.WriteMessages(ctx, kafka.Message{
        Key:   []byte(event.PlayerID), // Partition by player
        Value: eventJSON,
    })
}

// Usage in game server
func (gs *GameServer) onPlayerKill(killer, victim *Player, weapon string, isHeadshot bool) {
    // Update game state
    killer.Kills++
    victim.Deaths++

    // Publish event to Kafka (log failures; never block the tick loop on analytics)
    event := &GameEvent{
        EventID:   generateUUID(),
        EventType: "player_kill",
        Timestamp: time.Now().Unix(),
        MatchID:   gs.matchID,
        PlayerID:  killer.ID,
        Properties: map[string]string{
            "victim_id":   victim.ID,
            "weapon":      weapon,
            "is_headshot": fmt.Sprintf("%t", isHeadshot),
            "killer_pos": fmt.Sprintf("%.2f,%.2f,%.2f",
                killer.Position.X, killer.Position.Y, killer.Position.Z),
        },
    }
    if err := gs.eventProducer.PublishEvent(context.Background(), event); err != nil {
        log.Printf("failed to publish kill event: %v", err)
    }
}
7.2 Stream Processing – Apache Flink
// Flink job: Real-time KDA calculation
import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class KDAProcessingJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Kafka source (properties holds bootstrap.servers, group.id, ...)
        FlinkKafkaConsumer<GameEvent> consumer = new FlinkKafkaConsumer<>(
            "game-events",
            new GameEventDeserializationSchema(),
            properties
        );
        DataStream<GameEvent> events = env.addSource(consumer);

        // Filter kill events
        DataStream<GameEvent> killEvents = events
            .filter(event -> "player_kill".equals(event.getEventType()));

        // Calculate KDA per player over 5-minute tumbling windows
        DataStream<PlayerKDA> kda = killEvents
            .keyBy(event -> event.getPlayerId())
            .window(TumblingEventTimeWindows.of(Time.minutes(5)))
            .aggregate(new KDAAggregator());

        // Sink to Redis for real-time leaderboard
        kda.addSink(new RedisKDASink());

        env.execute("KDA Processing Job");
    }
}

public class KDAAggregator implements AggregateFunction<GameEvent, KDAAccumulator, PlayerKDA> {
    @Override
    public KDAAccumulator createAccumulator() {
        return new KDAAccumulator();
    }

    @Override
    public KDAAccumulator add(GameEvent event, KDAAccumulator acc) {
        // Assumes the producer emits one event per participant, with
        // killer_id/victim_id in properties, so each keyed stream sees
        // both its kills and its deaths.
        if (event.getPlayerId().equals(event.getProperties().get("killer_id"))) {
            acc.kills++;
        }
        if (event.getPlayerId().equals(event.getProperties().get("victim_id"))) {
            acc.deaths++;
        }
        // Assists logic...
        return acc;
    }

    @Override
    public PlayerKDA getResult(KDAAccumulator acc) {
        double kda = acc.deaths == 0 ? acc.kills + acc.assists :
            (double) (acc.kills + acc.assists) / acc.deaths;
        return new PlayerKDA(acc.playerId, acc.kills, acc.deaths, acc.assists, kda);
    }

    @Override
    public KDAAccumulator merge(KDAAccumulator a, KDAAccumulator b) {
        a.kills += b.kills;
        a.deaths += b.deaths;
        a.assists += b.assists;
        return a;
    }
}
7.3 Metrics Dashboard
# Grafana dashboard queries (PromQL)
# Game server health
up{job="game-server"}
# Active matches
sum(game_matches_active) by (region)
# Match duration p95
histogram_quantile(0.95, sum(rate(game_match_duration_seconds_bucket[5m])) by (le))
# Player latency p99, per region
histogram_quantile(0.99, sum(rate(game_player_latency_ms_bucket[5m])) by (le, region))
# Matchmaking queue time p99
histogram_quantile(0.99, sum(rate(matchmaking_queue_time_seconds_bucket[5m])) by (le))
# Cheat detection rate
rate(anticheat_players_flagged_total[1h])
# Server tick rate (should be stable at 64Hz; rate() already returns per-second)
rate(game_server_ticks_total[1m])
8. Game Server Scalability – Dynamic Allocation
8.1 Game Server Architecture
Game Server Modes:
1. Dedicated Server:
- One server process = one match
- Isolated, easy to scale
- Used by: Valorant, CS:GO, PUBG
2. Server Pool (Shard-based):
- One process handles multiple matches
- More efficient resource usage
- Used by: MOBA games (League, DOTA)
3. Hybrid (Lobby + Game):
- Lightweight lobby server (chat, party)
- Heavyweight game server (match simulation)
8.2 Dynamic Server Allocation
package serverpool

import (
    "context"
    "fmt"
    "sync"
    "time"
)

type GameServerPool struct {
    servers    map[string]*GameServerInstance
    mu         sync.RWMutex
    maxServers int
}

type GameServerInstance struct {
    ID             string
    Region         string
    Status         string // "idle", "starting", "running", "shutting_down"
    CurrentMatches int
    MaxMatches     int
    CreatedAt      time.Time
    LastHeartbeat  time.Time
}

func (pool *GameServerPool) AllocateServer(ctx context.Context, region string) (*GameServerInstance, error) {
    pool.mu.Lock()
    defer pool.mu.Unlock()

    // 1. Find a server in the region with spare capacity
    for _, server := range pool.servers {
        if server.Region == region &&
            (server.Status == "idle" || server.Status == "running") &&
            server.CurrentMatches < server.MaxMatches {
            server.CurrentMatches++
            server.Status = "running"
            return server, nil
        }
    }

    // 2. No available server, spawn a new one
    if len(pool.servers) < pool.maxServers {
        newServer, err := pool.spawnServer(ctx, region)
        if err != nil {
            return nil, err
        }
        newServer.CurrentMatches++ // account for the match being allocated
        pool.servers[newServer.ID] = newServer
        return newServer, nil
    }

    // 3. Pool exhausted, wait or error
    return nil, fmt.Errorf("no available servers in region %s", region)
}

func (pool *GameServerPool) spawnServer(ctx context.Context, region string) (*GameServerInstance, error) {
    // Integration with Kubernetes or AWS ECS
    serverID := fmt.Sprintf("game-server-%s-%d", region, time.Now().Unix())

    // Deploy container/pod
    if err := pool.deployServerContainer(ctx, serverID, region); err != nil {
        return nil, err
    }

    return &GameServerInstance{
        ID:             serverID,
        Region:         region,
        Status:         "starting",
        CurrentMatches: 0,
        MaxMatches:     10, // One process can handle 10 matches
        CreatedAt:      time.Now(),
    }, nil
}

func (pool *GameServerPool) deployServerContainer(ctx context.Context, serverID, region string) error {
    // Kubernetes API call
    //   kubectl apply -f game-server-pod.yaml
    // Or AWS ECS
    //   ecs.RunTask(...)
    return nil
}

func (pool *GameServerPool) ReleaseServer(serverID string) {
    pool.mu.Lock()
    defer pool.mu.Unlock()

    server, exists := pool.servers[serverID]
    if !exists {
        return
    }
    server.CurrentMatches--

    // Auto-scale down: shut down idle servers after 5 minutes
    if server.CurrentMatches == 0 {
        server.Status = "idle"
        go pool.scheduleServerShutdown(server, 5*time.Minute)
    }
}

func (pool *GameServerPool) scheduleServerShutdown(server *GameServerInstance, delay time.Duration) {
    time.Sleep(delay)
    pool.mu.Lock()
    defer pool.mu.Unlock()

    // Re-check if still idle
    if server.CurrentMatches == 0 {
        server.Status = "shutting_down"
        pool.shutdownServer(server)
        delete(pool.servers, server.ID)
    }
}
8.3 Kubernetes Deployment
# game-server-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: game-server
  labels:
    app: game-server
spec:
  replicas: 50  # Base capacity
  selector:
    matchLabels:
      app: game-server
  template:
    metadata:
      labels:
        app: game-server
    spec:
      containers:
        - name: game-server
          image: myregistry/game-server:v1.2.3
          resources:
            requests:
              cpu: "2"
              memory: "4Gi"
            limits:
              cpu: "4"
              memory: "8Gi"
          env:
            - name: REGION
              value: "us-west"
            - name: MAX_MATCHES_PER_INSTANCE
              value: "10"
            - name: TICK_RATE
              value: "64"
          ports:
            - containerPort: 7777
              protocol: UDP
              name: game-udp
            - containerPort: 8080
              protocol: TCP
              name: health-http
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
---
# Horizontal Pod Autoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: game-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: game-server
  minReplicas: 50
  maxReplicas: 500
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods
      pods:
        metric:
          name: game_server_active_matches
        target:
          type: AverageValue
          averageValue: "8"  # Scale when avg > 8 matches per pod
---
# Service (headless for per-pod addressing)
apiVersion: v1
kind: Service
metadata:
  name: game-server
spec:
  clusterIP: None  # Headless
  selector:
    app: game-server
  ports:
    - port: 7777
      protocol: UDP
      name: game-udp
8.4 AWS Fargate Deployment (ECS + Auto Scaling)
# terraform configuration
resource "aws_ecs_cluster" "game_servers" {
  name = "game-servers-cluster"
}

resource "aws_ecs_task_definition" "game_server" {
  family                   = "game-server"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = "2048" # 2 vCPU
  memory                   = "4096" # 4 GB

  container_definitions = jsonencode([{
    name  = "game-server"
    image = "myregistry/game-server:v1.2.3"
    portMappings = [{
      containerPort = 7777
      protocol      = "udp"
    }]
    environment = [
      { name = "REGION", value = "us-west-2" },
      { name = "TICK_RATE", value = "64" }
    ]
    logConfiguration = {
      logDriver = "awslogs"
      options = {
        "awslogs-group"         = "/ecs/game-server"
        "awslogs-region"        = "us-west-2"
        "awslogs-stream-prefix" = "ecs"
      }
    }
  }])
}

resource "aws_ecs_service" "game_server" {
  name            = "game-server-service"
  cluster         = aws_ecs_cluster.game_servers.id
  task_definition = aws_ecs_task_definition.game_server.arn
  desired_count   = 100
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = aws_subnet.private[*].id
    security_groups  = [aws_security_group.game_server.id]
    assign_public_ip = false
  }

  # Note: game traffic is UDP, so clients either connect directly to the
  # allocated task, or sit behind a Network Load Balancer; an Application
  # Load Balancer is HTTP-only and cannot carry this traffic.
}

# Auto Scaling Target
resource "aws_appautoscaling_target" "game_server" {
  max_capacity       = 500
  min_capacity       = 50
  resource_id        = "service/${aws_ecs_cluster.game_servers.name}/${aws_ecs_service.game_server.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}

# Auto Scaling Policy
resource "aws_appautoscaling_policy" "game_server_cpu" {
  name               = "game-server-cpu-autoscaling"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.game_server.resource_id
  scalable_dimension = aws_appautoscaling_target.game_server.scalable_dimension
  service_namespace  = aws_appautoscaling_target.game_server.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
    target_value = 70.0
  }
}
9. Data Model & Sharding Strategy
9.1 Player Data Sharding
Players table: 100M+ rows
→ Shard by hashing player_id (modulo shown below for simplicity; consistent hashing avoids mass resharding when the shard count changes)
Shard 1: player_id % 10 = 0
Shard 2: player_id % 10 = 1
...
Shard 10: player_id % 10 = 9
Pros:
- Even distribution
- Read/write traffic spread
Cons:
- Cross-shard queries expensive (e.g., friends list)
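The routing logic above, plus the standard mitigation for the cross-shard friends-list problem (group ids by shard and batch one query per shard), can be sketched as follows. The 10-shard count mirrors the example; function names are illustrative.

```python
def shard_for(player_id: int, num_shards: int = 10) -> int:
    """Route a player to a shard (modulo of the id, as in the example above)."""
    return player_id % num_shards

def fan_out(friend_ids, num_shards: int = 10):
    """Cross-shard query plan: group friend ids by shard so each shard
    receives one batched lookup instead of N point queries."""
    batches = {}
    for fid in friend_ids:
        batches.setdefault(shard_for(fid, num_shards), []).append(fid)
    return batches
```

Each batch then maps naturally onto the `WHERE id = ANY(...)` query shown later in section 10.1.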
9.2 Match Data – Time-based Sharding
-- Partition matches by month
CREATE TABLE matches_2024_01 PARTITION OF matches
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE matches_2024_02 PARTITION OF matches
FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
-- Auto-create partitions via cron or tool
-- Query recent matches (hits 1 partition)
SELECT * FROM matches
WHERE created_at >= '2024-04-01' AND created_at < '2024-05-01'
ORDER BY created_at DESC
LIMIT 100;
9.3 Hot/Cold Data Separation
Hot data (recent, frequently accessed):
- Last 7 days matches → PostgreSQL (SSD)
- Last 30 days player stats → Redis cache
Cold data (old, rarely accessed):
- Matches >30 days → S3 (Parquet)
- Historical stats → Data warehouse (BigQuery, Redshift)
Migration job:
- Daily cron: move matches older than 30 days to S3
- Keep match_id → S3 key mapping in DB for retrieval
# Cold storage migration
import boto3
import psycopg2
from datetime import datetime, timedelta

s3 = boto3.client('s3')
conn = psycopg2.connect("dbname=gamedb user=postgres")


def migrate_old_matches():
    cutoff_date = datetime.now() - timedelta(days=30)
    cur = conn.cursor()
    cur.execute("""
        SELECT * FROM matches
        WHERE created_at < %s AND archived = false
        LIMIT 10000
    """, (cutoff_date,))
    matches = cur.fetchall()

    for match in matches:
        match_id = match[1]  # column position of match_id in the row

        # Serialize to Parquet and upload to S3
        key = f"matches/{cutoff_date.year}/{cutoff_date.month}/{match_id}.parquet"
        s3.put_object(
            Bucket='game-cold-storage',
            Key=key,
            Body=serialize_to_parquet(match)
        )

        # Mark as archived
        cur.execute("""
            UPDATE matches SET archived = true, archive_key = %s
            WHERE match_id = %s
        """, (key, match_id))

    conn.commit()
    cur.close()


def retrieve_archived_match(match_id):
    cur = conn.cursor()
    cur.execute("SELECT archive_key FROM matches WHERE match_id = %s", (match_id,))
    row = cur.fetchone()
    cur.close()
    if row and row[0]:
        # Fetch from S3
        obj = s3.get_object(Bucket='game-cold-storage', Key=row[0])
        return deserialize_from_parquet(obj['Body'].read())
    return None
10. Performance Optimization
10.1 Database Query Optimization
-- Bad: N+1 query
SELECT * FROM players WHERE id = ?; -- Repeated for each player
-- Good: Batch query
SELECT * FROM players WHERE id = ANY($1::bigint[]);
-- Index covering query (avoid table lookup)
CREATE INDEX idx_players_leaderboard ON players(mmr DESC, id)
INCLUDE (username, region);
-- Query uses index-only scan
EXPLAIN ANALYZE
SELECT id, username, mmr, region FROM players
ORDER BY mmr DESC LIMIT 100;
-- Result: Index Only Scan (fast!)
10.2 Redis Optimization
# Bad: Multiple round-trips
def get_player_data(player_id):
    profile = redis.hgetall(f"player:{player_id}:profile")
    inventory = redis.smembers(f"player:{player_id}:inventory")
    stats = redis.hgetall(f"player:{player_id}:stats")
    return profile, inventory, stats


# Good: Pipeline (1 round-trip)
def get_player_data_optimized(player_id):
    pipe = redis.pipeline()
    pipe.hgetall(f"player:{player_id}:profile")
    pipe.smembers(f"player:{player_id}:inventory")
    pipe.hgetall(f"player:{player_id}:stats")
    results = pipe.execute()
    return results[0], results[1], results[2]


# Lua script (atomic + server-side execution)
lua_script = """
local player_id = KEYS[1]
local profile = redis.call('HGETALL', 'player:' .. player_id .. ':profile')
local inventory = redis.call('SMEMBERS', 'player:' .. player_id .. ':inventory')
local stats = redis.call('HGETALL', 'player:' .. player_id .. ':stats')
return {profile, inventory, stats}
"""
script_sha = redis.script_load(lua_script)
result = redis.evalsha(script_sha, 1, player_id)
10.3 Network Bandwidth – Compression
# Compress snapshots before sending
import zlib

import msgpack


def compress_snapshot(snapshot):
    # 1. Serialize with MessagePack (efficient binary format)
    packed = msgpack.packb(snapshot, use_bin_type=True)
    # 2. Compress with zlib
    compressed = zlib.compress(packed, level=6)
    return compressed


def decompress_snapshot(data):
    decompressed = zlib.decompress(data)
    snapshot = msgpack.unpackb(decompressed, raw=False)
    return snapshot


# Before: 5KB per snapshot × 64Hz = 320KB/s
# After: 800 bytes per snapshot × 64Hz = 51KB/s
# Compression ratio: 6.25x
10.4 CDN for Static Assets
Game assets (textures, models, sounds) served via CDN:
- CloudFront, Cloudflare
- Edge caching β low latency globally
- Version manifest for updates
manifest.json:
{
  "version": "1.2.3",
  "assets": {
    "textures/player.png": {
      "url": "https://cdn.game.com/assets/v1.2.3/textures/player.png",
      "hash": "sha256:abc123...",
      "size": 524288
    },
    "models/weapon.obj": {
      "url": "https://cdn.game.com/assets/v1.2.3/models/weapon.obj",
      "hash": "sha256:def456...",
      "size": 1048576
    }
  }
}
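On the client side, each downloaded asset is checked against its manifest entry before use. A minimal sketch, assuming the entry shape shown above (`"hash"` as `"<algo>:<hexdigest>"` plus `"size"`):

```python
import hashlib

def verify_asset(data: bytes, manifest_entry: dict) -> bool:
    """Check a downloaded asset against its manifest hash and size.
    The algorithm name is parsed from the 'sha256:...' prefix."""
    algo, _, expected = manifest_entry["hash"].partition(":")
    digest = hashlib.new(algo, data).hexdigest()
    return digest == expected and len(data) == manifest_entry["size"]
```

The size check is cheap and catches truncated downloads before hashing; the hash catches corruption and tampering.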
11. Interview Questions & Real-world Scenarios
11.1 System Design Questions
Q1: Design a matchmaking system for 10M concurrent players.
Requirements:
- <3s matchmaking time
- Balanced skill (MMR ± 100)
- Regional (minimize latency)
- Party support (5-man stack)
Solution:
1. Shard by region (US, EU, Asia independent queues)
2. In-memory pools per region (fast matching)
3. Expand MMR tolerance over time (graceful degradation)
4. Priority queue for long-waiting players
5. Validate latency before match creation
6. Fallback to cross-region if queue too long
Components:
- API Gateway → Route to regional MM workers
- MM Worker (stateful, in-memory queue)
- Redis (track party sessions)
- Game Server Pool (Kubernetes HPA)
Tradeoffs:
- Fast matching vs perfect balance
- Regional isolation vs queue depth
- Skill matching vs wait time
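Point 3 of the solution (expand MMR tolerance over time) is the knob behind the "skill matching vs wait time" trade-off. A minimal sketch; the base/step/cap values are illustrative, not tuned:

```python
def mmr_tolerance(wait_seconds: float, base: int = 100, step: int = 50,
                  interval: float = 10.0, cap: int = 500) -> int:
    """Start at ±base MMR and widen the acceptable gap by `step` every
    `interval` seconds of waiting, capped so matches never become
    completely lopsided."""
    widened = base + step * int(wait_seconds // interval)
    return min(widened, cap)
```

The matcher re-evaluates each queued player with their current tolerance every pass, so long-waiting players gradually become compatible with more of the pool.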
Q2: How to prevent cheating in a competitive FPS game?
Anti-cheat layers:
1. Server Authority:
- All game logic on server (clients send inputs only)
- Validate inputs (speed, fire rate, ammo)
- Lag compensation (rewind time for hit detection)
2. Heuristic Detection:
- Track accuracy, headshot rate, reaction time
- Flag outliers (>90% accuracy, <100ms reactions)
- Shadow ban (match cheaters with cheaters)
3. Client Integrity:
- Code signing (detect modified executables)
- Memory protection (encrypt critical data)
- Kernel-level anti-cheat (Vanguard, Easy Anti-Cheat)
4. Replay Analysis:
- Store full match replays
- ML model for behavior analysis
- Manual review by moderators
5. Community Reports:
- In-game report system
- Crowd-sourced detection (Overwatch system in CS:GO)
Tradeoffs:
- Security vs performance (server-side validation adds latency)
- Privacy vs anti-cheat (kernel drivers are invasive)
- False positives vs false negatives
Q3: Design a global leaderboard that updates in real-time.
Scale: 100M players, update every second
Naive approach: Single DB table, ORDER BY mmr DESC
→ Won't scale (query too slow)
Solution:
1. Redis Sorted Sets:
- ZADD leaderboard:global {mmr} {player_id}
- ZREVRANGE (top N) in O(log N + N)
- ZREVRANK (player rank) in O(log N)
2. Sharding by MMR buckets:
- Bucket 0-999, 1000-1999, etc.
- Query top 100: fetch from highest bucket first
- Player rank: count higher buckets + rank in bucket
3. Regional leaderboards:
- leaderboard:us, leaderboard:eu
- Less contention, faster queries
4. Time-based leaderboards:
- leaderboard:daily:2024-04-17
- leaderboard:season:2024-Q1
- Reset at intervals
5. Cache & pre-compute:
- Cache top 100 (refresh every 5s)
- Pre-compute ranks for top 10K only
- On-demand compute for others
6. Pub/Sub for real-time updates:
- Redis pub/sub: leaderboard:updates
- Push to WebSocket clients
Write throughput:
- 1M match ends/hour = 278 updates/s
- Redis can handle 100K+ writes/s
→ Easily scalable with sharding
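The MMR-bucket idea (point 2) can be sketched in pure Python, standing in for one sorted set per bucket: a player's global rank is the number of players in higher buckets plus their rank inside their own bucket.

```python
from bisect import insort

class BucketedLeaderboard:
    """Pure-Python stand-in for the bucketed design above: one sorted
    list per 1000-MMR bucket (a Redis sorted set per bucket in practice)."""

    def __init__(self, bucket_size: int = 1000):
        self.bucket_size = bucket_size
        self.buckets = {}  # bucket index -> sorted list of (mmr, player_id)

    def add(self, player_id: str, mmr: int):
        b = mmr // self.bucket_size
        insort(self.buckets.setdefault(b, []), (mmr, player_id))

    def rank(self, player_id: str, mmr: int) -> int:
        """1-based global rank (higher MMR is better)."""
        b = mmr // self.bucket_size
        higher = sum(len(v) for k, v in self.buckets.items() if k > b)
        in_bucket = sum(1 for m, _ in self.buckets.get(b, []) if m > mmr)
        return higher + in_bucket + 1
```

With Redis, `higher` would come from cached `ZCARD` counts per bucket and `in_bucket` from `ZREVRANK`, so no single structure has to hold all 100M members.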
11.2 Debugging Scenarios
Scenario 1: Players complaining about high latency spikes.
Investigation:
1. Check server metrics:
- CPU usage (>80% → slow ticks)
- Memory (swap → latency)
- Network (packet loss, bandwidth saturation)
2. Check game server tick rate:
- Should be stable 64Hz
- If dropping to 30Hz → performance issue
3. Check client logs:
- Network RTT (ping command)
- Packet loss rate
- Jitter (variance in latency)
4. Regional analysis:
- Specific region affected? (routing issue)
- ISP provider pattern? (peering problem)
5. Time-based pattern:
- Peak hours only? (need more capacity)
- Random spikes? (DDoS, network congestion)
Resolution:
- Scale up game servers (increase CPU/memory)
- Add more servers in affected region
- Optimize tick loop (profiling)
- CDN for asset delivery (reduce bandwidth)
- Contact ISP if routing issue
Scenario 2: Leaderboard showing stale data for some players.
Investigation:
1. Check Redis:
- Keys exist? (cache miss → query DB)
- TTL correct? (premature expiration)
- Replication lag? (read from stale replica)
2. Check update pipeline:
- Kafka consumer lag? (event backlog)
- Flink job running? (check job status)
- Match result events published? (producer issue)
3. Check clock skew:
- Server clocks synchronized? (NTP)
- Timestamp ordering issues
4. Race conditions:
- Concurrent updates to same player
- Last-write-wins conflict
Resolution:
- Restart Kafka consumers (clear lag)
- Fix Redis replication (force sync)
- Add idempotency keys (prevent duplicate updates)
- Use Redis transactions (MULTI/EXEC)
Scenario 3: Matchmaking queue time suddenly 10x slower.
Investigation:
1. Check queue depth:
- Redis: LLEN matchmaking:queue:{region}
- Sudden spike? (viral event, streamer)
2. Check MM worker health:
- All workers running?
- CPU/memory usage?
- Deadlock or infinite loop?
3. Check game server availability:
- Enough idle servers?
- K8s HPA triggered?
- Deployment in progress? (reduced capacity)
4. Check match creation rate:
- Throughput dropped?
- Database slow? (query timeout)
- External service down? (auth, profile)
Resolution:
- Scale MM workers horizontally
- Increase game server pool (trigger HPA manually)
- Loosen matching constraints (temp: expand MMR delta)
- Disable non-critical features (analytics)
- Communicate to players (expected wait time)
11.3 Trade-off Questions
Q: Client prediction vs server authority?
Client Prediction:
Pros: Instant feedback, smooth gameplay
Cons: Misprediction → rubber-banding, more complex
Server Authority:
Pros: Hack-proof, simpler logic
Cons: Input lag, feels sluggish
Best practice: Hybrid
- Client predicts own movement (instant)
- Server validates and reconciles
- Lag compensation for hit detection
Q: UDP vs TCP for game networking?
TCP:
Pros: Reliable, ordered
Cons: Head-of-line blocking (1 lost packet → all wait)
UDP:
Pros: Low latency, no head-of-line blocking
Cons: Packet loss, out-of-order delivery
Best practice: UDP + custom reliability
- Use UDP for game state (tolerate loss)
- Reliable channel for critical events (kills, match end)
- Redundancy (send critical data multiple times)
- Modern: QUIC (UDP + streams, no HOL blocking)
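The "UDP + custom reliability" best practice above can be sketched as a tiny reliable channel: critical events get sequence numbers and are resent every tick until acknowledged. This is a toy model (names hypothetical); real stacks add retransmit timeouts, backoff, retry caps, and ack bitfields.

```python
class ReliableChannel:
    """Minimal reliability layer over a lossy transport: critical events
    are resent each tick until the peer acknowledges them."""

    def __init__(self, send_fn):
        self.send_fn = send_fn  # callable(seq, payload): hands a datagram to UDP
        self.next_seq = 0
        self.unacked = {}       # seq -> payload still awaiting an ack

    def send_critical(self, payload):
        seq = self.next_seq
        self.next_seq += 1
        self.unacked[seq] = payload
        self.send_fn(seq, payload)

    def on_ack(self, seq):
        # Peer confirmed receipt; stop resending this event.
        self.unacked.pop(seq, None)

    def tick(self):
        # Resend everything still unacked (real code would back off).
        for seq, payload in self.unacked.items():
            self.send_fn(seq, payload)
```

Ordinary snapshot traffic bypasses this path entirely; only events that must not be lost (kills, match end) pay the retransmission cost.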
Q: Persistent game server vs serverless functions?
Persistent Server:
Pros: Stateful (game state in memory), low latency
Cons: Cost (idle servers), complex orchestration
Serverless (Lambda, Cloud Run):
Pros: No idle cost, auto-scale
Cons: Cold start latency, stateless (need external state store)
Best practice: Persistent for real-time matches
- Serverless for stateless APIs (profile, inventory)
- Persistent game servers (ECS, K8s)
- Hybrid: serverless matchmaking + persistent game servers
12. Production War Stories
12.1 The Great Matchmaking Meltdown
Context: Season 2 launch day, 5M players logging in at once.
Problem:
- Matchmaking queue timing out after 30s
- Database deadlocks (concurrent MMR updates)
- Redis OOM (queue not draining)
Root cause:
- MM workers did not scale fast enough
- DB connection pool exhausted
- Redis hit its memory limit (wrong eviction policy)
Resolution:
1. Emergency scale: 10x MM workers
2. Increase DB connection pool (100 → 500)
3. Redis maxmemory-policy: allkeys-lru (was noeviction)
4. Disable analytics pipeline (save DB connections)
5. Communicate: "High queue times, we're scaling"
Lessons learned:
- Load test at 2x expected peak, not 1x
- Circuit breakers (degrade features under load)
- Observability (know WHEN to scale)
- Runbook (predefined responses to incidents)
12.2 The Invisible Cheater
Context: Top player with a 99% win rate; the community complained.
Problem:
- Anti-cheat never flagged this player
- Manual review: gameplay looked legitimate
- Replays showed no evidence of cheating
Investigation:
- Deep dive into server logs
- Player kept winning rounds through "luck" (enemy DC)
- Pattern: opponent disconnects in 10% of their matches
Discovery:
- DDoS attack: player sniffed opponent IPs and DDoSed them until they dropped offline
- The game server could not detect this (network layer attack)
Resolution:
- Hide player IPs (use relay servers)
- Ban the player (ToS violation)
- Implement IP obfuscation (WebRTC TURN servers)
Lessons learned:
- Anti-cheat is not just game logic
- Network-level attacks need network-level defenses
- Privacy = security (hide sensitive info)
12.3 The Leaderboard Apocalypse
Context: End of season, computing final ranks for 100M players.
Problem:
- Batch job ran for 36 hours (SLA: 2 hours)
- Database locked (millions of UPDATE queries)
- Players could not see their rewards
Root cause:
- Sequential processing (1 player at a time)
- N+1 query pattern (fetch stats → update rank)
- No indexing on critical columns
Resolution:
1. Parallel batch processing (Spark job, 1000 workers)
2. Bulk updates (batch 10K players per query)
3. Add index: CREATE INDEX ON players(mmr, season_id)
4. Pre-aggregate stats (no join during rank compute)
Improvement:
- 36 hours → 1.5 hours
- No downtime (read replicas for queries)
Lessons learned:
- Batch operations scale differently than online queries
- Pre-compute where possible (trade storage for speed)
- Test with production data size, not toy data
13. Advanced Topics
13.1 Cross-region Matchmaking
Problem: A US player waits 5 minutes because there are not enough players → frustration
Solution: Cross-region matching (US ↔ EU)
Constraints:
- Max latency: 150ms (US-West ↔ EU-West ≈ 140ms)
- Only if queue time >2 minutes
- Prefer same-region (better experience)
Implementation:
1. Regional priority queue (US players search US first)
2. After 2 min: expand to adjacent regions (US → EU)
3. Select a server in a middle region (US-East, closer to EU)
4. Weight teams by latency (balance ping advantage)
Trade-off: Match quality (latency) vs queue time
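Steps 1 and 2 of the implementation reduce to a small policy function. A sketch with a hypothetical adjacency map; real deployments would derive adjacency from measured inter-region latency rather than hardcoding it:

```python
# Hypothetical adjacency map (illustrative region names).
ADJACENT_REGIONS = {
    "us-west": ["us-east"],
    "us-east": ["us-west", "eu-west"],
    "eu-west": ["us-east"],
}

def eligible_regions(home: str, wait_seconds: float, expand_after: float = 120.0):
    """Search only the home region first; after `expand_after` seconds
    in queue, also consider adjacent regions (home stays first, so
    same-region matches are still preferred)."""
    regions = [home]
    if wait_seconds >= expand_after:
        regions += ADJACENT_REGIONS.get(home, [])
    return regions
```

The matchmaker calls this per queued player on each pass, so expansion happens automatically as wait time grows.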
13.2 Skill-based Rating Systems
Elo: Simple, 1v1 games (chess)
Glicko-2: Accounts for rating volatility (periods of inactivity)
TrueSkill: Team-based, accounts for individual contribution
TrueSkill (Microsoft):
- μ (mu): Skill estimate
- σ (sigma): Uncertainty
- New player: μ=25, σ≈8.33 (very uncertain; TrueSkill's defaults)
- Veteran: μ=30, σ=1 (confident estimate)
After match: Bayesian update
- Win against lower-rated: small μ increase, σ decreases
- Upset win: large μ increase
Benefits:
- Better for teams (each player has individual skill)
- Handles uncertainty (new players matched conservatively)
13.3 Server Tick Optimization
Tick rate vs CPU usage:
- 16 Hz (60ms): Mobile games, low-action
- 32 Hz (31ms): Casual games
- 64 Hz (15.6ms): Competitive FPS (CS:GO, Valorant)
- 128 Hz (7.8ms): Pro-level CS:GO servers
Optimization techniques:
1. Spatial partitioning (only update nearby entities)
- Grid-based: divide map into cells
- Only check collisions within same/adjacent cells
2. Interest management (send updates only for visible entities)
- Client FOV culling
- Don't send entities behind walls
3. Delta compression (previous section)
4. Multi-threading (physics, AI on separate threads)
- Game loop on main thread (deterministic)
- Physics simulation on worker threads
- Merge results in main thread
5. SIMD (vectorized calculations)
- Process 4 entities at once (SSE, AVX)
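Technique 1 (grid-based spatial partitioning) can be sketched directly: entities are bucketed by cell, and a neighbor query only scans the 3×3 block of surrounding cells instead of every entity on the map.

```python
from collections import defaultdict

class Grid:
    """Grid-based spatial partition: entities live in cells, and nearby()
    only inspects the 3x3 block of cells around the query point."""

    def __init__(self, cell_size: float = 10.0):
        self.cell_size = cell_size
        self.cells = defaultdict(list)  # (cx, cy) -> [(entity_id, x, y)]

    def insert(self, eid, x, y):
        cx, cy = int(x // self.cell_size), int(y // self.cell_size)
        self.cells[(cx, cy)].append((eid, x, y))

    def nearby(self, x, y):
        """Entities in the query cell and its 8 neighbors: the only
        candidates that need a real distance/collision check."""
        cx, cy = int(x // self.cell_size), int(y // self.cell_size)
        out = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                out.extend(e for e, _, _ in self.cells.get((cx + dx, cy + dy), []))
        return out
```

With a cell size at least as large as the maximum interaction radius, a correct collision pass never needs to look outside this 3×3 neighborhood, turning an O(n²) all-pairs check into roughly O(n) per tick.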
Summary
Gaming backends are among the most complex systems in software engineering:
- Real-time: <100ms latency, 64Hz tick rate
- Scale: 10M concurrent players, billions of events/day
- Consistency: Authoritative server, anti-cheat
- Availability: 99.95% uptime (downtime = lost revenue)
Key takeaways:
- Authoritative server - Server is the source of truth; validate all client actions
- Client prediction + reconciliation - Balance responsive gameplay vs server authority
- Matchmaking - Skill + latency, graceful degradation over time
- Leaderboard - Redis sorted sets, shard by MMR buckets
- Anti-cheat - Multi-layered (server validation, heuristics, client integrity)
- Scalability - Dynamic game server allocation (K8s HPA), regional sharding
- Analytics - Kafka + Flink for real-time event processing
- Optimization - Delta compression, tick optimization, spatial partitioning
Real-world trade-offs:
- Fast matching vs balanced teams
- Client prediction vs server authority
- Security vs performance
- Skill matching vs queue time
Gaming backends demand a blend of distributed systems, real-time systems, networking, and game design. Every technical decision directly affects the player experience: a single frame drop or lag spike can make a player rage quit.
Remember: At the end of the day, make games fun, not just technically impressive. Performance is a feature, but gameplay is king.