📦 Cache · ✍️ Khoa · 📅 19/04/2026 · ☕ 5 min read

Cache: Overview and Learning Roadmap

A cache is an intermediate layer that stores data that has already been computed or fetched, so subsequent requests can be answered many times faster. It trades space (memory) for time (latency). This is one of the highest-ROI techniques in engineering: adding Redis in the right place can cut p99 latency from 500ms to 5ms.

flowchart LR
  subgraph KHONG_CO_CACHE["Without cache"]
    C1["Client"] --> A1["API Server"]
    A1 -->|SQL query| D1["Database (10-100ms)"]
    D1 --> A1
    A1 --> R1["Response (p99 ≈ 200ms)"]
  end

  subgraph CO_CACHE["With cache"]
    C2["Client"] --> A2["API Server"]
    A2 --> RDS["Redis GET (~1ms)"]
    RDS --> H{"Cache hit?"}
    H -->|YES| R2["Response (p99 ≈ 5ms)"]
    H -->|NO| D2["Database (10-100ms)"]
    D2 --> SET["Redis.SET (async)"]
    SET --> R2
  end

Without cache:                     With cache:

Client                             Client
  │                                  │
  ▼                                  ▼
API Server                         API Server
  │                                  │
  │ SQL query (10-100ms)             │ Redis GET (~1ms)
  ▼                                  ▼
Database                           cache hit? ──YES──► return (~1ms)
  │                                  │
  │ return data                      │ NO (miss)
  ▼                                  ▼
API Server                         Database (10-100ms)
  │                                  │
  ▼                                  ▼
Client (p99 = 200ms)               Redis.SET (async)
                                     │
                                     ▼
                                   Client (p99 = 5ms)
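
The "with cache" read path in the diagram can be sketched in Go roughly as follows. This is a minimal illustration, not a production implementation: `Cache` and `Store` are hypothetical in-memory stand-ins for a Redis client and a database, `GetUser` is an illustrative name, and the cache write happens synchronously here for simplicity, whereas the diagram does it asynchronously.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrNotFound is returned by the hypothetical Store when a key does not exist.
var ErrNotFound = errors.New("not found")

// Cache is a hypothetical in-memory stand-in for a Redis client (GET/SET only).
type Cache struct {
	mu   sync.RWMutex
	data map[string]string
}

func NewCache() *Cache { return &Cache{data: make(map[string]string)} }

func (c *Cache) Get(key string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.data[key]
	return v, ok
}

func (c *Cache) Set(key, value string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.data[key] = value
}

// Store is a hypothetical stand-in for the database. Reads counts queries
// and is not synchronized, which is acceptable for this single-goroutine demo.
type Store struct {
	rows  map[string]string
	Reads int
}

func (s *Store) Query(key string) (string, error) {
	s.Reads++
	v, ok := s.rows[key]
	if !ok {
		return "", ErrNotFound
	}
	return v, nil
}

// GetUser follows the read path in the diagram: try the cache, fall back to
// the database on a miss, then populate the cache for the next request.
func GetUser(c *Cache, s *Store, id string) (string, error) {
	if v, ok := c.Get(id); ok {
		return v, nil // cache hit
	}
	v, err := s.Query(id) // cache miss: go to the source of truth
	if err != nil {
		return "", err
	}
	c.Set(id, v) // synchronous here; the diagram does this asynchronously
	return v, nil
}

func main() {
	c := NewCache()
	s := &Store{rows: map[string]string{"42": "alice"}}
	for i := 0; i < 3; i++ {
		v, err := GetUser(c, s, "42")
		fmt.Println(v, err, "db reads:", s.Reads)
	}
}
```

Only the first call reaches the store; the later calls are served from the cache, which is the whole point of the pattern.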

Why is caching more complex than you think?

Caching looks simple: "write to Redis, read from Redis". In practice there are dozens of design decisions to make:

When to invalidate? What TTL? Event-driven or time-based?
Which write strategy? Cache-aside, write-through, or write-behind?
Thundering herd? 10,000 requests miss the cache at the same moment: will that take down the DB?
Cluster or Sentinel? And when does the cache need sharding?
What happens when memory runs out? LRU, LFU, or TTL-based eviction?

This section digs into all of these questions.
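
To make the eviction question above concrete, here is a minimal sketch of an exact LRU cache in Go, built on the standard `container/list` package. The `LRU` type and its methods are illustrative names, not an API from any library, and this is not a description of how Redis implements its own eviction policies.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is the payload stored in each list node.
type entry struct {
	key, value string
}

// LRU is a fixed-capacity cache that evicts the least recently used key.
// It is not safe for concurrent use; a real one would add a mutex.
type LRU struct {
	capacity int
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // key -> node in order
}

func NewLRU(capacity int) *LRU {
	return &LRU{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[string]*list.Element),
	}
}

// Get returns the value and marks the key as most recently used.
func (l *LRU) Get(key string) (string, bool) {
	el, ok := l.items[key]
	if !ok {
		return "", false
	}
	l.order.MoveToFront(el)
	return el.Value.(*entry).value, true
}

// Set inserts or updates a key, evicting the oldest key when over capacity.
func (l *LRU) Set(key, value string) {
	if el, ok := l.items[key]; ok {
		el.Value.(*entry).value = value
		l.order.MoveToFront(el)
		return
	}
	l.items[key] = l.order.PushFront(&entry{key: key, value: value})
	if l.order.Len() > l.capacity {
		oldest := l.order.Back()
		l.order.Remove(oldest)
		delete(l.items, oldest.Value.(*entry).key)
	}
}

func main() {
	c := NewLRU(2)
	c.Set("a", "1")
	c.Set("b", "2")
	c.Get("a")      // "a" becomes the most recently used key
	c.Set("c", "3") // over capacity: the least recently used key is evicted
	_, ok := c.Get("b")
	fmt.Println("b still cached:", ok)
}
```

The map gives constant-time lookup and the linked list keeps recency order, so both `Get` and `Set` stay O(1).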

Contents

File	Contents	Level
strategies-and-patterns.md	Cache-aside, Read-through, Write-through, Write-behind, Write-around	Intermediate
redis-internals.md	Data structures, persistence, cluster, eviction policies	Intermediate–Advanced
invalidation-and-consistency.md	Cache stampede, dog-pile, TTL strategy, stale-while-revalidate	Advanced
interview-and-big-picture.md	System design cache questions, leaderboard, session, warmup	Interview

Prerequisites

Before reading this section, you should have:

A basic understanding of TCP/IP and HTTP (latency, connection overhead)
Familiarity with database indexing (to understand why DB queries are slow)
Basic Go (the code examples use Go)

Mental Model: What is a cache?

Cache = a fast memory layer that sits between the consumer and the source of truth

Reference latencies (order of magnitude):
  CPU register:    0.3 ns
  L1 cache:        1   ns
  L2 cache:        4   ns
  L3 cache:        10  ns
  RAM:             100 ns
  SSD random read: 0.1 ms    (~100,000 ns)
  Redis (network): 0.5 ms    (~500,000 ns)
  DB query:        1-100 ms
  Network call:    10-500 ms

→ Redis is about 5,000x slower than RAM
→ Redis is about 2-200x faster than a DB query
→ Trade-off: add a network hop but skip the DB computation
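
The trade-off can be put into numbers with a rough model: every read pays the cache round trip, and only misses also pay the database query. The sketch below computes the mean read latency under that model. The function name `ExpectedLatency` and the 50 ms database figure are assumptions chosen for illustration, and the result is an average, not a p99.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// ExpectedLatency returns the mean read latency of a cache-aside lookup:
// every request pays the cache round trip, and the fraction of requests
// that miss additionally pays the database query.
func ExpectedLatency(hitRate float64, cache, db time.Duration) time.Duration {
	missRate := 1 - hitRate
	// Round to the nearest nanosecond to avoid float truncation artifacts.
	return cache + time.Duration(math.Round(missRate*float64(db)))
}

func main() {
	cache := 500 * time.Microsecond // Redis round trip, from the table above
	db := 50 * time.Millisecond     // assumed mid-range DB query time
	for _, h := range []float64{0, 0.5, 0.9, 0.99} {
		fmt.Printf("hit rate %.2f -> mean latency %v\n", h, ExpectedLatency(h, cache, db))
	}
}
```

With a hit rate of 0 the cache only adds overhead, which is why the extra network hop pays off only when enough reads are actually served from the cache.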

A cache works well when the data is:

Read-heavy: far more reads than writes (a ratio of 10:1 or higher)
Expensive to compute: complex queries, joins across many tables
Tolerant of staleness: the data does not need to be 100% fresh
Temporally local: data that was just read is likely to be read again

Key metrics to know

Cache Hit Rate = Hits / (Hits + Misses)

Practical example:
  Hit rate 90% → 10% of requests must hit the DB
  Hit rate 99% → 1%  of requests must hit the DB

  At 100,000 req/s:
    90% hit rate → 10,000 req/s reach the DB  ← can the DB handle that?
    99% hit rate →  1,000 req/s reach the DB  ← much safer

Practical targets (rules of thumb):
  Production cache: >= 95% hit rate
  CDN edge cache:   >= 99% hit rate
  In-process cache: >= 99.9% hit rate

💡 Interview: "What cache hit rate is high enough?" The answer depends on DB capacity. If the DB can handle 100% of the traffic, the cache only needs to reduce latency and a high hit rate is not required. But if the DB is the bottleneck, the hit rate is critical.
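
The hit-rate formula and the interview answer above can be turned into a small calculation. This is a sketch under stated assumptions: the function names are illustrative, and the 2,000 req/s database capacity is a made-up figure used only to show how a required hit rate falls out of DB capacity.

```go
package main

import "fmt"

// HitRate implements Hits / (Hits + Misses). It returns 0 when there is no traffic.
func HitRate(hits, misses uint64) float64 {
	total := hits + misses
	if total == 0 {
		return 0
	}
	return float64(hits) / float64(total)
}

// DBLoad returns the request rate that falls through the cache to the database.
func DBLoad(reqPerSec, hitRate float64) float64 {
	return reqPerSec * (1 - hitRate)
}

// RequiredHitRate returns the minimum hit rate that keeps the database at or
// below dbCapacity req/s. It returns 0 when the database can absorb all traffic.
func RequiredHitRate(reqPerSec, dbCapacity float64) float64 {
	if reqPerSec <= 0 || dbCapacity >= reqPerSec {
		return 0
	}
	return 1 - dbCapacity/reqPerSec
}

func main() {
	fmt.Printf("observed hit rate: %.2f\n", HitRate(950, 50))
	fmt.Printf("DB load at 90%% hit rate: %.0f req/s\n", DBLoad(100000, 0.90))
	fmt.Printf("DB load at 99%% hit rate: %.0f req/s\n", DBLoad(100000, 0.99))
	// Assumed capacity: the database sustains 2,000 req/s.
	fmt.Printf("required hit rate: %.2f\n", RequiredHitRate(100000, 2000))
}
```

When `RequiredHitRate` returns 0, the database can carry the full load and the cache is purely a latency optimization, which matches the first half of the interview answer.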

Cache types by location

flowchart TB
  B["Browser Cache"] --> C["CDN / Edge Cache"]
  C --> G["API Gateway Cache"]
  G --> I["In-process Cache"]
  I --> D["Distributed Cache"]
  D --> DB["Database Buffer Pool"]
  DB --> OS["OS Page Cache"]
  OS --> S["Storage (SSD/HDD)"]

Browser Cache
    │  (HTTP Cache-Control, ETag)
    ▼
CDN / Edge Cache         ← Cloudflare, Fastly, AWS CloudFront
    │  (geographic proximity)
    ▼
API Gateway Cache        ← Kong, nginx, Apigee
    │
    ▼
In-process Cache         ← sync.Map, bigcache, go-cache (inside the app)
    │  (fastest, but per-instance, not shared)
    ▼
Distributed Cache        ← Redis, Memcached (shared across instances)
    │
    ▼
Database Buffer Pool     ← InnoDB buffer pool, PostgreSQL shared_buffers
    │  (DB's own cache, managed automatically)
    ▼
OS Page Cache            ← kernel manages file system cache
    │
    ▼
Storage (SSD/HDD)

Each layer has its own trade-off: the closer a layer is to the application, the faster it is, but the harder it is to share across instances.
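
The interplay between a fast per-instance layer and a shared layer can be sketched as a tiered lookup: check the nearest tier first, and when a farther tier has the value, copy it into the nearer ones. The `Tier` interface, `MapTier`, and `Lookup` below are hypothetical names, and both tiers are plain maps standing in for an in-process cache and a distributed cache.

```go
package main

import "fmt"

// Tier is a hypothetical minimal cache interface shared by both layers.
type Tier interface {
	Get(key string) (string, bool)
	Set(key, value string)
}

// MapTier is an in-memory Tier standing in for either layer.
// It is not safe for concurrent use.
type MapTier struct {
	data map[string]string
	Hits int // number of successful Get calls
}

func NewMapTier() *MapTier { return &MapTier{data: make(map[string]string)} }

func (m *MapTier) Get(key string) (string, bool) {
	v, ok := m.data[key]
	if ok {
		m.Hits++
	}
	return v, ok
}

func (m *MapTier) Set(key, value string) { m.data[key] = value }

// Lookup checks the tiers from nearest to farthest and copies a value found
// in a farther tier back into every nearer tier.
func Lookup(key string, tiers ...Tier) (string, bool) {
	for i, t := range tiers {
		if v, ok := t.Get(key); ok {
			for _, nearer := range tiers[:i] {
				nearer.Set(key, v)
			}
			return v, true
		}
	}
	return "", false
}

func main() {
	local, shared := NewMapTier(), NewMapTier()
	shared.Set("user:42", "alice") // e.g. written by another instance
	for i := 0; i < 3; i++ {
		v, ok := Lookup("user:42", local, shared)
		fmt.Println(v, ok, "local hits:", local.Hits, "shared hits:", shared.Hits)
	}
}
```

The sketch leaves out expiry and invalidation, which is exactly where the per-instance tier becomes hard: each instance holds its own copy that the others cannot see or update.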

Cache learning roadmap

flowchart TB
  R["Cache learning roadmap"] --> F["1) Foundations"]
  F --> F1["strategies-and-patterns.md"]
  F --> F2["redis-internals.md"]
  R --> A["2) Advanced"]
  A --> A1["invalidation-and-consistency.md"]
  R --> I["3) Interview prep"]
  I --> I1["interview-and-big-picture.md"]

1. Foundations (read first)
   ├── strategies-and-patterns.md   ← core cache patterns
   └── redis-internals.md           ← understand how Redis works

2. Advanced
   └── invalidation-and-consistency.md  ← the hardest part

3. Interview prep
   └── interview-and-big-picture.md     ← system design cache questions

Related sections

db/: The database is the source of truth; the cache protects the DB from overload
network/: HTTP caching headers (Cache-Control, ETag, Vary)
software/system-design/: Rate limiting thường dùng Redis; distributed locking
software/distributed-systems/: Consistency models áp dụng cho cache coherence