Domain: Social Network
Facebook has 3 billion users. TikTok serves 1 billion videos per day. Twitter handles 500 million tweets per day. Behind these numbers are extremely hard engineering problems: scaling a read-heavy workload, distributing feed generation, moderating content in real time, and making sure users never see stale data.
This section describes how to design a social network at scale, not a CRUD toy project.
1. Data Model & Schema Design
1.1 Core Entities
-- Users
CREATE TABLE users (
    id BIGSERIAL PRIMARY KEY,
    username VARCHAR(50) UNIQUE NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,
    display_name VARCHAR(100),
    bio TEXT,
    avatar_url TEXT,
    follower_count INT DEFAULT 0,
    following_count INT DEFAULT 0,
    post_count INT DEFAULT 0,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    last_active_at TIMESTAMPTZ,
    is_verified BOOLEAN DEFAULT false,
    is_private BOOLEAN DEFAULT false
);
CREATE INDEX idx_users_username ON users(username);
CREATE INDEX idx_users_last_active ON users(last_active_at DESC);

-- Relationships (follows)
CREATE TABLE relationships (
    id BIGSERIAL PRIMARY KEY,
    follower_id BIGINT NOT NULL REFERENCES users(id),
    following_id BIGINT NOT NULL REFERENCES users(id),
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE(follower_id, following_id),
    CHECK(follower_id != following_id)
);
CREATE INDEX idx_relationships_follower ON relationships(follower_id);
CREATE INDEX idx_relationships_following ON relationships(following_id);

-- Posts
CREATE TABLE posts (
    id BIGSERIAL PRIMARY KEY,
    user_id BIGINT NOT NULL REFERENCES users(id),
    content TEXT,
    media_urls TEXT[], -- Array of image/video URLs
    visibility VARCHAR(20) NOT NULL DEFAULT 'public', -- public / friends / private / custom (used in section 9)
    like_count INT DEFAULT 0,
    comment_count INT DEFAULT 0,
    share_count INT DEFAULT 0,
    view_count BIGINT DEFAULT 0,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    is_deleted BOOLEAN DEFAULT false
);
CREATE INDEX idx_posts_user_created ON posts(user_id, created_at DESC) WHERE is_deleted = false;
CREATE INDEX idx_posts_created ON posts(created_at DESC) WHERE is_deleted = false;

-- Likes
CREATE TABLE likes (
    id BIGSERIAL PRIMARY KEY,
    user_id BIGINT NOT NULL REFERENCES users(id),
    post_id BIGINT NOT NULL REFERENCES posts(id),
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE(user_id, post_id)
);
CREATE INDEX idx_likes_post ON likes(post_id, created_at DESC);
CREATE INDEX idx_likes_user ON likes(user_id, created_at DESC);

-- Comments
CREATE TABLE comments (
    id BIGSERIAL PRIMARY KEY,
    post_id BIGINT NOT NULL REFERENCES posts(id),
    user_id BIGINT NOT NULL REFERENCES users(id),
    parent_id BIGINT REFERENCES comments(id), -- for nested replies
    content TEXT NOT NULL,
    like_count INT DEFAULT 0,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    is_deleted BOOLEAN DEFAULT false
);
CREATE INDEX idx_comments_post ON comments(post_id, created_at DESC) WHERE is_deleted = false;
CREATE INDEX idx_comments_parent ON comments(parent_id, created_at DESC) WHERE parent_id IS NOT NULL;
1.2 Sharding Strategy
With billions of users, a single database isn't enough.
Sharding by user_id:
- Users 1-1M → Shard 1
- Users 1M-2M → Shard 2
- ...
Posts sharding:
- Shard by post.user_id (co-locate a user's posts with their user data)
- Trade-off: cross-shard queries for the feed (a user follows people spread across shards)
Relationships sharding:
- Shard by following_id (efficient fan-out writes on post: all followers of an author live on one shard)
- Or shard by follower_id (efficient fan-out reads: one shard holds everyone a user follows)
- Hybrid: dual writes to both layouts (trade write cost and consistency for read performance)
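The routing logic above can be sketched in a few lines. This is a minimal sketch; the shard count and the 1M-users-per-shard range are illustrative, not prescriptive:

```python
def shard_by_hash(user_id: int, num_shards: int = 16) -> int:
    """Hash/modulo routing: uniform spread, but resharding moves most keys."""
    return user_id % num_shards

def shard_by_range(user_id: int, users_per_shard: int = 1_000_000) -> int:
    """Range routing matching the 1M-users-per-shard example above."""
    return (user_id - 1) // users_per_shard + 1
```

Range routing keeps a user's rows adjacent and makes adding new shards cheap; hash routing avoids hot ranges caused by sequential IDs.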
2. Feed Generation — Fan-out Approaches
2.1 Fan-out on Write (Push model)
When Alice posts:
1. Write the post to the posts table
2. Fan-out: write the post_id into every follower's feed
→ Redis list "feed:user:{follower_id}", LPUSH post_id
3. Follower Bob opens the app → fetch from the Redis feed (instant, no computation)
Pros:
- Fast reads (pre-computed)
- Scales reads (the common case: users scroll the feed far more often than they post)
Cons:
- Expensive when a celebrity posts (100M followers → 100M writes)
- Stale data (a follower unfollows but old posts remain in the feed)
func FanOutPost(ctx context.Context, rdb *redis.Client, db *sql.DB, post Post) error {
	// Fetch all followers
	rows, err := db.QueryContext(ctx, `
		SELECT follower_id FROM relationships WHERE following_id = $1
	`, post.UserID)
	if err != nil {
		return err
	}
	defer rows.Close()
	followerIDs := []int64{}
	for rows.Next() {
		var followerID int64
		if err := rows.Scan(&followerID); err != nil {
			return err
		}
		followerIDs = append(followerIDs, followerID)
	}
	if err := rows.Err(); err != nil {
		return err
	}
	// Fan-out to Redis feeds
	pipe := rdb.Pipeline()
	for _, followerID := range followerIDs {
		feedKey := fmt.Sprintf("feed:user:%d", followerID)
		pipe.LPush(ctx, feedKey, post.ID)
		pipe.LTrim(ctx, feedKey, 0, 999)          // Keep latest 1000 posts
		pipe.Expire(ctx, feedKey, 7*24*time.Hour) // TTL 7 days
	}
	_, err = pipe.Exec(ctx)
	return err
}
func GetFeed(ctx context.Context, rdb *redis.Client, userID int64, limit int) ([]int64, error) {
	feedKey := fmt.Sprintf("feed:user:%d", userID)
	postIDs, err := rdb.LRange(ctx, feedKey, 0, int64(limit-1)).Result()
	if err != nil {
		return nil, err
	}
	result := []int64{}
	for _, idStr := range postIDs {
		id, _ := strconv.ParseInt(idStr, 10, 64) // IDs we wrote ourselves, so parse errors are ignored
		result = append(result, id)
	}
	return result, nil
}
2.2 Fan-out on Read (Pull model)
When Bob opens his feed:
1. Query DB: SELECT posts WHERE user_id IN (Bob's followings) ORDER BY created_at DESC
2. Merge-sort results
3. Return to user
Pros:
- No fan-out cost when a celebrity posts
- Always-fresh data
Cons:
- Slow reads (query + merge-sort on every feed load)
- High DB load when millions of users scroll their feeds concurrently
-- Simplified feed query (pull model)
WITH following_users AS (
    SELECT following_id FROM relationships WHERE follower_id = $1
)
SELECT p.id, p.user_id, p.content, p.created_at, u.username, u.avatar_url
FROM posts p
JOIN users u ON p.user_id = u.id
WHERE p.user_id IN (SELECT following_id FROM following_users)
  AND p.is_deleted = false
ORDER BY p.created_at DESC
LIMIT 20;
Optimization: Shard query across DB replicas, cache recent posts per user.
2.3 Hybrid Approach (Twitter/Facebook)
Fan-out on write for normal users (<10k followers)
Fan-out on read for celebrities (>10k followers)
When Bob scrolls his feed:
1. Fetch the pre-computed feed from Redis (posts from normal users)
2. Query the latest posts from the celebrities Bob follows
3. Merge-sort both sources
4. Apply ranking/ML model
5. Return feed
Trade-off balance:
- Write cost drops (no fan-out for celebrities)
- Read cost rises slightly (query + merge)
- User experience stays good (the feed stays fresh)
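The merge in steps 1-3 can be sketched as a two-way merge over (timestamp, post_id) pairs, each source already sorted newest-first. A sketch; in reality the inputs come from the Redis feed and the celebrity query:

```python
import heapq
import itertools

def merge_feed(precomputed, celebrity, limit=20):
    """Merge two newest-first lists of (created_at, post_id) into one page.

    heapq.merge with reverse=True performs a lazy k-way merge of
    descending-sorted inputs, so we never materialize both lists fully.
    """
    merged = heapq.merge(precomputed, celebrity, reverse=True)
    return [post_id for _, post_id in itertools.islice(merged, limit)]
```

For example, `merge_feed([(100, 'p1'), (80, 'p3')], [(90, 'c7')])` yields `['p1', 'c7', 'p3']`.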
3. Ranking Algorithm — EdgeRank & ML
3.1 Simple Time-based
Feed = Posts FROM followings ORDER BY created_at DESC
Problem: Miss important posts from days ago (e.g., friend's wedding photo)
3.2 EdgeRank (Facebook's classic algorithm)
Score = Affinity × Weight × Time Decay
Affinity: how much the user interacts with the post's author
- Historical likes/comments
- Frequency of interaction
- Normalized to 0.0 - 1.0
Weight: Post type weight
- Video: 1.0
- Photo: 0.8
- Link: 0.5
- Text: 0.3
Time Decay: e^(-λ × hours_since_post)
- Recent posts: decay ≈ 1.0
- With λ = ln 2 / 24 ≈ 0.029, the score halves every 24h (24h ago → 0.5; 7d ago → ≈ 0.008)
- A larger λ decays faster; the sample code below uses λ = 0.1 (24h ago → ≈ 0.09)
Final Score = Affinity × Weight × Time Decay
Rank feed by score DESC
import math
from datetime import datetime

def calculate_edge_rank(post, user_id):
    # Affinity: derived from the user's interaction history
    # (get_affinity is an app-specific lookup, assumed available)
    affinity = get_affinity(user_id, post.author_id)  # 0.0 - 1.0

    # Weight by post type
    if post.media_type == 'video':
        weight = 1.0
    elif post.media_type == 'photo':
        weight = 0.8
    elif post.media_type == 'link':
        weight = 0.5
    else:
        weight = 0.3

    # Time decay
    hours_ago = (datetime.now() - post.created_at).total_seconds() / 3600
    time_decay = math.exp(-0.1 * hours_ago)  # lambda = 0.1

    return affinity * weight * time_decay

def rank_feed(posts, user_id):
    scored_posts = [(p, calculate_edge_rank(p, user_id)) for p in posts]
    scored_posts.sort(key=lambda x: x[1], reverse=True)
    return [p for p, _ in scored_posts]
3.3 ML-based Ranking (Modern approach)
Features:
User features:
- Demographics (age, location)
- Historical engagement rate
- Session time, frequency
- Device type
Post features:
- Media type, length
- Author's follower count, engagement rate
- Post creation time
- Content embeddings (BERT for text, ResNet for images)
Contextual features:
- User-author affinity
- Time of day, day of week
- User's recent interactions
Model: Two-stage ranking
Stage 1: Candidate Generation (retrieve top 1000 from millions)
→ Collaborative filtering, matrix factorization
Stage 2: Ranking (re-rank top 1000)
→ Gradient Boosting (LightGBM) or Deep Learning (DNN)
Objective: Predict P(engage | user, post)
Engage = like, comment, share, click, dwell time
import lightgbm as lgb
import pandas as pd
from datetime import datetime

# Feature engineering
# (encode_media_type and the context helpers are app-specific, assumed available)
def extract_features(user, post, context):
    return {
        # User features
        'user_age': user.age,
        'user_engagement_rate': user.avg_engagement_rate,
        'user_follower_count': user.follower_count,
        # Post features
        'post_media_type': encode_media_type(post.media_type),
        'post_author_follower_count': post.author.follower_count,
        'post_like_count': post.like_count,
        'post_age_hours': (datetime.now() - post.created_at).total_seconds() / 3600,
        # Interaction features
        'user_author_affinity': context.get_affinity(user.id, post.author_id),
        'user_author_past_likes': context.count_past_likes(user.id, post.author_id),
        # Contextual
        'hour_of_day': datetime.now().hour,
        'is_weekend': 1 if datetime.now().weekday() >= 5 else 0,
    }

# Training: LGBMRanker expects rows sorted by group, with `group` holding
# the per-user row counts in that same order
df_train = pd.read_parquet('s3://feeds/training_data.parquet').sort_values('user_id')
X = df_train.drop(columns=['engaged', 'user_id'])
y = df_train['engaged']  # 1 if user engaged (like/comment/share), 0 otherwise

model = lgb.LGBMRanker(
    objective='lambdarank',
    metric='ndcg',
    boosting_type='gbdt',
    num_leaves=64,
    learning_rate=0.05,
    n_estimators=500
)
model.fit(X, y, group=df_train.groupby('user_id', sort=False).size().values)

# Inference
def rank_posts_ml(user, candidate_posts, context):
    features = [extract_features(user, post, context) for post in candidate_posts]
    scores = model.predict(pd.DataFrame(features))
    ranked = sorted(zip(candidate_posts, scores), key=lambda x: x[1], reverse=True)
    return [post for post, _ in ranked]
4. Notifications — Real-time Push
4.1 Notification Types
1. Engagement: "Alice liked your post"
2. Social: "Bob started following you"
3. Mentions: "@user tagged you in a comment"
4. System: "Your post violates community guidelines"
4.2 Delivery Mechanism
┌─────────────┐ ┌──────────────┐ ┌──────────────┐
│ Action │ │ Kafka Topic │ │ Notification │
│ (like post) ├─────────►│ "events" ├─────────►│ Service │
└─────────────┘ └──────────────┘ └───────┬──────┘
│
┌─────────────────────────────────┼────────────┐
│ │ │
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ WebSock │ │ FCM │ │ APNs │
│ (Web) │ │(Android) │ │ (iOS) │
└──────────┘ └──────────┘ └──────────┘
type NotificationService struct {
	db        *Store // app-specific persistence (Insert, GetUserDevices)
	kafka     *kafka.Producer
	websocket *WebSocketHub
	fcm       *FCMClient
	apns      *APNSClient
}

func (s *NotificationService) NotifyLike(likerID, postAuthorID, postID int64) error {
	// Step 1: Create notification record
	notif := &Notification{
		Type:        "like",
		RecipientID: postAuthorID,
		ActorID:     likerID,
		PostID:      postID,
		CreatedAt:   time.Now(),
	}
	// Step 2: Save to DB
	if err := s.db.Insert(notif); err != nil {
		return err
	}
	// Step 3: Real-time delivery
	// WebSocket (if the user is online)
	if s.websocket.IsOnline(postAuthorID) {
		s.websocket.Send(postAuthorID, notif)
	}
	// Push notification (mobile)
	devices, _ := s.db.GetUserDevices(postAuthorID)
	for _, device := range devices {
		switch device.Platform {
		case "android":
			s.fcm.Send(device.Token, notif)
		case "ios":
			s.apns.Send(device.Token, notif)
		}
	}
	return nil
}
4.3 Notification Aggregation
Instead of:
"Alice liked your post"
"Bob liked your post"
"Charlie liked your post"
(3 notifications)
Aggregate into:
"Alice, Bob, Charlie and 47 others liked your post"
(1 notification)
Implementation: batch notifications in a 5-minute window
from collections import defaultdict
import time

class NotificationAggregator:
    def __init__(self):
        self.buffer = defaultdict(list)  # {(type, post_id): [actor_ids]}
        self.last_flush = time.time()

    def add(self, notif_type, post_id, actor_id):
        key = (notif_type, post_id)
        self.buffer[key].append(actor_id)
        # Flush every 5 minutes (piggybacked on add; production would use a timer)
        if time.time() - self.last_flush > 300:
            self.flush()

    def flush(self):
        for (notif_type, post_id), actor_ids in self.buffer.items():
            # A real system would resolve IDs to display names here
            names = [str(a) for a in actor_ids]
            if len(names) == 1:
                text = f"{names[0]} {notif_type}d your post"
            elif len(names) <= 3:
                text = f"{', '.join(names)} {notif_type}d your post"
            else:
                first_three = ", ".join(names[:3])
                others = len(names) - 3
                text = f"{first_three} and {others} others {notif_type}d your post"
            # Send aggregated notification
            self.send_notification(post_id, text)
        self.buffer.clear()
        self.last_flush = time.time()
5. Content Moderation
5.1 Multi-layer Approach
Layer 1: Automated filters (pre-publish)
- Text: profanity filter, hate speech detector (ML model)
- Image: NSFW classifier (ResNet-based)
- Video: thumbnail-based check
Layer 2: User reports
- Crowdsourced moderation
- Report → review queue
Layer 3: Human moderators
- Review flagged content
- Final decision: approve, delete, ban user
Layer 4: Appeals
- User can appeal ban
- Senior moderator review
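The four layers can be wired together as a pre-publish pipeline. A minimal sketch, where `check_text` and `check_image` stand in for the layer-1 ML classifiers and `review_queue` for the layer-3 human review queue (all hypothetical names):

```python
def moderate_post(content, image_urls, check_text, check_image, review_queue):
    """Layer 1: automated pre-publish filters.

    Anything a filter flags is held and appended to the human review
    queue (layer 3) instead of being published directly.
    """
    if not check_text(content):
        review_queue.append(("text", content))
        return "held_for_review"
    for url in image_urls:
        if not check_image(url):
            review_queue.append(("image", url))
            return "held_for_review"
    return "published"
```

Layers 2 and 4 (user reports, appeals) feed the same review queue asynchronously.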
5.2 NSFW Image Detection
from transformers import pipeline

# Pre-trained NSFW classifier
classifier = pipeline("image-classification", model="Falconsai/nsfw_image_detection")

def is_safe(image_url):
    result = classifier(image_url)
    # result = [{'label': 'nsfw', 'score': 0.95}, {'label': 'normal', 'score': 0.05}]
    nsfw_score = next((r['score'] for r in result if r['label'] == 'nsfw'), 0)
    return nsfw_score < 0.8  # threshold

# Integration
def upload_image(user_id, image_file):
    # Upload to CDN
    url = cdn.upload(image_file)
    # Check NSFW
    if not is_safe(url):
        # Auto-delete or flag for review
        moderation_queue.add(url, reason="nsfw")
        return {"error": "Image contains inappropriate content"}
    return {"url": url}
5.3 Hate Speech Detection
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("unitary/toxic-bert")
model = AutoModelForSequenceClassification.from_pretrained("unitary/toxic-bert")

def detect_toxicity(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    # toxic-bert is multi-label (toxic, severe_toxic, obscene, threat, insult,
    # identity_hate), so apply a per-label sigmoid rather than softmax across labels
    scores = torch.sigmoid(outputs.logits)
    toxic_score = scores[0][0].item()  # index 0 = "toxic"
    return toxic_score > 0.7  # Binary: toxic or not

# Pre-publish filter
def create_post(user_id, content):
    if detect_toxicity(content):
        return {"error": "Post contains inappropriate language"}
    # Proceed to publish
    post = db.insert_post(user_id, content)
    return {"post_id": post.id}
6. Distributed Counters (Likes, Followers)
6.1 The Problem
When a viral post receives 1M likes/minute:
UPDATE posts SET like_count = like_count + 1 WHERE id = 123
→ Row-level lock contention
→ Throughput drops to ~100 updates/second
6.2 Eventual Consistency Counter
Redis INCR + Periodic flush to DB
User likes post:
1. INCR redis:post:123:like_count
2. Background job (every 10s): Flush Redis → PostgreSQL
func LikePost(ctx context.Context, rdb *redis.Client, userID, postID int64) error {
	// Step 1: Idempotency check. SETNX is atomic, unlike EXISTS-then-SET,
	// so two concurrent likes cannot both pass the check.
	likeKey := fmt.Sprintf("like:%d:%d", userID, postID)
	ok, err := rdb.SetNX(ctx, likeKey, 1, 30*24*time.Hour).Result()
	if err != nil {
		return err
	}
	if !ok {
		return ErrAlreadyLiked
	}
	// Step 2: Increment counter in Redis
	counterKey := fmt.Sprintf("post:%d:like_count", postID)
	rdb.Incr(ctx, counterKey)
	// Step 3: Insert to DB (eventual; can be async via message queue)
	db.InsertLike(userID, postID)
	return nil
}

// Background job: Sync Redis counters → DB
func SyncCounters(ctx context.Context, rdb *redis.Client, db *sql.DB) {
	pattern := "post:*:like_count"
	iter := rdb.Scan(ctx, 0, pattern, 100).Iterator()
	for iter.Next(ctx) {
		key := iter.Val() // key = "post:123:like_count"
		count, _ := rdb.Get(ctx, key).Int64()
		postID := extractPostID(key)
		// Update DB
		db.Exec(`UPDATE posts SET like_count = $1 WHERE id = $2`, count, postID)
		// Optional: delete the Redis key after syncing (or keep it as a cache)
	}
}
6.3 Approximate Counter with Sharding
Instead of 1 Redis key, shard the counter into N keys:
post:123:like_count:shard0
post:123:like_count:shard1
...
post:123:like_count:shard9
A like → hash(user_id) % 10 → shard_id
Increment the corresponding shard
Total count = SUM(all shards)
Trade-off:
- Reduce contention (10x parallel writes)
- Slightly more complex to read (multi-get)
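An in-process sketch of the sharded counter, with a plain list standing in for the N Redis keys:

```python
class ShardedCounter:
    def __init__(self, num_shards=10):
        # Each slot stands in for one Redis key post:{id}:like_count:shardN
        self.shards = [0] * num_shards

    def incr(self, user_id):
        # hash(user_id) % N picks a shard, spreading write contention N ways
        self.shards[user_id % len(self.shards)] += 1

    def total(self):
        # Read path: multi-get across all shards, then sum
        return sum(self.shards)
```

With Redis, `incr` becomes an INCR on the chosen shard key and `total` an MGET plus a sum.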
7. Graph Data — Friends & Mutual Friends
7.1 Bidirectional Friendship (Facebook)
CREATE TABLE friendships (
    user_id_1 BIGINT NOT NULL,
    user_id_2 BIGINT NOT NULL,
    status VARCHAR(20), -- PENDING, ACCEPTED, BLOCKED
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    PRIMARY KEY (user_id_1, user_id_2),
    CHECK (user_id_1 < user_id_2) -- Enforce canonical order
);

-- Find friends
SELECT user_id_2 AS friend_id FROM friendships
WHERE user_id_1 = $1 AND status = 'ACCEPTED'
UNION
SELECT user_id_1 AS friend_id FROM friendships
WHERE user_id_2 = $1 AND status = 'ACCEPTED';
7.2 Mutual Friends Query
-- Mutual friends between user 123 and user 456
WITH user1_friends AS (
    SELECT user_id_2 AS friend_id FROM friendships
    WHERE user_id_1 = 123 AND status = 'ACCEPTED'
    UNION
    SELECT user_id_1 FROM friendships
    WHERE user_id_2 = 123 AND status = 'ACCEPTED'
),
user2_friends AS (
    SELECT user_id_2 AS friend_id FROM friendships
    WHERE user_id_1 = 456 AND status = 'ACCEPTED'
    UNION
    SELECT user_id_1 FROM friendships
    WHERE user_id_2 = 456 AND status = 'ACCEPTED'
)
SELECT friend_id FROM user1_friends
INTERSECT
SELECT friend_id FROM user2_friends;
Optimization: Cache friend lists in Redis (set), use SINTER for mutual friends.
func GetMutualFriends(ctx context.Context, rdb *redis.Client, user1, user2 int64) ([]int64, error) {
	key1 := fmt.Sprintf("friends:%d", user1)
	key2 := fmt.Sprintf("friends:%d", user2)
	// Redis set intersection
	mutualIDs, err := rdb.SInter(ctx, key1, key2).Result()
	if err != nil {
		return nil, err
	}
	result := []int64{}
	for _, idStr := range mutualIDs {
		id, _ := strconv.ParseInt(idStr, 10, 64)
		result = append(result, id)
	}
	return result, nil
}
8. Search — User & Post Discovery
8.1 Elasticsearch for User Search
// Index mapping
{
  "mappings": {
    "properties": {
      "username": {"type": "text", "analyzer": "standard"},
      "display_name": {"type": "text", "analyzer": "standard"},
      "bio": {"type": "text"},
      "follower_count": {"type": "integer"},
      "is_verified": {"type": "boolean"}
    }
  }
}

// Search query with fuzzy matching
{
  "query": {
    "bool": {
      "should": [
        {"match": {"username": {"query": "john", "boost": 3, "fuzziness": "AUTO"}}},
        {"match": {"display_name": {"query": "john", "boost": 2}}},
        {"match": {"bio": {"query": "john", "boost": 1}}}
      ],
      "filter": [
        {"term": {"is_verified": true}}
      ]
    }
  },
  "sort": [
    {"follower_count": {"order": "desc"}}
  ]
}
8.2 Autocomplete với Prefix Matching
Trie data structure or Elasticsearch completion suggester
Input: "joh"
Suggestions: john_doe, johnsmith, johnny_depp
// Elasticsearch completion suggester
{
  "mappings": {
    "properties": {
      "username_suggest": {
        "type": "completion",
        "analyzer": "simple"
      }
    }
  }
}

// Query
{
  "suggest": {
    "username-suggest": {
      "prefix": "joh",
      "completion": {
        "field": "username_suggest",
        "size": 10,
        "fuzzy": {"fuzziness": 1}
      }
    }
  }
}
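The Trie alternative mentioned above, as a minimal in-memory sketch (usernames taken from the example):

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_word = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def suggest(self, prefix, limit=10):
        # Walk down to the prefix node, then DFS to collect completions
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        out = []
        def dfs(n, path):
            if len(out) >= limit:
                return
            if n.is_word:
                out.append(prefix + path)
            for ch in sorted(n.children):
                dfs(n.children[ch], path + ch)
        dfs(node, "")
        return out
```

This returns suggestions in lexicographic order; a production version would rank candidates by popularity (e.g., follower count) instead.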
9. Privacy & Permissions
9.1 Post Visibility
Public: anyone can see
Friends: only friends can see
Private: only the author can see (drafts)
Custom: a specific list of users
Implementation: check permissions before serving a post
func CanViewPost(db *sql.DB, viewerID, postID int64) (bool, error) {
	var authorID int64
	var visibility string
	err := db.QueryRow(`
		SELECT user_id, visibility FROM posts WHERE id = $1
	`, postID).Scan(&authorID, &visibility)
	if err != nil {
		return false, err
	}
	switch visibility {
	case "public":
		return true, nil
	case "private":
		return viewerID == authorID, nil
	case "friends":
		return areFriends(db, viewerID, authorID)
	case "custom":
		return isInCustomList(db, postID, viewerID)
	default:
		return false, nil
	}
}
9.2 Blocking Users
CREATE TABLE blocks (
    blocker_id BIGINT NOT NULL REFERENCES users(id),
    blocked_id BIGINT NOT NULL REFERENCES users(id),
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    PRIMARY KEY (blocker_id, blocked_id)
);
-- Filter blocked users out of the feed (both directions)
SELECT p.* FROM posts p
WHERE p.user_id NOT IN (
    SELECT blocked_id FROM blocks WHERE blocker_id = $1
)
AND p.user_id NOT IN (
    SELECT blocker_id FROM blocks WHERE blocked_id = $1
);
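With each direction of the block relation cached as a set (e.g., via Redis SMEMBERS), the same filtering can run in the application layer. A sketch with plain Python sets:

```python
def filter_blocked(posts, viewer_blocks, blocked_viewer):
    """Hide posts authored by anyone the viewer has blocked, and by
    anyone who has blocked the viewer (both directions)."""
    hidden_authors = viewer_blocks | blocked_viewer
    return [p for p in posts if p["user_id"] not in hidden_authors]
```

Doing this in the app layer keeps the block check out of the hot feed query and works on pre-computed Redis feeds too.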
10. Interview Questions
Q: Design a Twitter feed for 500M users?
Components:
- User service (shard by user_id)
- Post service (shard by user_id)
- Timeline service (hybrid fan-out)
- Notification service (Kafka + push)
- CDN for media
Trade-offs:
- Fan-out on write vs read (hybrid)
- Eventual consistency for counters
- Caching strategy (Redis for hot data)
Q: How do you prevent fake followers/likes?
Answer:
- Rate limiting: Max 100 follows/hour
- CAPTCHA for suspicious behavior
- Machine learning: Detect bot patterns (creation time, activity pattern)
- Phone verification for new accounts
- Shadowban suspected bots
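The "max 100 follows/hour" rule above can be sketched as a fixed-window counter. In production this would be a Redis INCR with an EXPIRE on a per-user, per-window key; an in-memory dict stands in here:

```python
import time

class FixedWindowLimiter:
    def __init__(self, limit=100, window_secs=3600):
        self.limit = limit
        self.window = window_secs
        self.counts = {}  # (user_id, window_index) -> count in that window

    def allow(self, user_id, now=None):
        """Return True if this action fits within the per-window limit."""
        now = time.time() if now is None else now
        key = (user_id, int(now // self.window))
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key] <= self.limit
```

Fixed windows allow a burst of up to 2× the limit at a window boundary; a sliding-window or token-bucket variant smooths that out.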
Q: Optimize the mutual friends query when each user has 5000 friends?
Answer:
- Cache friend lists in Redis (sets)
- Use SINTER for the intersection (roughly O(N), with N = size of the smaller set)
- Pre-compute mutual friends for friend suggestions
- Limit to the top 100 mutuals (pagination)
Summary
A social network is a read-heavy, relationship-centric, real-time system.
Key challenges:
- Scale reads: billions of feed requests/day → cache, CDN, read replicas
- Fan-out: a celebrity with 100M followers → hybrid approach
- Real-time: notifications, chat, live updates → WebSocket, Kafka
- Graph queries: mutual friends, shortest path → graph DB or cache
- Content moderation: hate speech, NSFW → ML models + human review
Key patterns:
- Hybrid fan-out (write for normal users, read for celebrities)
- Eventual-consistency counters (Redis INCR + periodic sync)
- ML-based ranking (two-stage: candidate generation + ranking)
- Notification aggregation (reduce noise)
References
- Facebook Engineering: News Feed Ranking
- Twitter: The Infrastructure Behind Twitter
- Instagram Engineering: Sharding & IDs
- Designing Data-Intensive Applications — Martin Kleppmann (Chapter 10, 11)