🏭 Domains · ✍️ Khoa · 📅 19/04/2026 · ☕ 16 min read

Domain: Social Network

Facebook has 3 billion users. TikTok serves 1 billion videos per day. Twitter handles 500 million tweets per day. Behind these numbers are extremely hard engineering problems: scaling read-heavy workloads, distributing feed generation, moderating content in real time, and making sure users never see stale data.

This section describes how to design a social network at scale, not a CRUD toy project.


1. Data Model & Schema Design

1.1 Core Entities

-- Users
CREATE TABLE users (
    id              BIGSERIAL PRIMARY KEY,
    username        VARCHAR(50) UNIQUE NOT NULL,
    email           VARCHAR(255) UNIQUE NOT NULL,
    display_name    VARCHAR(100),
    bio             TEXT,
    avatar_url      TEXT,
    follower_count  INT DEFAULT 0,
    following_count INT DEFAULT 0,
    post_count      INT DEFAULT 0,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    last_active_at  TIMESTAMPTZ,
    is_verified     BOOLEAN DEFAULT false,
    is_private      BOOLEAN DEFAULT false
);

CREATE INDEX idx_users_username ON users(username);
CREATE INDEX idx_users_last_active ON users(last_active_at DESC);

-- Relationships (follows)
CREATE TABLE relationships (
    id              BIGSERIAL PRIMARY KEY,
    follower_id     BIGINT NOT NULL REFERENCES users(id),
    following_id    BIGINT NOT NULL REFERENCES users(id),
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE(follower_id, following_id),
    CHECK(follower_id != following_id)
);

CREATE INDEX idx_relationships_follower ON relationships(follower_id);
CREATE INDEX idx_relationships_following ON relationships(following_id);

-- Posts
CREATE TABLE posts (
    id              BIGSERIAL PRIMARY KEY,
    user_id         BIGINT NOT NULL REFERENCES users(id),
    content         TEXT,
    media_urls      TEXT[],                     -- Array of image/video URLs
    like_count      INT DEFAULT 0,
    comment_count   INT DEFAULT 0,
    share_count     INT DEFAULT 0,
    view_count      BIGINT DEFAULT 0,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    is_deleted      BOOLEAN DEFAULT false
);

CREATE INDEX idx_posts_user_created ON posts(user_id, created_at DESC) WHERE is_deleted = false;
CREATE INDEX idx_posts_created ON posts(created_at DESC) WHERE is_deleted = false;

-- Likes
CREATE TABLE likes (
    id          BIGSERIAL PRIMARY KEY,
    user_id     BIGINT NOT NULL REFERENCES users(id),
    post_id     BIGINT NOT NULL REFERENCES posts(id),
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE(user_id, post_id)
);

CREATE INDEX idx_likes_post ON likes(post_id, created_at DESC);
CREATE INDEX idx_likes_user ON likes(user_id, created_at DESC);

-- Comments
CREATE TABLE comments (
    id              BIGSERIAL PRIMARY KEY,
    post_id         BIGINT NOT NULL REFERENCES posts(id),
    user_id         BIGINT NOT NULL REFERENCES users(id),
    parent_id       BIGINT REFERENCES comments(id),  -- for nested replies
    content         TEXT NOT NULL,
    like_count      INT DEFAULT 0,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    is_deleted      BOOLEAN DEFAULT false
);

CREATE INDEX idx_comments_post ON comments(post_id, created_at DESC) WHERE is_deleted = false;
CREATE INDEX idx_comments_parent ON comments(parent_id, created_at DESC) WHERE parent_id IS NOT NULL;

1.2 Sharding Strategy

With billions of users, a single database is not enough.

Sharding by user_id:
  - Users 1-1M    → Shard 1
  - Users 1M-2M   → Shard 2
  - ...

Posts sharding:
  - Shard by post.user_id (co-locate a user's posts with their user data)
  - Trade-off: cross-shard queries for feeds (a user follows people across shards)

Relationships sharding:
  - Shard by following_id → "who follows X" lives on one shard (cheap fan-out writes on post)
  - Or shard by follower_id → "who does X follow" is local (cheap feed reads in the pull model)
  - Hybrid: dual-write both layouts (trade consistency and storage for read performance)
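The range-based routing above can be sketched in a few lines; the shard-size constant and function names are illustrative, not from any real deployment:

```python
NUM_USERS_PER_SHARD = 1_000_000  # illustrative shard size

def shard_for_user(user_id: int) -> int:
    """Range-based routing: users 1..1M on shard 1, 1M+1..2M on shard 2, ..."""
    return (user_id - 1) // NUM_USERS_PER_SHARD + 1

def shard_for_post(author_id: int) -> int:
    """Posts are co-located with their author's user row."""
    return shard_for_user(author_id)
```

A consistent-hash scheme would spread hot user ranges more evenly, but range routing keeps a user's rows trivially co-located and the routing table human-readable.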

2. Feed Generation — Fan-out Approaches

2.1 Fan-out on Write (Push model)

When Alice posts:
  1. Write the post to the posts table
  2. Fan-out: write the post_id into every follower's feed
     → Redis list "feed:user:{follower_id}", LPUSH post_id
  3. Follower Bob opens the app → fetch from the Redis feed (instant, no computation)

Pros:
  - Fast reads (pre-computed)
  - Scales reads (the common case: users scroll feeds far more than they post)

Cons:
  - Expensive when a celebrity posts (100M followers → 100M writes)
  - Stale data (a follower unfollows but old posts remain in their feed)

func FanOutPost(ctx context.Context, rdb *redis.Client, db *sql.DB, post Post) error {
    // Fetch all followers of the post author
    rows, err := db.Query(`
        SELECT follower_id FROM relationships WHERE following_id = $1
    `, post.UserID)
    if err != nil {
        return err
    }
    defer rows.Close()

    followerIDs := []int64{}
    for rows.Next() {
        var followerID int64
        if err := rows.Scan(&followerID); err != nil {
            return err
        }
        followerIDs = append(followerIDs, followerID)
    }
    if err := rows.Err(); err != nil {
        return err
    }
    
    // Fan-out to Redis feeds
    pipe := rdb.Pipeline()
    for _, followerID := range followerIDs {
        feedKey := fmt.Sprintf("feed:user:%d", followerID)
        pipe.LPush(ctx, feedKey, post.ID)
        pipe.LTrim(ctx, feedKey, 0, 999)  // Keep latest 1000 posts
        pipe.Expire(ctx, feedKey, 7*24*time.Hour)  // TTL 7 days
    }
    _, err = pipe.Exec(ctx)
    return err
}

func GetFeed(ctx context.Context, rdb *redis.Client, userID int64, limit int) ([]int64, error) {
    feedKey := fmt.Sprintf("feed:user:%d", userID)
    postIDs, err := rdb.LRange(ctx, feedKey, 0, int64(limit-1)).Result()
    if err != nil {
        return nil, err
    }
    
    result := []int64{}
    for _, idStr := range postIDs {
        id, _ := strconv.ParseInt(idStr, 10, 64)
        result = append(result, id)
    }
    return result, nil
}

2.2 Fan-out on Read (Pull model)

When Bob opens his feed:
  1. Query the DB: SELECT posts WHERE user_id IN (Bob's followings) ORDER BY created_at DESC
  2. Merge-sort the results
  3. Return to the user

Pros:
  - No fan-out cost when a celebrity posts
  - Always fresh data

Cons:
  - Slow reads (query + merge-sort on every feed load)
  - High DB load when millions of users scroll their feeds concurrently

-- Simplified feed query (pull model)
WITH following_users AS (
    SELECT following_id FROM relationships WHERE follower_id = $1
)
SELECT p.id, p.user_id, p.content, p.created_at, u.username, u.avatar_url
FROM posts p
JOIN users u ON p.user_id = u.id
WHERE p.user_id IN (SELECT following_id FROM following_users)
  AND p.is_deleted = false
ORDER BY p.created_at DESC
LIMIT 20;

Optimization: Shard query across DB replicas, cache recent posts per user.

2.3 Hybrid Approach (Twitter/Facebook)

Fan-out on write for normal users (< 10k followers)
Fan-out on read for celebrities (> 10k followers)

When Bob scrolls his feed:
  1. Fetch the pre-computed feed from Redis (posts by normal users)
  2. Query the latest posts from the celebrities Bob follows
  3. Merge-sort both sources
  4. Apply a ranking/ML model
  5. Return the feed

Trade-off balance:
  - Write cost drops (no fan-out for celebrities)
  - Read cost rises slightly (query + merge)
  - User experience stays good (feed is still fresh)
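The merge step can be sketched as a timestamp-ordered heap merge; the `(created_at, post_id)` tuple shape is an assumption for illustration:

```python
import heapq
import itertools

def merge_feed(precomputed, celebrity_posts, limit=20):
    """Merge two newest-first lists of (created_at, post_id) tuples.

    Both inputs are already sorted newest-first, so merging on the
    negated timestamp yields one newest-first stream without a full sort.
    """
    merged = heapq.merge(precomputed, celebrity_posts, key=lambda p: -p[0])
    return list(itertools.islice(merged, limit))
```

Because both sources arrive pre-sorted (Redis LRANGE and an ORDER BY query), the merge is O(limit) rather than O(n log n).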

3. Ranking Algorithm — EdgeRank & ML

3.1 Simple Time-based

Feed = Posts FROM followings ORDER BY created_at DESC

Problem: Miss important posts from days ago (e.g., friend's wedding photo)

3.2 EdgeRank (Facebook's classic algorithm)

Score = Affinity × Weight × Time Decay

Affinity: how much the user interacts with the post's author
  - Historical likes/comments
  - Frequency of interaction
  - Range 0.0 - 1.0

Weight: post-type weight
  - Video: 1.0
  - Photo: 0.8
  - Link: 0.5
  - Text: 0.3

Time Decay: e^(-λ × hours_since_post), e.g. λ = ln(2)/24 ≈ 0.029
  - Just posted: decay = 1.0
  - 24h ago: decay ≈ 0.5
  - 7d ago: decay ≈ 0.01

Final Score = Affinity × Weight × Time Decay
Rank feed by score DESC

import math
from datetime import datetime, timedelta

def calculate_edge_rank(post, user_id):
    # Affinity: from the user's interaction history
    affinity = get_affinity(user_id, post.author_id)  # 0.0 - 1.0
    
    # Weight
    if post.media_type == 'video':
        weight = 1.0
    elif post.media_type == 'photo':
        weight = 0.8
    elif post.media_type == 'link':
        weight = 0.5
    else:
        weight = 0.3
    
    # Time decay: lambda = ln(2)/24 ≈ 0.029 gives a 24h half-life
    hours_ago = (datetime.now() - post.created_at).total_seconds() / 3600
    time_decay = math.exp(-0.029 * hours_ago)
    
    return affinity * weight * time_decay

def rank_feed(posts, user_id):
    scored_posts = [(p, calculate_edge_rank(p, user_id)) for p in posts]
    scored_posts.sort(key=lambda x: x[1], reverse=True)
    return [p for p, _ in scored_posts]

3.3 ML-based Ranking (Modern approach)

Features:
  User features:
    - Demographics (age, location)
    - Historical engagement rate
    - Session time, frequency
    - Device type
  
  Post features:
    - Media type, length
    - Author's follower count, engagement rate
    - Post creation time
    - Content embeddings (BERT for text, ResNet for images)
  
  Contextual features:
    - User-author affinity
    - Time of day, day of week
    - User's recent interactions

Model: Two-stage ranking
  Stage 1: Candidate Generation (retrieve top 1000 from millions)
    → Collaborative filtering, matrix factorization
  
  Stage 2: Ranking (re-rank top 1000)
    → Gradient Boosting (LightGBM) or Deep Learning (DNN)

Objective: Predict P(engage | user, post)
  Engage = like, comment, share, click, dwell time

import lightgbm as lgb
import pandas as pd

# Feature engineering
def extract_features(user, post, context):
    return {
        # User features
        'user_age': user.age,
        'user_engagement_rate': user.avg_engagement_rate,
        'user_follower_count': user.follower_count,
        
        # Post features
        'post_media_type': encode_media_type(post.media_type),
        'post_author_follower_count': post.author.follower_count,
        'post_like_count': post.like_count,
        'post_age_hours': (datetime.now() - post.created_at).total_seconds() / 3600,
        
        # Interaction features
        'user_author_affinity': context.get_affinity(user.id, post.author_id),
        'user_author_past_likes': context.count_past_likes(user.id, post.author_id),
        
        # Contextual
        'hour_of_day': datetime.now().hour,
        'is_weekend': 1 if datetime.now().weekday() >= 5 else 0,
    }

# Training: rows must be sorted by user_id so the `group` sizes passed to
# fit() line up with contiguous per-user blocks
df_train = pd.read_parquet('s3://feeds/training_data.parquet').sort_values('user_id')
X = df_train.drop(columns=['engaged', 'user_id'])
y = df_train['engaged']  # 1 if user engaged (like/comment/share), 0 otherwise

model = lgb.LGBMRanker(
    objective='lambdarank',
    metric='ndcg',
    boosting_type='gbdt',
    num_leaves=64,
    learning_rate=0.05,
    n_estimators=500
)
model.fit(X, y, group=df_train.groupby('user_id').size().values)

# Inference
def rank_posts_ml(user, candidate_posts):
    features = [extract_features(user, post, context) for post in candidate_posts]
    scores = model.predict(pd.DataFrame(features))
    
    ranked = sorted(zip(candidate_posts, scores), key=lambda x: x[1], reverse=True)
    return [post for post, _ in ranked]

4. Notifications — Real-time Push

4.1 Notification Types

1. Engagement: "Alice liked your post"
2. Social: "Bob started following you"
3. Mentions: "@user tagged you in a comment"
4. System: "Your post violates community guidelines"

4.2 Delivery Mechanism

┌─────────────┐          ┌──────────────┐          ┌──────────────┐
│   Action    │          │  Kafka Topic │          │ Notification │
│ (like post) ├─────────►│  "events"    ├─────────►│   Service    │
└─────────────┘          └──────────────┘          └───────┬──────┘
                                                           │
                         ┌─────────────────────────────────┼────────────┐
                         │                                 │            │
                         ▼                                 ▼            ▼
                   ┌──────────┐                    ┌──────────┐  ┌──────────┐
                   │  WebSock │                    │   FCM    │  │   APNs   │
                   │  (Web)   │                    │(Android) │  │  (iOS)   │
                   └──────────┘                    └──────────┘  └──────────┘

type NotificationService struct {
    db        *NotificationDB // backing store (Insert, GetUserDevices); type name illustrative
    kafka     *kafka.Producer
    websocket *WebSocketHub
    fcm       *FCMClient
    apns      *APNSClient
}

func (s *NotificationService) NotifyLike(likerID, postAuthorID, postID int64) error {
    // Step 1: Create the notification record
    notif := &Notification{
        Type:       "like",
        RecipientID: postAuthorID,
        ActorID:     likerID,
        PostID:      postID,
        CreatedAt:   time.Now(),
    }
    
    // Step 2: Save to DB
    if err := s.db.Insert(notif); err != nil {
        return err
    }
    
    // Step 3: Real-time delivery

    // WebSocket (if the user is online)
    if s.websocket.IsOnline(postAuthorID) {
        s.websocket.Send(postAuthorID, notif)
    }
    
    // Push notification (mobile)
    devices, _ := s.db.GetUserDevices(postAuthorID)
    for _, device := range devices {
        switch device.Platform {
        case "android":
            s.fcm.Send(device.Token, notif)
        case "ios":
            s.apns.Send(device.Token, notif)
        }
    }
    
    return nil
}

4.3 Notification Aggregation

Instead of:
  "Alice liked your post"
  "Bob liked your post"
  "Charlie liked your post"
  (3 notifications)

Aggregate into:
  "Alice, Bob, Charlie and 47 others liked your post"
  (1 notification)

Implementation: batch notifications within a 5-minute window

from collections import defaultdict
import time

class NotificationAggregator:
    def __init__(self):
        self.buffer = defaultdict(list)  # {(type, post_id): [actor_ids]}
        self.last_flush = time.time()
    
    def add(self, notif_type, post_id, actor_id):
        key = (notif_type, post_id)
        self.buffer[key].append(actor_id)
        
        # Flush every 5 minutes
        if time.time() - self.last_flush > 300:
            self.flush()
    
    def flush(self):
        for (notif_type, post_id), actor_ids in self.buffer.items():
            if len(actor_ids) == 1:
                text = f"{actor_ids[0]} {notif_type}d your post"
            elif len(actor_ids) <= 3:
                names = ", ".join(actor_ids)
                text = f"{names} {notif_type}d your post"
            else:
                first_three = ", ".join(actor_ids[:3])
                others = len(actor_ids) - 3
                text = f"{first_three} and {others} others {notif_type}d your post"
            
            # Send aggregated notification
            self.send_notification(post_id, text)
        
        self.buffer.clear()
        self.last_flush = time.time()

5. Content Moderation

5.1 Multi-layer Approach

Layer 1: Automated filters (pre-publish)
  - Text: profanity filter, hate speech detector (ML model)
  - Image: NSFW classifier (ResNet-based)
  - Video: thumbnail-based check

Layer 2: User reports
  - Crowdsourced moderation
  - Report → review queue

Layer 3: Human moderators
  - Review flagged content
  - Final decision: approve, delete, ban user

Layer 4: Appeals
  - User can appeal ban
  - Senior moderator review
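Layers 2 and 3 meet in a review queue that surfaces the most-reported content first. A minimal in-memory sketch (the class and its method names are illustrative, not an existing API):

```python
import heapq

class ReviewQueue:
    """Flagged content, most-reported first (min-heap on negated report count)."""

    def __init__(self):
        self._heap = []
        self._reports = {}  # content_id -> current report count

    def report(self, content_id):
        count = self._reports.get(content_id, 0) + 1
        self._reports[content_id] = count
        # Push a fresh entry; older entries for the same id become stale
        heapq.heappush(self._heap, (-count, content_id))

    def next_for_review(self):
        while self._heap:
            neg_count, content_id = heapq.heappop(self._heap)
            # Skip stale entries whose count no longer matches
            if self._reports.get(content_id) == -neg_count:
                del self._reports[content_id]
                return content_id
        return None
```

The lazy-deletion trick (push a new heap entry per report, discard stale ones on pop) avoids re-heapifying on every report, which matters when a viral post gets thousands of reports.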

5.2 NSFW Image Detection

from transformers import pipeline

# Pre-trained NSFW classifier
classifier = pipeline("image-classification", model="Falconsai/nsfw_image_detection")

def is_safe(image_url):
    result = classifier(image_url)
    # result = [{'label': 'nsfw', 'score': 0.95}, {'label': 'normal', 'score': 0.05}]
    
    nsfw_score = next((r['score'] for r in result if r['label'] == 'nsfw'), 0)
    return nsfw_score < 0.8  # threshold

# Integration
def upload_image(user_id, image_file):
    # Upload to CDN
    url = cdn.upload(image_file)
    
    # Check NSFW
    if not is_safe(url):
        # Auto-delete or flag for review
        moderation_queue.add(url, reason="nsfw")
        return {"error": "Image contains inappropriate content"}
    
    return {"url": url}

5.3 Hate Speech Detection

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("unitary/toxic-bert")
model = AutoModelForSequenceClassification.from_pretrained("unitary/toxic-bert")

def detect_toxicity(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    outputs = model(**inputs)

    # toxic-bert is multi-label: each class gets an independent sigmoid
    # probability (softmax would wrongly force the classes to sum to 1)
    scores = torch.sigmoid(outputs.logits)
    # Classes: toxic, severe_toxic, obscene, threat, insult, identity_hate

    toxic_score = scores[0][0].item()
    return toxic_score > 0.7  # Binary: toxic or not

# Pre-publish filter
def create_post(user_id, content):
    if detect_toxicity(content):
        return {"error": "Post contains inappropriate language"}
    
    # Proceed to publish
    post = db.insert_post(user_id, content)
    return {"post_id": post.id}

6. Distributed Counters (Likes, Followers)

6.1 The Problem

When a viral post gets 1M likes/minute:
  UPDATE posts SET like_count = like_count + 1 WHERE id = 123

  → Row-level lock contention
  → Throughput drops to ~100 updates/second

6.2 Eventual Consistency Counter

Redis INCR + periodic flush to DB

When a user likes a post:
  1. INCR redis:post:123:like_count
  2. Background job (every 10s): flush Redis → PostgreSQL

func LikePost(ctx context.Context, rdb *redis.Client, userID, postID int64) error {
    // Step 1: Atomic idempotency check. SETNX both tests and marks
    // "user has liked" in one round trip, avoiding the race window of
    // a separate EXISTS check followed by a SET.
    likeKey := fmt.Sprintf("like:%d:%d", userID, postID)
    ok, err := rdb.SetNX(ctx, likeKey, 1, 30*24*time.Hour).Result()
    if err != nil {
        return err
    }
    if !ok {
        return ErrAlreadyLiked
    }

    // Step 2: Increment the counter in Redis
    counterKey := fmt.Sprintf("post:%d:like_count", postID)
    rdb.Incr(ctx, counterKey)

    // Step 3: Insert the likes row into the DB (eventual)
    // Can be async via a message queue
    db.InsertLike(userID, postID)

    return nil
}

// Background job: Sync Redis counters → DB
func SyncCounters(ctx context.Context, rdb *redis.Client, db *sql.DB) {
    pattern := "post:*:like_count"
    iter := rdb.Scan(ctx, 0, pattern, 100).Iterator()
    
    for iter.Next(ctx) {
        key := iter.Val()
        // key = "post:123:like_count"
        
        count, _ := rdb.Get(ctx, key).Int64()
        postID := extractPostID(key)
        
        // Update DB
        db.Exec(`UPDATE posts SET like_count = $1 WHERE id = $2`, count, postID)
        
        // Optional: delete the Redis key after syncing (or keep it as a cache)
    }
}

6.3 Approximate Counter with Sharding

Instead of 1 Redis key, split the counter into N shards:

post:123:like_count:shard0
post:123:like_count:shard1
...
post:123:like_count:shard9

User likes → hash(user_id) % 10 → shard_id
Increment the corresponding shard

Total count = SUM(all shards)

Trade-off:
  - Reduced contention (10x parallel writes)
  - Slightly more complex reads (multi-get)
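A sketch of the sharded counter, with a plain dict standing in for Redis (in production each `incr_like` would be a single INCR on the chosen shard key, and `read_likes` an MGET over the ten keys):

```python
NUM_COUNTER_SHARDS = 10  # illustrative shard count

def shard_key(post_id, user_id):
    """Pick one of N sub-counters based on who is liking."""
    return f"post:{post_id}:like_count:shard{hash(user_id) % NUM_COUNTER_SHARDS}"

def incr_like(store, post_id, user_id):
    key = shard_key(post_id, user_id)
    store[key] = store.get(key, 0) + 1

def read_likes(store, post_id):
    """Total count = SUM over all shard keys (the multi-get mentioned above)."""
    prefix = f"post:{post_id}:like_count:shard"
    return sum(v for k, v in store.items() if k.startswith(prefix))
```

Sharding by user_id (not randomly) keeps the shard choice deterministic per user, so retries of the same like hit the same shard.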

7. Graph Data — Friends & Mutual Friends

7.1 Bidirectional Friendship (Facebook)

CREATE TABLE friendships (
    user_id_1  BIGINT NOT NULL,
    user_id_2  BIGINT NOT NULL,
    status     VARCHAR(20),  -- PENDING, ACCEPTED, BLOCKED
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    PRIMARY KEY (user_id_1, user_id_2),
    CHECK (user_id_1 < user_id_2)  -- Enforce canonical order
);

-- Find friends
SELECT user_id_2 AS friend_id FROM friendships
WHERE user_id_1 = $1 AND status = 'ACCEPTED'
UNION
SELECT user_id_1 AS friend_id FROM friendships
WHERE user_id_2 = $1 AND status = 'ACCEPTED';
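The `CHECK (user_id_1 < user_id_2)` constraint means application code must canonicalize the pair before every insert or lookup; a tiny helper makes that explicit:

```python
def canonical_pair(user_a, user_b):
    """Order the pair so user_id_1 < user_id_2 always holds, giving one
    friendships row per pair regardless of who sent the request."""
    if user_a == user_b:
        raise ValueError("cannot befriend yourself")
    return (user_a, user_b) if user_a < user_b else (user_b, user_a)
```

Both the INSERT and any point lookup (`WHERE user_id_1 = $1 AND user_id_2 = $2`) go through this helper, so the UNION in the friends query is only needed when scanning one user's full friend list.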

7.2 Mutual Friends Query

-- Mutual friends between user 123 and user 456
WITH user1_friends AS (
    SELECT user_id_2 AS friend_id FROM friendships
    WHERE user_id_1 = 123 AND status = 'ACCEPTED'
    UNION
    SELECT user_id_1 FROM friendships
    WHERE user_id_2 = 123 AND status = 'ACCEPTED'
),
user2_friends AS (
    SELECT user_id_2 AS friend_id FROM friendships
    WHERE user_id_1 = 456 AND status = 'ACCEPTED'
    UNION
    SELECT user_id_1 FROM friendships
    WHERE user_id_2 = 456 AND status = 'ACCEPTED'
)
SELECT friend_id FROM user1_friends
INTERSECT
SELECT friend_id FROM user2_friends;

Optimization: Cache friend lists in Redis (set), use SINTER for mutual friends.

func GetMutualFriends(ctx context.Context, rdb *redis.Client, user1, user2 int64) ([]int64, error) {
    key1 := fmt.Sprintf("friends:%d", user1)
    key2 := fmt.Sprintf("friends:%d", user2)
    
    // Redis set intersection
    mutualIDs, err := rdb.SInter(ctx, key1, key2).Result()
    if err != nil {
        return nil, err
    }
    
    result := []int64{}
    for _, idStr := range mutualIDs {
        id, _ := strconv.ParseInt(idStr, 10, 64)
        result = append(result, id)
    }
    return result, nil
}

8. Search — User & Post Discovery

8.1 User Search with Elasticsearch

// Index mapping
{
  "mappings": {
    "properties": {
      "username": {"type": "text", "analyzer": "standard"},
      "display_name": {"type": "text", "analyzer": "standard"},
      "bio": {"type": "text"},
      "follower_count": {"type": "integer"},
      "is_verified": {"type": "boolean"}
    }
  }
}

// Search query with fuzzy matching
{
  "query": {
    "bool": {
      "should": [
        {"match": {"username": {"query": "john", "boost": 3, "fuzziness": "AUTO"}}},
        {"match": {"display_name": {"query": "john", "boost": 2}}},
        {"match": {"bio": {"query": "john", "boost": 1}}}
      ],
      "filter": [
        {"term": {"is_verified": true}}
      ]
    }
  },
  "sort": [
    {"follower_count": {"order": "desc"}}
  ]
}

8.2 Autocomplete with Prefix Matching

Trie data structure or Elasticsearch completion suggester

Input: "joh"
Suggestions: john_doe, johnsmith, johnny_depp
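The trie side of that choice fits in a short sketch (a production autocomplete would also rank matches, e.g. by follower count; this one only collects them):

```python
class Trie:
    def __init__(self):
        self.children = {}
        self.is_word = False

    def insert(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_word = True

    def suggest(self, prefix, limit=10):
        # Walk down to the node matching the prefix
        node = self
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        # DFS under the prefix node, collecting complete usernames
        out, stack = [], [(node, prefix)]
        while stack and len(out) < limit:
            n, word = stack.pop()
            if n.is_word:
                out.append(word)
            for ch in sorted(n.children, reverse=True):
                stack.append((n.children[ch], word + ch))
        return out
```

An in-memory trie works for a single node's hot usernames; at scale the same structure is usually delegated to the Elasticsearch completion suggester shown below.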
// Elasticsearch completion suggester
{
  "mappings": {
    "properties": {
      "username_suggest": {
        "type": "completion",
        "analyzer": "simple"
      }
    }
  }
}

// Query
{
  "suggest": {
    "username-suggest": {
      "prefix": "joh",
      "completion": {
        "field": "username_suggest",
        "size": 10,
        "fuzzy": {"fuzziness": 1}
      }
    }
  }
}

9. Privacy & Permissions

9.1 Post Visibility

Public: anyone can see it
Friends: only friends can see it
Private: only the author (drafts)
Custom: a specific list of users

Implementation: check permission before serving the post (assumes a
visibility column on posts, not shown in the section-1 schema)

func CanViewPost(db *sql.DB, viewerID, postID int64) (bool, error) {
    var authorID int64
    var visibility string
    
    err := db.QueryRow(`
        SELECT user_id, visibility FROM posts WHERE id = $1
    `, postID).Scan(&authorID, &visibility)
    if err != nil {
        return false, err
    }
    
    switch visibility {
    case "public":
        return true, nil
    case "private":
        return viewerID == authorID, nil
    case "friends":
        return areFriends(db, viewerID, authorID)
    case "custom":
        return isInCustomList(db, postID, viewerID)
    default:
        return false, nil
    }
}

9.2 Blocking Users

CREATE TABLE blocks (
    blocker_id  BIGINT NOT NULL REFERENCES users(id),
    blocked_id  BIGINT NOT NULL REFERENCES users(id),
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    PRIMARY KEY (blocker_id, blocked_id)
);

-- Filter blocked users out of the feed
SELECT p.* FROM posts p
WHERE p.user_id NOT IN (
    SELECT blocked_id FROM blocks WHERE blocker_id = $1
)
AND p.user_id NOT IN (
    SELECT blocker_id FROM blocks WHERE blocked_id = $1
);
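If block lists are cached (e.g. as Redis sets), the same two-way filter can run app-side instead of in SQL; a sketch with plain Python sets standing in for the cache:

```python
def filter_blocked(posts, blocked_by_viewer, blockers_of_viewer):
    """Mirror the two NOT IN subqueries above: hide authors the viewer
    blocked, and authors who blocked the viewer."""
    hidden = blocked_by_viewer | blockers_of_viewer
    return [p for p in posts if p["user_id"] not in hidden]
```

Running the filter after feed retrieval keeps block checks off the hot SQL path; the cost is that a page can come back slightly short and may need a re-fetch.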

10. Interview Questions

Q: Design a Twitter feed for 500M users?

Components:

  • User service (shard by user_id)
  • Post service (shard by user_id)
  • Timeline service (hybrid fan-out)
  • Notification service (Kafka + push)
  • CDN for media

Trade-offs:

  • Fan-out on write vs read (hybrid)
  • Eventual consistency for counters
  • Caching strategy (Redis for hot data)

Q: How do you prevent fake followers/likes?

Answer:

  • Rate limiting: Max 100 follows/hour
  • CAPTCHA for suspicious behavior
  • Machine learning: Detect bot patterns (creation time, activity pattern)
  • Phone verification for new accounts
  • Shadowban suspected bots
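The first bullet ("max 100 follows/hour") can be sketched as a sliding-window limiter; in production the window usually lives in a Redis sorted set per user, here an in-memory deque:

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    def __init__(self, max_events=100, window_sec=3600):
        self.max_events = max_events
        self.window_sec = window_sec
        self.events = defaultdict(deque)  # user_id -> event timestamps

    def allow(self, user_id, now=None):
        """Return True and record the event if the user is under the limit."""
        now = time.time() if now is None else now
        q = self.events[user_id]
        while q and q[0] <= now - self.window_sec:
            q.popleft()  # evict events that slid out of the window
        if len(q) >= self.max_events:
            return False
        q.append(now)
        return True
```

Unlike a fixed hourly bucket, the sliding window blocks a bot that fires 100 follows at 10:59 and another 100 at 11:01.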

Q: How do you optimize the mutual-friends query when each user has 5000 friends?

Answer:

  • Cache friend lists in Redis (sets)
  • Use SINTER for the intersection (O(N) with N = size of the smaller set)
  • Pre-compute mutual friends for friend suggestions
  • Limit to top 100 mutuals (paginate)

Summary

Social network = Read-heavy, relationship-centric, real-time system.

Key challenges:

  • Scale reads: billions of feed requests/day → cache, CDN, read replicas
  • Fan-out: celebrities with 100M followers → hybrid approach
  • Real-time: notifications, chat, live updates → WebSocket, Kafka
  • Graph queries: mutual friends, shortest path → graph DB or cache
  • Content moderation: hate speech, NSFW → ML models + human review

Key patterns:

  • Hybrid fan-out (write for normal, read for celebrity)
  • Eventual consistency counters (Redis INCR + periodic sync)
  • ML-based ranking (two-stage: candidate generation + ranking)
  • Notification aggregation (reduce noise)
