Vector Databases: From Zero to Senior+
Vector databases are the "silent heroes" behind every AI application you use daily: ChatGPT search, Spotify recommendations, Google Photos face recognition. If you are building AI-powered apps, this is must-have knowledge.
🎯 Analogy: a traditional DB is like looking a word up in a dictionary (exact match). A vector DB is like asking "give me the words with similar meanings" → semantic search.
Level 1: Foundations (New β Junior)
What is a vector?
Vector = an array of numbers representing meaning/features.
# Text example
"dog"   → [0.2, 0.8, 0.1, 0.5, ...]     # 768 dimensions
"puppy" → [0.25, 0.75, 0.15, 0.48, ...] # Similar to "dog"
"car"   → [0.9, 0.1, 0.8, 0.2, ...]     # Different from "dog"

# Image example
🐶 (image) → [0.34, 0.12, 0.89, ...]
🚗 (image) → [0.91, 0.05, 0.23, ...]
Key insight: Vectors capture semantic meaning β similar concepts have similar vectors.
Embeddings: From Data → Vector
Embedding = the process of converting data (text, image, audio) into a vector.
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')
# Embed text
text = "I love machine learning"
embedding = model.encode(text)
print(embedding.shape) # (384,) - 384-dimensional vector
print(embedding[:5]) # [0.0234, -0.1234, 0.5678, ...]
Popular embedding models:
- Text: OpenAI text-embedding-3-small, Cohere Embed, sentence-transformers
- Images: CLIP, ResNet, ViT
- Code: CodeBERT, GraphCodeBERT
- Multimodal: CLIP (text + images)
Similarity Search: Core Operation
Problem: "Find the 10 items most similar to this query"
import numpy as np
# 3 documents in our "database"
docs = {
"doc1": np.array([0.2, 0.8, 0.1]), # "I love dogs"
"doc2": np.array([0.25, 0.75, 0.15]), # "Puppies are cute"
"doc3": np.array([0.9, 0.1, 0.8]), # "Cars are fast"
}
query = np.array([0.22, 0.78, 0.12]) # "Dogs are amazing"
# Calculate similarity
def cosine_similarity(a, b):
return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
similarities = {
doc_id: cosine_similarity(query, vec)
for doc_id, vec in docs.items()
}
print(similarities)
# {'doc1': 0.999, 'doc2': 0.998, 'doc3': 0.542}
# → doc1 and doc2 are the closest matches to the query!
Distance Metrics
Three common metrics:
1. Cosine Similarity (most common):
def cosine_similarity(a, b):
return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
# Range: -1 (opposite) to 1 (identical)
# 0 = orthogonal (unrelated)
2. Euclidean Distance (L2):
def euclidean_distance(a, b):
return np.linalg.norm(a - b)
# Smaller = more similar
3. Dot Product:
def dot_product(a, b):
return np.dot(a, b)
# Higher = more similar (for normalized vectors)
When to use which?
- Cosine: Text embeddings (direction matters, not magnitude)
- Euclidean: Image embeddings (absolute position matters)
- Dot Product: When vectors are already normalized
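To make the choice concrete, here is a minimal pure-Python sketch (toy vectors, no NumPy) showing where the metrics disagree: a vector pointing in the same direction but with 3x the magnitude looks identical under cosine yet far away under Euclidean.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a = [1.0, 1.0]
b = [3.0, 3.0]   # same direction as a, 3x the magnitude
c = [1.2, 0.8]   # near a in space, slightly different direction

print(cosine(a, b))      # 1.0   -> cosine: b is "identical" to a
print(euclidean(a, b))   # ~2.83 -> euclidean: b is far from a
print(euclidean(a, c))   # ~0.28 -> euclidean prefers c
```

This is why normalized text embeddings usually pair with cosine, while metrics sensitive to magnitude need Euclidean.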
Real-world use cases
1. Semantic Search: user asks "how to train a neural network" → find similar docs (not just keyword matches)
2. Recommendation Systems: user liked movie A → find movies with similar embeddings
3. RAG (Retrieval-Augmented Generation): question → find relevant docs → feed them to the LLM as context
4. Image Search: upload an image → find similar images (Google Photos, Pinterest)
5. Anomaly Detection: normal transactions cluster tightly; fraud sits far from the cluster → detected!
Your First Vector DB: In-memory with NumPy
class SimpleVectorDB:
    def __init__(self):
        self.vectors = []
        self.metadata = []

    def insert(self, vector, metadata):
        self.vectors.append(vector)
        self.metadata.append(metadata)

    def search(self, query_vector, top_k=5):
        # Brute force: compare against every stored vector
        similarities = []
        for i, vec in enumerate(self.vectors):
            sim = cosine_similarity(query_vector, vec)
            similarities.append((sim, i))
        # Sort by similarity (descending)
        similarities.sort(reverse=True)
        # Return top K
        results = []
        for sim, idx in similarities[:top_k]:
            results.append({
                'similarity': sim,
                'metadata': self.metadata[idx]
            })
        return results
# Usage
db = SimpleVectorDB()
# Insert documents
model = SentenceTransformer('all-MiniLM-L6-v2')
docs = [
"The cat sits on the mat",
"Dogs are loyal animals",
"Machine learning is fascinating",
]
for doc in docs:
embedding = model.encode(doc)
db.insert(embedding, {'text': doc})
# Search
query = "Tell me about artificial intelligence"
query_embedding = model.encode(query)
results = db.search(query_embedding, top_k=2)
for r in results:
print(f"Similarity: {r['similarity']:.3f}")
print(f"Text: {r['metadata']['text']}\n")
What's the problem with this approach?
❌ O(N) complexity: you must compare against EVERY vector → slow once you have millions of vectors!
Level 2: Vector Databases & Indexing (Mid-level)
Why do you need a Vector Database?
In-memory NumPy:
- ✅ Simple
- ✅ Good for <10K vectors
- ❌ Slow (O(N) search)
- ❌ No persistence
- ❌ No concurrent access
- ❌ Doesn't scale

Vector Database:
- ✅ Fast search (O(log N) or better)
- ✅ Persistent storage
- ✅ Horizontal scaling
- ✅ Production-ready (transactions, backups, monitoring)
- ✅ Approximate search (trade accuracy for speed)
Popular Vector Databases
| Database | Type | Best For | Language |
|---|---|---|---|
| Pinecone | Cloud-native | Managed service, easy setup | Any (REST API) |
| Weaviate | Full-featured | Hybrid search, multi-tenancy | Go |
| Qdrant | Modern | Performance, Rust-powered | Rust |
| Milvus | Enterprise | Large scale, Kubernetes | C++/Python/Go |
| Chroma | Embedded | Development, prototyping | Python |
| pgvector | Extension | Existing Postgres users | SQL |
| Elasticsearch | Search engine | Already using ES | Java |
ANN: Approximate Nearest Neighbor
Trade-off: 100% accuracy vs speed
Exact search (brute force):
- Compare against all N vectors
- 100% accuracy
- O(N) complexity
- Slow: 1M vectors = 1M comparisons
Approximate search (ANN):
- Use index structure (tree, graph)
- ~95-99% accuracy (configurable)
- O(log N) or O(1) complexity
- Fast: 1M vectors = ~20 comparisons
Key insight: in most use cases, 99% accuracy is enough (users can't tell the difference).
Indexing Algorithms
1. HNSW (Hierarchical Navigable Small World)
Graph-based: vectors are nodes; edges connect similar vectors.
Level 2: 🔴 ──── 🔴 ──── 🔴      (sparse, long jumps)
          │       │       │
Level 1: 🔵──🔵──🔵──🔵──🔵      (medium density)
          │   │   │   │   │
Level 0: 🟢🟢🟢🟢🟢🟢🟢🟢        (dense, all vectors)
Search:
1. Start at top level (long jumps)
2. Navigate to closest node
3. Drop to next level
4. Repeat until bottom
5. Refine search at bottom level
Characteristics:
- ✅ Very fast search (O(log N))
- ✅ High recall (accuracy)
- ❌ Slow build time
- ❌ Memory-intensive (graph in RAM)

Good for: real-time search, when RAM is available
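The navigate-and-refine idea can be sketched as a greedy walk on a toy proximity graph. This is a single layer only; real HNSW stacks several layers and keeps a candidate beam, so treat this as a cartoon of the search step, not the actual algorithm.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy proximity graph: node id -> (vector, neighbor ids).
graph = {
    0: ([1.0, 0.0], [1, 2]),
    1: ([0.8, 0.6], [0, 3]),
    2: ([0.0, 1.0], [0, 3]),
    3: ([0.6, 0.8], [1, 2]),
}

def greedy_search(graph, query, entry=0):
    current = entry
    while True:
        cur_sim = cosine(graph[current][0], query)
        # Move to the neighbor closest to the query, if any improves on us.
        best = max(graph[current][1], key=lambda n: cosine(graph[n][0], query))
        if cosine(graph[best][0], query) <= cur_sim:
            return current  # local optimum = approximate nearest neighbor
        current = best

print(greedy_search(graph, [0.7, 0.7]))
```

Note the failure mode: a greedy walk can get stuck in a local optimum, which is exactly why HNSW adds the upper "long jump" layers.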
2. IVF (Inverted File Index)
Clustering-based: Partition vectors into clusters.
1. Build phase:
- K-means clustering β 1000 clusters
- Each vector assigned to nearest cluster
2. Search phase:
- Find query's nearest cluster(s)
- Search only within those clusters
- Only check 1/1000 of data!
Example:
Cluster 1: [dogs, puppies, pets, ...]
Cluster 2: [AI, ML, neural nets, ...]
Cluster 3: [cars, vehicles, ...]
Query: "machine learning" β Search Cluster 2 only
Characteristics:
- ✅ Fast search (O(log N))
- ✅ Less memory than HNSW
- ❌ Lower recall (might miss edge cases)
- ✅ Fast build time

Good for: large datasets, limited RAM
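The build-then-probe flow above fits in a few lines of pure Python. The centroids here are hand-picked to stand in for the k-means step, and `nprobe` plays the same role as in real IVF indexes: how many clusters to scan per query.

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_ivf(vectors, centroids):
    # Build phase: assign each vector to its nearest centroid (inverted lists).
    lists = {i: [] for i in range(len(centroids))}
    for vid, v in enumerate(vectors):
        c = min(range(len(centroids)), key=lambda i: dist(v, centroids[i]))
        lists[c].append(vid)
    return lists

def ivf_search(query, vectors, centroids, lists, nprobe=1, k=3):
    # Search phase: probe only the nprobe closest clusters, not everything.
    order = sorted(range(len(centroids)), key=lambda i: dist(query, centroids[i]))
    candidates = [vid for c in order[:nprobe] for vid in lists[c]]
    return sorted(candidates, key=lambda vid: dist(query, vectors[vid]))[:k]

vectors = [[0.1, 0.1], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
centroids = [[0.15, 0.1], [0.85, 0.85]]
lists = build_ivf(vectors, centroids)
print(ivf_search([0.12, 0.1], vectors, centroids, lists, nprobe=1, k=2))  # -> [0, 1]
```

Raising `nprobe` trades speed for recall: with `nprobe=len(centroids)` this degenerates back into exact brute-force search.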
3. Product Quantization (PQ)
Compression: Reduce vector size β fit more in RAM.
Original vector (768D, 32-bit float):
[0.234, -0.123, 0.567, ..., 0.891]
Size: 768 * 4 bytes = 3KB
After PQ compression:
[23, 145, 89, ...] (codebook indices)
Size: 768 / 8 * 1 byte = 96 bytes
→ 32x compression!
Trade-off: Smaller size vs accuracy
Good for: Billions of vectors, RAM constraints
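The compression step can be sketched in a few lines: split each vector into sub-vectors and store only the index of the nearest codeword per sub-space. The codebooks below are hand-picked for illustration; real systems learn them with k-means, use many more codewords (typically 256 per sub-space), and compare against codes without fully decoding.

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# PQ sketch: a 4-D vector becomes 2 one-byte codes (one per 2-D sub-space).
codebooks = [
    [[0.1, 0.1], [0.9, 0.9]],   # codebook for dims 0-1
    [[0.2, 0.8], [0.8, 0.2]],   # codebook for dims 2-3
]

def pq_encode(vector):
    codes = []
    for i, book in enumerate(codebooks):
        sub = vector[2 * i: 2 * i + 2]
        codes.append(min(range(len(book)), key=lambda j: dist(sub, book[j])))
    return codes  # 2 small ints instead of 4 floats

def pq_decode(codes):
    # Lossy reconstruction: concatenate the chosen codewords.
    out = []
    for i, c in enumerate(codes):
        out.extend(codebooks[i][c])
    return out

v = [0.12, 0.08, 0.75, 0.25]
codes = pq_encode(v)          # -> [0, 1]
print(codes, pq_decode(codes))
```

The reconstruction is close to, but not exactly, the original vector; that gap is the accuracy you trade for the 32x storage win.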
4. Hybrid Approaches
Most production systems combine algorithms:
Pinecone: IVF + PQ
Weaviate: HNSW + PQ (optional)
Milvus: IVF_FLAT, IVF_SQ8, HNSW, etc. (configurable)
Implementation: Pinecone (Managed Service)
import pinecone
from sentence_transformers import SentenceTransformer
# Initialize (older Pinecone client API; newer SDKs use pinecone.Pinecone(...))
pinecone.init(api_key="your-api-key", environment="us-east1-gcp")
# Create index
pinecone.create_index(
name="semantic-search",
dimension=384, # Model's output dimension
metric="cosine",
pod_type="p1.x1" # Performance tier
)
index = pinecone.Index("semantic-search")
# Embed model
model = SentenceTransformer('all-MiniLM-L6-v2')
# Insert vectors
docs = [
{"id": "doc1", "text": "Machine learning basics"},
{"id": "doc2", "text": "Neural networks explained"},
{"id": "doc3", "text": "Cooking recipes for beginners"},
]
vectors_to_upsert = []
for doc in docs:
embedding = model.encode(doc['text']).tolist()
vectors_to_upsert.append({
"id": doc['id'],
"values": embedding,
"metadata": {"text": doc['text']}
})
index.upsert(vectors=vectors_to_upsert)
# Search
query = "Tell me about AI"
query_embedding = model.encode(query).tolist()
results = index.query(
vector=query_embedding,
top_k=3,
include_metadata=True
)
for match in results['matches']:
print(f"Score: {match['score']:.3f}")
print(f"Text: {match['metadata']['text']}\n")
Implementation: Weaviate (Open-source)
import weaviate
from weaviate.classes.config import Configure, Property, DataType

# Connect
client = weaviate.connect_to_local()

# Create collection with vectorizer
client.collections.create(
    name="Document",
    vectorizer_config=Configure.Vectorizer.text2vec_openai(),
    properties=[
        Property(name="title", data_type=DataType.TEXT),
        Property(name="content", data_type=DataType.TEXT),
    ]
)
collection = client.collections.get("Document")
# Insert (auto-vectorization)
collection.data.insert_many([
{"title": "AI Intro", "content": "Machine learning basics"},
{"title": "Cooking 101", "content": "How to make pasta"},
])
# Search (auto-vectorize query)
results = collection.query.near_text(
query="Tell me about artificial intelligence",
limit=2
)
for item in results.objects:
print(f"{item.properties['title']}: {item.properties['content']}")
Implementation: pgvector (PostgreSQL Extension)
-- Install extension
CREATE EXTENSION vector;
-- Create table with vector column
CREATE TABLE documents (
id SERIAL PRIMARY KEY,
content TEXT,
embedding VECTOR(384) -- 384 dimensions
);
-- Create index (IVF)
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100); -- 100 clusters
-- Insert
INSERT INTO documents (content, embedding)
VALUES ('Machine learning basics', '[0.1, 0.2, 0.3, ...]');
-- Search
SELECT content, 1 - (embedding <=> '[0.12, 0.21, ...]') AS similarity
FROM documents
ORDER BY embedding <=> '[0.12, 0.21, ...]'
LIMIT 5;
Pros of pgvector:
- ✅ Existing Postgres infrastructure
- ✅ ACID transactions
- ✅ Joins with relational data
- ✅ Familiar SQL

Cons:
- ❌ Slower than specialized vector DBs
- ❌ Limited indexing options
- ❌ Harder to scale horizontally
Level 3: Production & Advanced Patterns (Senior)
Hybrid Search: Keyword + Vector
Problem: pure vector search performs poorly on exact matches (product IDs, names).
Solution: Combine keyword search (BM25) + vector search.
# Weaviate hybrid search
results = collection.query.hybrid(
query="iPhone 15",
alpha=0.5, # 0=keyword only, 1=vector only, 0.5=balanced
limit=10
)
How it works:
1. Keyword search (BM25): "iPhone 15" → high score for exact matches
2. Vector search: "iPhone 15" → high score for semantic matches (e.g., "Apple's latest smartphone")
3. Combine scores: final_score = alpha * vector_score + (1 - alpha) * keyword_score
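One subtlety in the combine step: BM25 scores are unbounded while cosine similarities live in [-1, 1], so a common approach is to min-max normalize each score set before mixing. A sketch with hypothetical scores:

```python
def minmax(scores):
    # Rescale a dict of scores to [0, 1].
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}

def hybrid_scores(keyword_scores, vector_scores, alpha=0.5):
    kw, vec = minmax(keyword_scores), minmax(vector_scores)
    ids = set(kw) | set(vec)
    # Docs missing from one ranking get 0 for that component.
    return {
        doc: alpha * vec.get(doc, 0.0) + (1 - alpha) * kw.get(doc, 0.0)
        for doc in ids
    }

keyword = {"doc1": 12.0, "doc2": 3.0}   # BM25-style scores (unbounded)
vector = {"doc1": 0.62, "doc3": 0.91}   # cosine similarities
print(hybrid_scores(keyword, vector, alpha=0.5))
```

Production systems often use rank-based fusion (e.g., reciprocal rank fusion) instead, which sidesteps score normalization entirely.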
Filtering with Metadata
Challenge: "Find similar docs, but only from 2024, in English, tagged 'tech'"
# Pinecone
results = index.query(
vector=query_embedding,
top_k=10,
filter={
"year": {"$eq": 2024},
"language": {"$eq": "en"},
"tags": {"$in": ["tech"]}
}
)
# Weaviate (v4 client)
from weaviate.classes.query import Filter

results = collection.query.near_vector(
    near_vector=query_embedding,
    limit=10,
    filters=Filter.by_property("year").equal(2024)
)
Architecture patterns:

Pre-filtering (filter before vector search):
- Filter by metadata first → smaller candidate set
- ✅ Faster vector search
- ❌ Limited by index structure

Post-filtering (filter after vector search):
- Vector search first → filter the results
- ✅ More flexible
- ❌ Might miss results if top_k is too low

Best practice: use pre-filtering when possible (faster).
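Post-filtering is easy to get wrong: if you search with top_k equal to the number of results you want, the filter may discard all of them. A sketch of the over-fetch-then-filter pattern on toy in-memory data (the 3x multiplier is a hypothetical heuristic; tune it to your filter's selectivity):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

vectors = {"d1": [1.0, 0.0], "d2": [0.9, 0.1], "d3": [0.0, 1.0]}
metadata = {"d1": {"year": 2023}, "d2": {"year": 2024}, "d3": {"year": 2024}}

def search(query, top_k):
    ranked = sorted(vectors, key=lambda d: cosine(query, vectors[d]), reverse=True)
    return ranked[:top_k]

def post_filter_search(query, predicate, k=1):
    # Over-fetch, then filter: candidates dropped by the predicate are
    # replaced by lower-ranked ones, which is why top_k must exceed k.
    candidates = search(query, top_k=k * 3)
    return [d for d in candidates if predicate(metadata[d])][:k]

print(post_filter_search([1.0, 0.05], lambda m: m["year"] == 2024, k=1))  # -> ['d2']
```

With `top_k=k`, the same query would return only "d1", which the filter then drops, yielding nothing.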
Multi-vector / Late Interaction
Problem: Single vector per document loses nuance.
Solution: Multiple vectors per document.
# ColBERT approach: Token-level embeddings
document = "Machine learning is a subset of AI"
tokens = ["Machine", "learning", "is", "a", "subset", "of", "AI"]
# Each token gets embedding
token_embeddings = [
model.encode(token) for token in tokens
] # 7 vectors for 1 document
# Search: Compare query tokens with document tokens
query = "What is ML?"
query_tokens = ["What", "is", "ML"]
query_embeddings = [model.encode(t) for t in query_tokens]
# Max similarity for each query token
score = sum([
max([cosine_sim(q_emb, d_emb) for d_emb in token_embeddings])
for q_emb in query_embeddings
])
Use case: Long documents, QA systems.
Chunking Strategies
Problem: embedding models have a max input length (512 tokens for many models).
Strategy 1: Fixed-size chunks
def chunk_text(text, chunk_size=512, overlap=50):
words = text.split()
chunks = []
for i in range(0, len(words), chunk_size - overlap):
chunk = ' '.join(words[i:i + chunk_size])
chunks.append(chunk)
return chunks
# Pro: Simple
# Con: Might split sentences/paragraphs awkwardly
Strategy 2: Semantic chunking
def semantic_chunk(text):
paragraphs = text.split('\n\n')
chunks = []
current_chunk = []
current_length = 0
for para in paragraphs:
para_len = len(para.split())
if current_length + para_len > 512:
chunks.append(' '.join(current_chunk))
current_chunk = [para]
current_length = para_len
else:
current_chunk.append(para)
current_length += para_len
if current_chunk:
chunks.append(' '.join(current_chunk))
return chunks
# Pro: Preserves semantic boundaries
# Con: Variable chunk sizes
Strategy 3: Sliding window with parent-child
# Store both:
# - Small chunks (for precise retrieval)
# - Large parent context (for LLM)
chunks = [
{"chunk": "ML is a subset of AI", "parent_id": "doc1"},
{"chunk": "Neural networks are...", "parent_id": "doc1"},
]
# Search on chunks, return parent doc for context
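The retrieval side of this pattern can be sketched as follows. The parents store and pre-scored chunks here are hypothetical stand-ins; in a real system the (score, chunk) pairs would come from a vector search over the small chunks.

```python
parents = {
    "doc1": "Full article about machine learning and neural networks...",
}
chunks = [
    {"chunk": "ML is a subset of AI", "parent_id": "doc1"},
    {"chunk": "Neural networks are layered models", "parent_id": "doc1"},
]

def retrieve_context(scored_chunks, top_k=1):
    # scored_chunks: (similarity, chunk) pairs from the chunk-level search.
    # Deduplicate parents so the LLM doesn't see the same doc twice.
    seen, contexts = set(), []
    for _, chunk in sorted(scored_chunks, key=lambda p: p[0], reverse=True):
        pid = chunk["parent_id"]
        if pid not in seen:
            seen.add(pid)
            contexts.append(parents[pid])
        if len(contexts) == top_k:
            break
    return contexts

scored = [(0.9, chunks[0]), (0.8, chunks[1])]
print(retrieve_context(scored, top_k=1))
```

The small chunks give precise matching; the deduplicated parent documents give the LLM enough surrounding context to answer well.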
Batch Operations & Indexing
# Bad: One at a time
for doc in documents:
embedding = model.encode(doc)
index.upsert([{"id": doc['id'], "values": embedding}])
# Slow: Many network calls
# Good: Batch upsert
batch_size = 100
for i in range(0, len(documents), batch_size):
batch = documents[i:i+batch_size]
embeddings = model.encode([d['text'] for d in batch])
vectors = [
{"id": d['id'], "values": emb.tolist()}
for d, emb in zip(batch, embeddings)
]
index.upsert(vectors=vectors)
# 100x faster
Monitoring & Observability
Key metrics:
# 1. Query latency
import time
start = time.time()
results = index.query(query_vector, top_k=10)
latency = time.time() - start
# Target: <50ms for p99
# 2. Recall (accuracy)
# Compare ANN results vs exact search
def measure_recall(query_vector, k=10):
# ANN search
ann_results = index.query(query_vector, top_k=k)
ann_ids = {r['id'] for r in ann_results['matches']}
# Exact search (brute force)
exact_results = exact_search(query_vector, k)
exact_ids = {r['id'] for r in exact_results}
# Recall = intersection / k
recall = len(ann_ids & exact_ids) / k
return recall
# Target: >95% recall
# 3. Index build time
# Track re-indexing time when adding new vectors
# 4. Memory usage
# Monitor RAM consumption (especially for HNSW)
# 5. Error rate
# Failed queries, timeouts
Distributed Vector Search
Sharding strategies:

1. Hash-based sharding: vector ID → hash → shard assignment
   - ✅ Even distribution
   - ❌ Each query must hit all shards

2. Cluster-based sharding: group similar vectors in the same shard
   - ✅ Query only hits relevant shards
   - ❌ Uneven distribution (hot shards)
   - ❌ Requires initial clustering

3. Replication: each shard replicated 3x
   - ✅ High availability
   - ✅ Read scaling
   - ❌ 3x storage cost
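Hash-based sharding with scatter-gather fits in a few lines: inserts are routed by hashing the ID, and every query fans out to all shards before the per-shard hits are merged and re-ranked. A toy in-memory sketch (each "shard" is just a dict; real shards are separate processes searched in parallel):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

shards = [dict() for _ in range(3)]

def insert(vec_id, vector):
    # Route by hash of the ID -> even distribution across shards.
    shards[hash(vec_id) % len(shards)][vec_id] = vector

def search(query, k=2):
    hits = []
    for shard in shards:                      # scatter: query every shard
        for vid, vec in shard.items():
            hits.append((cosine(query, vec), vid))
    hits.sort(reverse=True)                   # gather: merge and re-rank
    return [vid for _, vid in hits[:k]]

insert("a", [1.0, 0.0])
insert("b", [0.9, 0.1])
insert("c", [0.0, 1.0])
print(search([1.0, 0.05], k=2))  # -> ['a', 'b']
```

Cluster-based sharding replaces the hash routing with "nearest centroid" routing, which is what lets queries skip irrelevant shards.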
Milvus distributed example:
# Milvus cluster with 3 query nodes
apiVersion: milvus.io/v1beta1
kind: Milvus
metadata:
name: my-milvus
spec:
mode: cluster
components:
queryNode:
replicas: 3 # 3 query nodes for parallel search
dataNode:
replicas: 2 # 2 data nodes for ingestion
Cost Optimization
1. Dimension reduction:
from sklearn.decomposition import PCA
# Original: 768D embeddings
# Reduced: 256D
pca = PCA(n_components=256)
reduced_embeddings = pca.fit_transform(original_embeddings)
# Trade-off: ~10% accuracy loss, 3x storage savings
2. Quantization:
# Convert float32 → int8
# 768D * 4 bytes = 3KB
# 768D * 1 byte = 768 bytes
# → 4x savings
# Pinecone: Automatic
# Milvus:
collection.create_index(
field_name="embedding",
index_params={
"metric_type": "L2",
"index_type": "IVF_SQ8", # Scalar quantization to 8-bit
"params": {"nlist": 1024}
}
)
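The float32 → int8 idea above is scalar quantization: map each value onto 256 evenly spaced levels within a known range. A minimal sketch, assuming one uniform [lo, hi] range for all dimensions (real SQ8 implementations learn per-dimension min/max from the data):

```python
# Scalar quantization sketch: float values in [lo, hi] -> codes 0..255.
def sq8_encode(vector, lo, hi):
    scale = (hi - lo) / 255.0
    return [round((x - lo) / scale) for x in vector]

def sq8_decode(codes, lo, hi):
    # Lossy reconstruction: each code maps back to the center of its level.
    scale = (hi - lo) / 255.0
    return [lo + c * scale for c in codes]

v = [-0.5, 0.0, 0.73]
codes = sq8_encode(v, lo=-1.0, hi=1.0)
print(codes)                          # small ints: 1 byte each instead of 4
print(sq8_decode(codes, -1.0, 1.0))   # close to v, small rounding error
```

The rounding error per dimension is bounded by half a level, i.e. (hi - lo) / 510, which is usually well below the noise floor of the embeddings themselves.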
3. Lazy loading (tiered storage):
Hot tier (SSD): Recent/popular vectors
Cold tier (HDD/S3): Old/rarely accessed vectors
Move vectors between tiers based on access patterns
Level 4: Bleeding Edge (Senior+)
Learned Indexes
Idea: use an ML model to predict a vector's position in the index.
# Traditional: tree/graph traversal
# Learned: model.predict(vector) → position

# Example: RMI (Recursive Model Index)
# Stage 1: coarse model (neural net)
#   Input: vector → Output: approximate position range
# Stage 2: fine model
#   Input: vector + range → Output: exact position
Status: research phase, not yet production-ready (as of 2026).
Multi-modal Embeddings
CLIP: Unified embedding space for text + images.
import clip
import torch
from PIL import Image

model, preprocess = clip.load("ViT-B/32")

# Embed image
image = preprocess(Image.open("dog.jpg")).unsqueeze(0)
image_embedding = model.encode_image(image)

# Embed text
text = clip.tokenize(["a photo of a dog"])
text_embedding = model.encode_text(text)

# Compare
similarity = torch.cosine_similarity(image_embedding, text_embedding)
# Use case: Search images with text!
query = "sunset over mountains"
# → Find images matching that description
Streaming Updates
Challenge: Millions of vectors added per day, can't rebuild index.
Solution: Incremental indexing.
# Pinecone: Real-time updates (no rebuild needed)
index.upsert(new_vectors) # Available immediately
# Milvus: Segment-based
# New vectors → new segment
# Background: merge segments periodically
Vector Databases at Scale
Netflix: 100M+ vectors (movie/user embeddings)
- Milvus cluster
- PQ compression
- Hybrid search (vector + metadata filters)
Pinterest: 3B+ vectors (image embeddings)
- Custom C++ implementation
- GPU-accelerated search
- Distributed across 100+ nodes
Uber: 1B+ vectors (driver/rider embeddings)
- Hybrid PostgreSQL + specialized vector engine
- Real-time updates (<100ms latency)
GPU Acceleration
# Faiss on GPU (Facebook AI Similarity Search)
import faiss
# CPU
dimension = 768
index_cpu = faiss.IndexFlatL2(dimension)
index_cpu.add(vectors)
D, I = index_cpu.search(query_vectors, k=10)
# GPU (10-100x faster)
res = faiss.StandardGpuResources()
index_gpu = faiss.index_cpu_to_gpu(res, 0, index_cpu)
D, I = index_gpu.search(query_vectors, k=10)
Use case: Batch similarity computation (recommendations, deduplication).
Production Checklist
For New Projects (0-10K vectors)
- Chroma or pgvector (simple, embedded)
- Basic cosine similarity search
- Metadata filtering
- Monitor query latency
For Growing Projects (10K-1M vectors)
- Pinecone (managed) or Weaviate (self-hosted)
- HNSW or IVF indexing
- Hybrid search (keyword + vector)
- Batch upserts
- Set up monitoring (latency, recall)
For Scale (1M+ vectors)
- Milvus or Qdrant (horizontal scaling)
- Quantization (PQ or SQ8)
- Distributed deployment (3+ nodes)
- Replication for HA
- GPU acceleration (if batch workloads)
- Cost optimization (dimension reduction, tiered storage)
- A/B test index configurations
For Enterprise (10M+ vectors)
- Custom tuning (index params, ef_construction, nprobe)
- Multi-region deployment
- Disaster recovery plan
- Security (encryption at rest/transit, access control)
- Compliance (data residency, audit logs)
- Dedicated SRE team
Common Mistakes & How to Avoid
1. ❌ Not normalizing vectors
# Bad: Unnormalized vectors
embedding = model.encode(text) # [0.5, 10.3, -2.4, ...]
# Good: Normalize before storing
from sklearn.preprocessing import normalize
embedding_norm = normalize([embedding])[0] # L2 norm = 1
Why: dot-product and Euclidean metrics assume normalized vectors; normalizing also makes cosine and dot product produce identical rankings.
2. ❌ Wrong distance metric
# Text embeddings: use COSINE
pinecone.create_index(..., metric="cosine")
# Image embeddings: often EUCLIDEAN
pinecone.create_index(..., metric="euclidean")
3. ❌ Forgetting metadata
# Bad: Only store vectors
index.upsert([{"id": "1", "values": embedding}])
# → Can't filter, can't show original text
# Good: Store metadata
index.upsert([{
"id": "1",
"values": embedding,
"metadata": {
"text": original_text,
"source": "wikipedia",
"date": "2024-01-15",
"tags": ["AI", "ML"]
}
}])
4. ❌ Not tuning index parameters
# Default HNSW params might not be optimal
# Tune ef_construction, M for your use case
# Low latency, OK with lower recall:
index.create_index(index_type="HNSW", params={"M": 16, "efConstruction": 100})
# High recall, OK with higher latency:
index.create_index(index_type="HNSW", params={"M": 64, "efConstruction": 500})
5. ❌ Synchronous embedding generation
# Bad: Block API response
@app.post("/search")
def search(query: str):
embedding = model.encode(query) # 100-500ms!
results = index.query(embedding)
return results
# Good: Cache embeddings or use async
from functools import lru_cache
@lru_cache(maxsize=1000)
def get_embedding(text: str):
return model.encode(text)
Resources
Research Papers:
- "Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs" (HNSW)
- "Product Quantization for Nearest Neighbor Search" (PQ)
- "Learning to Index" (Learned indexes)
Interview Questions
Junior:
- What is a vector embedding? Why do we need one?
- Cosine similarity vs Euclidean distance: when to use which?
- What are the use cases for vector databases?
Mid:
- Compare HNSW vs IVF indexing
- Trade-offs between exact search and ANN
- Design a RAG system with a vector DB
Senior:
- Distributed vector search architecture
- Cost optimization strategies for 1B+ vectors
- Hybrid search implementation (keyword + vector)
Senior+:
- Multi-modal embedding challenges
- Learned indexes for vector search
- Real-time updates at scale
Summary
| Level | Focus | Tools |
|---|---|---|
| New | Vector concepts, embeddings, similarity | NumPy, sentence-transformers |
| Junior | Vector DB basics, indexing | Chroma, pgvector, Pinecone |
| Mid | Production deployment, hybrid search | Weaviate, Qdrant, Milvus |
| Senior | Scale, distributed systems, cost optimization | Custom configs, monitoring, GPU |
| Senior+ | Bleeding edge (multi-modal, learned indexes) | Research papers, custom solutions |
Key takeaway: vector databases power semantic search; understanding embeddings → similarity → indexing is the foundation. From there, scale up based on your use case.
Good luck building great AI-powered applications! 🚀
Next steps:
- storage-and-indexing.md → Traditional indexing (B-trees vs vector indexes)
- query-and-transactions.md → Query optimization
- Hands-on: build semantic search over your own docs!