🧠 Programming ✍️ Khoa 📅 19/04/2026 ☕ 9 min read

Go Memory & GC Internals

Memory management and garbage collection are the two factors that most determine a Go service's performance. A deep understanding of both helps you optimize latency, reduce memory footprint, and debug memory leaks.

💡 The Go GC is a concurrent, tri-color mark-and-sweep collector targeting STW pauses under 1ms.


Memory Allocator

Stack vs Heap

Stack:

  • Fast allocation (just move the stack pointer)
  • Freed automatically when the function returns
  • Small (2KB by default, grows automatically up to 1GB)
  • Thread-local (no locking needed)

Heap:

  • Slower allocation (goes through the allocator)
  • Managed by the GC
  • Large, shared between goroutines
  • Carries overhead (metadata, fragmentation)

Rule: use the stack when possible, the heap when necessary.


Escape Analysis

The compiler decides whether a variable lives on the stack or the heap based on escape analysis.

When does a variable escape?

// 1. Return pointer to local variable
func newUser() *User {
    u := User{Name: "Alice"}
    return &u  // ← u escapes to heap
}

// 2. Store pointer in heap-allocated struct
type Container struct {
    data *Data
}

func create() *Container {
    d := Data{}
    return &Container{data: &d}  // ← d escapes
}

// 3. Pass to interface{} (type not known at compile time)
func print(v interface{}) {
    fmt.Println(v)
}

func main() {
    x := 42
    print(x)  // ← x escapes (fmt.Println takes interface{})
}

// 4. Send to channel
func send(ch chan int) {
    x := 42
    ch <- x  // ← x may escape (depends on the channel)
}

// 5. Closure captures variable
func outer() func() int {
    x := 0
    return func() int {
        x++      // ← x escapes (closure outlives outer)
        return x
    }
}

// 6. Size unknown at compile time
func allocate(n int) []byte {
    return make([]byte, n)  // ← escapes (n is a runtime value)
}

Checking escape analysis

go build -gcflags='-m' main.go

Output:

./main.go:5:2: moved to heap: u
./main.go:6:9: &u escapes to heap
./main.go:15:13: ... argument does not escape
./main.go:15:13: x escapes to heap

Explanation:

  • moved to heap: u → variable u is allocated on the heap
  • escapes to heap → the pointer escapes
  • does not escape → stays on the stack

Avoiding unnecessary escapes

// ❌ BAD: Unnecessary escape
func sumPtr(nums []int) *int {
    result := 0
    for _, n := range nums {
        result += n
    }
    return &result  // ← result escapes to the heap
}

// ✅ GOOD: Return a value, no escape
func sum(nums []int) int {
    result := 0
    for _, n := range nums {
        result += n
    }
    return result  // result stays on the stack
}

Memory Allocator: TCMalloc-inspired

The Go allocator is based on TCMalloc (Thread-Caching Malloc).

Size classes

Objects are grouped into size classes:

Tiny: < 16 bytes
Small: 16 bytes - 32 KB (67 size classes)
Large: > 32 KB

Example size classes:

  • 8, 16, 24, 32, 48, 64, 80, 96, 112, 128, ...
  • Each allocation is rounded up to the nearest size class:

x := make([]byte, 33)  // Rounded up to 48 bytes
                       // Wastes 15 bytes
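The rounding can be observed by diffing HeapAlloc around a single allocation. A rough sketch (the helper name allocatedSize is invented; the measurement only works cleanly in a quiet, single-goroutine program):

```go
package main

import (
	"fmt"
	"runtime"
)

var keep []byte // global sink: the assignment forces the allocation onto the heap

// allocatedSize measures how many heap bytes one make([]byte, n)
// actually consumes by diffing HeapAlloc around the allocation.
func allocatedSize(n int) uint64 {
	var before, after runtime.MemStats
	runtime.GC() // settle the heap so the diff is clean
	runtime.ReadMemStats(&before)
	keep = make([]byte, n)
	runtime.ReadMemStats(&after)
	return after.HeapAlloc - before.HeapAlloc
}

func main() {
	fmt.Println("requested 33 bytes, heap used:", allocatedSize(33))
}
```

For a 33-byte request this should report 48, the next size class up.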

Allocation path

Request allocation
  ↓
1. Tiny allocator (< 16B, no pointers)
   ├─ Yes → From P's tiny block
   └─ No → Next step
  ↓
2. Small allocator (16B - 32KB)
   ├─ Check P's mcache
   │  ├─ Has free span → Allocate
   │  └─ No free span → Get from mcentral
   └─ mcentral empty → Get from mheap
  ↓
3. Large allocator (> 32KB)
   └─ Allocate directly from mheap

Structure

mcache (per-P, no lock):

P's mcache
├── Tiny allocator
├── Span list for size class 1 (8B)
├── Span list for size class 2 (16B)
├── ...
└── Span list for size class 67 (32KB)

mcentral (per size class, with lock):

  • Holds spans for each size class
  • Shared between all Ps

mheap (global, with lock):

  • Manages the entire heap
  • Hands out spans to mcentral
  • Returns unused memory to the OS

Garbage Collector

GC algorithm: Concurrent Mark & Sweep

Phases:

1. Mark Setup (STW)
   ↓
2. Concurrent Mark (concurrent)
   ↓
3. Mark Termination (STW)
   ↓
4. Concurrent Sweep (concurrent)

Timeline:

User code running
  ↓
STW: Mark Setup (~100µs)
  ↓
User code + GC Mark (concurrent)
  ↓
STW: Mark Termination (~100µs)
  ↓
User code + GC Sweep (concurrent)
  ↓
Cycle complete

Tri-color marking

Colors:

  • White: not yet scanned (initially all objects)
  • Gray: marked, but children not yet scanned
  • Black: fully scanned, all children marked

Algorithm:

1. Start: All objects white, roots gray
2. While gray objects exist:
   - Pick gray object
   - Scan its pointers, mark pointed objects gray
   - Mark itself black
3. End: Black objects reachable, white objects garbage
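The steps above can be sketched as a toy marker. This is an illustration only, not the runtime's implementation; the object type and color names are invented for the example:

```go
package main

import "fmt"

type color int

const (
	white color = iota // not yet scanned
	gray               // marked, children not yet scanned
	black              // fully scanned
)

type object struct {
	name     string
	children []*object
	c        color
}

// mark implements the tri-color loop: roots start gray; when no gray
// objects remain, reachable objects are black and anything still
// white is garbage.
func mark(roots []*object) {
	grayset := append([]*object(nil), roots...)
	for _, r := range roots {
		r.c = gray
	}
	for len(grayset) > 0 {
		obj := grayset[len(grayset)-1]
		grayset = grayset[:len(grayset)-1]
		for _, child := range obj.children {
			if child.c == white {
				child.c = gray
				grayset = append(grayset, child)
			}
		}
		obj.c = black
	}
}

func main() {
	a := &object{name: "a"}
	b := &object{name: "b"}
	orphan := &object{name: "orphan"} // nothing points here
	a.children = []*object{b}
	mark([]*object{a})
	fmt.Println(a.c == black, b.c == black, orphan.c == white) // true true true
}
```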

Concurrent marking issue: user code can modify pointers while the GC is marking.

Solution: a write barrier tracks pointer writes and re-marks objects when needed.

Write barrier

When user code writes a pointer:

obj.field = newPtr

the write barrier records:

If obj is black and newPtr is white:
    Mark newPtr gray

(Simplified: this is the classic Dijkstra insertion barrier. Since Go 1.8 the runtime actually uses a hybrid write barrier.)

Cost: each pointer write carries a small overhead (~10-20 ns).

When active: only during the GC marking phase.


GC Tuning

GOGC

Default: GOGC=100

Meaning: the GC triggers when the heap has grown 100% since the end of the previous GC.

Heap after last GC: 100 MB
GOGC=100
→ Next GC triggers at: 200 MB

GOGC=200
→ Next GC triggers at: 300 MB

Trade-off:

  • Higher GOGC → fewer GC cycles, higher memory usage
  • Lower GOGC → more GC cycles, lower memory usage

When to tune:

# Reduce memory usage (accept more frequent GC)
GOGC=50 ./myapp

# Reduce GC frequency (accept higher RAM usage)
GOGC=200 ./myapp

# Disable GC (development only!)
GOGC=off ./myapp

GOMEMLIMIT (Go 1.19+)

Set soft memory limit:

GOMEMLIMIT=2GiB ./myapp

Meaning: the GC tries to keep heap usage below the limit.

Priority: as the heap approaches the limit, GOMEMLIMIT overrides GOGC pacing.

Benefit: predictable memory usage inside containers.

Example:

# Container has 4GB of RAM
# Set the limit to 3GB to avoid OOM kills
GOMEMLIMIT=3GiB ./myapp

debug.SetGCPercent()

Dynamically adjust GOGC:

import "runtime/debug"

// Set GOGC to 200
debug.SetGCPercent(200)

// Disable GC
debug.SetGCPercent(-1)

Manual GC

runtime.GC()  // Force GC cycle

Use case:

  • After a load spike, force a GC to free memory
  • Testing and benchmarking

Note: manual GC is rarely needed; the automatic GC is already well tuned.
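A quick sanity check that a forced cycle actually runs, combined with debug.FreeOSMemory, which triggers a GC and then attempts to return freed memory to the OS (the helper name forcedCycles is illustrative):

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// forcedCycles forces GC work and reports how many cycles completed.
func forcedCycles() uint32 {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)

	runtime.GC()         // blocks until a full GC cycle completes
	debug.FreeOSMemory() // runs another GC, then tries to return freed memory to the OS

	runtime.ReadMemStats(&after)
	return after.NumGC - before.NumGC
}

func main() {
	fmt.Println("GC cycles run:", forcedCycles())
}
```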


Monitoring GC

1. runtime.ReadMemStats

var m runtime.MemStats
runtime.ReadMemStats(&m)

fmt.Printf("Alloc: %v MB\n", m.Alloc/1024/1024)
fmt.Printf("TotalAlloc: %v MB\n", m.TotalAlloc/1024/1024)
fmt.Printf("Sys: %v MB\n", m.Sys/1024/1024)
fmt.Printf("NumGC: %v\n", m.NumGC)
fmt.Printf("PauseTotalNs: %v ms\n", m.PauseTotalNs/1e6)

Important metrics:

  • Alloc: heap bytes currently allocated and in use
  • Sys: memory requested from the OS
  • NumGC: number of completed GC cycles
  • PauseTotalNs: cumulative STW pause time

2. GODEBUG=gctrace

GODEBUG=gctrace=1 ./myapp

Output:

gc 1 @0.001s 0%: 0.018+0.23+0.003 ms clock, 0.14+0.076/0.22/0.001+0.025 ms cpu, 4->4->0 MB, 5 MB goal, 8 P

Explanation:

  • gc 1: GC cycle #1
  • @0.001s: 0.001 seconds after program start
  • 0%: fraction of CPU time spent in GC since start
  • 0.018+0.23+0.003 ms: sweep-termination STW + concurrent mark + mark-termination STW (wall clock)
  • 4->4->0 MB: heap size at GC start → heap size at GC end → live heap
  • 5 MB goal: heap size goal for the next cycle
  • 8 P: 8 processors in use

3. pprof heap profile

go tool pprof http://localhost:6060/debug/pprof/heap

Commands:

(pprof) top      # Top allocators
(pprof) list <func>  # Line-by-line allocation
(pprof) web      # Visualize call graph

4. trace

curl http://localhost:6060/debug/pprof/trace?seconds=5 > trace.out
go tool trace trace.out

View: GC pause timeline, heap size over time.


Optimizing Allocations

1. Object pooling (sync.Pool)

Use case: reuse objects instead of allocating new ones.

var bufferPool = sync.Pool{
    New: func() interface{} {
        return make([]byte, 4096)
    },
}

func process(data []byte) {
    buf := bufferPool.Get().([]byte)
    defer bufferPool.Put(buf)
    
    // Use buf
    copy(buf, data)
    // ...
}

Benefit: fewer allocations → less GC pressure.

Note: the pool can be emptied by the GC (it is not a permanent cache), so never rely on objects surviving in it.
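The effect can be measured with testing.AllocsPerRun. A sketch (helper names are made up; note the pool stores *[]byte, because putting a bare slice into the pool's interface{} would itself allocate):

```go
package main

import (
	"fmt"
	"sync"
	"testing"
)

// The pool stores *[]byte: putting a bare []byte into the pool would
// box the slice header in an interface, which itself allocates.
var bufferPool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, 4096)
		return &b
	},
}

var sink *[]byte // forces the non-pooled allocation to the heap

func pooledAllocs() float64 {
	return testing.AllocsPerRun(1000, func() {
		buf := bufferPool.Get().(*[]byte)
		bufferPool.Put(buf)
	})
}

func freshAllocs() float64 {
	return testing.AllocsPerRun(1000, func() {
		b := make([]byte, 4096)
		sink = &b
	})
}

func main() {
	fmt.Println("with pool:", pooledAllocs(), "without:", freshAllocs())
}
```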

2. Pre-allocate slices

// ❌ BAD: Append causes multiple allocations
var result []int
for i := 0; i < 1000; i++ {
    result = append(result, i)  // Reallocates repeatedly
}

// ✅ GOOD: Pre-allocate
result := make([]int, 0, 1000)
for i := 0; i < 1000; i++ {
    result = append(result, i)  // No reallocation
}

3. Reuse strings

// ❌ BAD: String concat causes allocations
s := ""
for i := 0; i < 100; i++ {
    s += "hello"  // Each concat allocates new string
}

// ✅ GOOD: Use strings.Builder
var b strings.Builder
b.Grow(500)  // Pre-allocate
for i := 0; i < 100; i++ {
    b.WriteString("hello")
}
s := b.String()

4. Avoid []byte ↔ string conversion

// ❌ BAD: Conversion allocates
func process(data []byte) {
    s := string(data)  // Allocates new string
    // ...
}

// ✅ GOOD: Work with []byte directly
func process(data []byte) {
    // ...
}

// ✅ Unsafe (zero-copy, but dangerous)
import "unsafe"

func bytesToString(b []byte) string {
    return *(*string)(unsafe.Pointer(&b))
}

Warning: the unsafe approach breaks if the []byte is modified after the conversion.
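Since Go 1.20, the standard library provides unsafe.String and unsafe.SliceData for the same zero-copy conversion, avoiding the hand-rolled header cast (the no-mutation caveat still applies):

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesToString reuses b's backing array instead of copying it.
// The caller must not modify b while the returned string is in use.
func bytesToString(b []byte) string {
	return unsafe.String(unsafe.SliceData(b), len(b))
}

func main() {
	b := []byte("hello")
	fmt.Println(bytesToString(b))
}
```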

5. Reduce pointer indirection

// ❌ BAD: Many pointers → GC scan overhead
type Node struct {
    Next *Node
    Data *Data
}

// ✅ GOOD: Value types where possible
type Node struct {
    Next *Node
    Data Data  // Embed value, not pointer
}

Trade-off: Value copy vs pointer scan time.


Memory Leaks

Common causes

1. Goroutine leak

// ❌ Goroutine never exits
func leak() {
    ch := make(chan int)
    go func() {
        <-ch  // Block forever if no sender
    }()
}

2. Forgotten callbacks

// ❌ Callback holds reference
type Handler struct {
    callbacks []func()
}

func (h *Handler) Register(cb func()) {
    h.callbacks = append(h.callbacks, cb)
    // Never removed → memory leak
}

3. Large slice holding reference

// ❌ Slice holds entire array
func process(data []byte) []byte {
    return data[0:10]  // Small slice, but references full array
}

// ✅ Copy to new slice
func process(data []byte) []byte {
    result := make([]byte, 10)
    copy(result, data[0:10])
    return result  // Original data can be GC'd
}

Debugging leaks

1. pprof heap diff

# Baseline
curl http://localhost:6060/debug/pprof/heap > heap1.out

# Wait...

# After some time
curl http://localhost:6060/debug/pprof/heap > heap2.out

# Compare
go tool pprof -base heap1.out heap2.out

2. Check goroutine count

import "runtime"

ticker := time.NewTicker(10 * time.Second)
go func() {
    for range ticker.C {
        fmt.Println("Goroutines:", runtime.NumGoroutine())
    }
}()

If the count keeps growing, you have a leak.


Summary

Concept           Key Point
Stack vs Heap     Stack is fast; heap carries GC overhead
Escape analysis   The compiler decides where a variable is allocated
Allocator         TCMalloc-inspired, per-P cache
GC                Concurrent tri-color mark & sweep
GOGC              Default 100 (GC when the heap doubles)
GOMEMLIMIT        Soft limit on heap size
Write barrier     Tracks pointer writes during the mark phase
sync.Pool         Reuses objects, reducing allocations
Memory leak       Goroutine leaks, forgotten references
