Platform Engineering & Developer Experience: Building a Highway for Your Teams
"You know you have succeeded when developers WANT to use your platform, not when they are FORCED to." That is the difference between a golden path and a golden cage.
Platform Engineering is arguably the biggest trend in DevOps since Kubernetes. Instead of every team setting up its own CI/CD, monitoring, and deployment, you build one shared platform that every team uses and, ideally, likes using.
1. Internal Developer Platform (IDP)
1.1 Why do you need one?
Without an IDP:
Team A: Jenkins + Ansible + custom scripts
Team B: GitHub Actions + Terraform + bash
Team C: GitLab CI + Pulumi + Makefile
→ 3 teams, 3 ways to deploy, 3 toolchains
→ Onboarding a new engineer: 2 weeks per team
→ Incident debugging: "Team B deploys differently from Team A"
→ Security audit: "Every team has different standards"
With an IDP:
Every team: Platform → deploy, monitor, scale
→ Onboarding: 1 day
→ Consistency: every team deploys the same way
→ Security: enforced centrally
→ Productivity: engineers focus on business logic
1.2 IDP Components
┌──────────────────────────────────────────┐
│ Developer Portal │
│ (Backstage/Port — service catalog, docs)│
├──────────────────────────────────────────┤
│ Golden Paths │
│ (Templates, scaffolding, best practices)│
├──────────────────────────────────────────┤
│ CI/CD Pipeline │
│ (Build, test, scan, deploy — automated) │
├──────────────────────────────────────────┤
│ Infrastructure Orchestration │
│ (Terraform, Crossplane, Pulumi) │
├──────────────────────────────────────────┤
│ Observability Stack │
│ (Metrics, logs, traces — pre-configured)│
├──────────────────────────────────────────┤
│ Security & Compliance │
│ (Scanning, policies, secrets management)│
└──────────────────────────────────────────┘
1.3 Tools
Backstage (Spotify, open source):
→ Service catalog: "Where are all our services, and who owns them?"
→ Software templates: create a new service in 5 minutes
→ TechDocs: automatically generated documentation site
→ Plugin ecosystem: Kubernetes, CI/CD, PagerDuty
Port (SaaS):
→ No-code platform builder
→ Self-service actions (deploy, scale, rollback)
→ Scorecards: measure maturity level per service
Humanitec:
→ Platform Orchestrator
→ Score specification (platform-agnostic)
→ Dynamic environment management
2. Golden Paths — Paved Roads, Not Walls
2.1 Concept
Golden Path = The recommended, well-supported way to do things.
It is NOT:
→ 100% mandatory (that is a mandate/wall)
→ A ban on doing things differently (that is a cage)
It IS:
→ "If you take this road, everything is already set up"
→ CI/CD configured
→ Monitoring dashboards ready
→ Security scanned
→ Documentation generated
→ On-call setup done
Off-golden-path:
→ Allowed, but the team maintains it themselves
→ "You want to use an unusual framework? OK, but you own CI/CD,
monitoring, and security for it."
→ Natural incentive: golden path = less work = adoption
2.2 Golden Path example
Creating a new service (golden path):
1. Run the template: in Backstage, open Create → choose "Go Microservice"
2. The template creates a repo with:
├── Dockerfile (optimized, multi-stage)
├── .github/workflows/ci.yaml (CI pipeline)
├── helm/ (Kubernetes deployment)
├── Makefile (dev commands)
├── monitoring/
│ ├── dashboard.json (Grafana)
│ └── alerts.yaml (Prometheus)
├── docs/ (API docs template)
└── main.go (boilerplate with health check, graceful shutdown)
3. PR → auto CI → deploy to staging
4. Production deploy goes through an approval pipeline
5. Service is auto-registered in the service catalog
6. Monitoring dashboards appear automatically
Time to first deploy: < 30 minutes (instead of 2 weeks)
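To make step 2 concrete, here is a minimal sketch of what a scaffolding step might do. It is not how Backstage implements templates. The function name `scaffold_service`, the placeholder file contents, and the file names `helm/Chart.yaml` and `docs/index.md` are assumptions for illustration; only the directory layout comes from the tree above.

```python
from pathlib import Path

# Hypothetical template: paths mirror the tree above, contents are placeholders.
GO_MICROSERVICE_TEMPLATE = {
    "Dockerfile": "# multi-stage build goes here\n",
    ".github/workflows/ci.yaml": "# CI pipeline goes here\n",
    "helm/Chart.yaml": "name: {name}\n",           # assumed file inside helm/
    "Makefile": "# dev commands go here\n",
    "monitoring/dashboard.json": "{{}}\n",          # empty Grafana dashboard stub
    "monitoring/alerts.yaml": "# Prometheus alert rules go here\n",
    "docs/index.md": "# {name} API docs\n",         # assumed file inside docs/
    "main.go": "// {name}: health check and graceful shutdown go here\n",
}


def scaffold_service(name: str, root: Path) -> list[str]:
    """Create the template files under root/name and return the relative paths."""
    service_dir = root / name
    created = []
    for rel_path, content in GO_MICROSERVICE_TEMPLATE.items():
        target = service_dir / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        # Substitute the service name into the placeholder content.
        target.write_text(content.format(name=name))
        created.append(rel_path)
    return sorted(created)
```

Calling `scaffold_service("payments", Path("/tmp/demo"))` would create the tree under `/tmp/demo/payments`. A real template would also register the service in the catalog and wire up CI, which this sketch leaves out.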
3. DORA Metrics: Measuring Engineering Effectiveness
3.1 The 4 Key Metrics (from the book Accelerate)
1. Deployment Frequency (DF):
"How often do you deploy?"
Elite: On-demand (multiple per day)
High: Weekly to monthly
Medium: Monthly to every 6 months
Low: Every 6 months+
2. Lead Time for Changes (LT):
"How long does it take to get from commit to production?"
Elite: < 1 hour
High: 1 day - 1 week
Medium: 1 week - 1 month
Low: 1 month - 6 months
3. Change Failure Rate (CFR):
"What percentage of deployments cause an incident?"
Elite: 0-15%
High: 16-30%
Medium: 31-45%
Low: 46-60%
4. Mean Time to Restore (MTTR):
"When an incident happens, how long does it take to restore service?"
Elite: < 1 hour
High: < 1 day
Medium: 1 day - 1 week
Low: 1 week - 1 month
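As a rough illustration of how two of these metrics could be derived from deployment records, the sketch below computes median lead time and change failure rate. The `Deployment` record and all function names are hypothetical. The lead-time bands follow the list above, with two assumptions: the unlisted range between 1 hour and 1 day is treated as High, and "1 month" is approximated as 30 days.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Deployment:
    committed_at: datetime   # when the change was committed
    deployed_at: datetime    # when it reached production
    caused_incident: bool    # whether this deployment triggered an incident


def median_lead_time(deploys: list[Deployment]) -> timedelta:
    """Median commit-to-production time (raises StatisticsError on empty input)."""
    seconds = [(d.deployed_at - d.committed_at).total_seconds() for d in deploys]
    return timedelta(seconds=median(seconds))


def change_failure_rate(deploys: list[Deployment]) -> float:
    """Percentage of deployments that caused an incident."""
    if not deploys:
        return 0.0
    return 100.0 * sum(d.caused_incident for d in deploys) / len(deploys)


def lead_time_band(lead_time: timedelta) -> str:
    # Assumption: 1 hour to 1 day is not listed above, so it is treated as High.
    if lead_time < timedelta(hours=1):
        return "Elite"
    if lead_time <= timedelta(weeks=1):
        return "High"
    if lead_time <= timedelta(days=30):  # assumption: 1 month ~ 30 days
        return "Medium"
    return "Low"
```

In practice the records would come from CI/CD pipeline events. The median is used here because a single slow deployment would skew a mean.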
3.2 SPACE Framework (GitHub)
DORA measures delivery. SPACE measures the broader developer experience:
S — Satisfaction & Well-being
→ Developer satisfaction survey
→ Burnout indicators
→ Tool satisfaction
P — Performance
→ Code quality (test coverage, defect rate)
→ Code review quality
→ Reliability of services owned
A — Activity
→ Commit frequency
→ PR throughput
→ Deployment count
⚠️ Careful: Activity ≠ Productivity
C — Communication & Collaboration
→ PR review turnaround time
→ Knowledge sharing frequency
→ Cross-team collaboration quality
E — Efficiency & Flow
→ Time in "flow state"
→ Context switching frequency
→ Build/test wait times
→ "Friction" points in workflow
3.3 Measure and Improve
Measure:
→ DORA metrics: from CI/CD pipeline data
→ SPACE: developer surveys (quarterly)
→ Tools: Sleuth, LinearB, Jellyfish, DX
Improve (in priority order):
1. Build time > 10 minutes? → Parallel tests, caching, smaller images
2. PR review > 24 hours? → Bot reminders, smaller PRs
3. Deploy frequency < weekly? → Automate deployment, feature flags
4. MTTR > 1 hour? → Better alerts, runbooks, rollback automation
5. CFR > 15%? → More testing, canary deployments
⚠️ Do NOT use these metrics to evaluate individual engineers.
Metrics are for TEAM improvement, not performance reviews.
Goodhart's Law: "When a measure becomes a target,
it ceases to be a good measure."
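The priority list in this section can be read as an ordered set of threshold checks. The sketch below encodes it that way. `TeamMetrics`, its field names, and `next_actions` are hypothetical, and "deploy frequency < weekly" is interpreted as fewer than one deployment per week. The input is team-level data, in line with the warning above.

```python
from dataclasses import dataclass


@dataclass
class TeamMetrics:
    build_minutes: float
    pr_review_hours: float
    deploys_per_week: float
    mttr_hours: float
    change_failure_rate_pct: float


# (check, suggested action) pairs, kept in the priority order of the list above.
CHECKS = [
    (lambda m: m.build_minutes > 10,
     "Build time: parallel tests, caching, smaller images"),
    (lambda m: m.pr_review_hours > 24,
     "PR review: bot reminders, smaller PRs"),
    (lambda m: m.deploys_per_week < 1,
     "Deploy frequency: automate deployment, feature flags"),
    (lambda m: m.mttr_hours > 1,
     "MTTR: better alerts, runbooks, rollback automation"),
    (lambda m: m.change_failure_rate_pct > 15,
     "CFR: more testing, canary deployments"),
]


def next_actions(metrics: TeamMetrics) -> list[str]:
    """Return the suggested actions for every threshold the team exceeds, in priority order."""
    return [action for check, action in CHECKS if check(metrics)]
```

A team with 12-minute builds, 6-hour reviews, one deploy every two weeks, 30-minute MTTR, and 20% CFR would get three suggestions, starting with build time.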
4. Developer Experience (DX) Improvements
4.1 Build & Test Speed
Build time directly affects productivity:
Build < 2 minutes: Developer waits and stays in flow ✅
Build 5-10 minutes: Developer switches context, checks Slack ⚠️
Build > 10 minutes: Developer goes for coffee and forgets what they were doing ❌
Optimizations:
→ Parallel test execution
→ Test result caching
→ Incremental builds
→ Docker layer caching
→ Smaller base images (distroless)
→ Remote build cache (Bazel, Turborepo)
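Of the optimizations above, test result caching is perhaps the least obvious, so here is a toy sketch of the idea: fingerprint the inputs, and skip the test run when the same fingerprint has already passed. The names `fingerprint` and `run_with_cache` and the JSON cache file are assumptions for illustration. Tools such as Bazel and Turborepo also track dependencies, environment, and toolchain, which this sketch ignores.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def fingerprint(paths: list[Path]) -> str:
    """Hash file names and contents so that any source change yields a new key."""
    digest = hashlib.sha256()
    for path in sorted(paths):
        digest.update(path.name.encode())
        digest.update(path.read_bytes())
    return digest.hexdigest()


def run_with_cache(paths: list[Path],
                   run_tests: Callable[[], bool],
                   cache_file: Path) -> tuple[bool, bool]:
    """Return (passed, cache_hit). Only passing results are cached."""
    cache = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    key = fingerprint(paths)
    if cache.get(key) is True:
        return True, True              # inputs unchanged since a passing run
    passed = run_tests()
    if passed:
        cache[key] = True
        cache_file.write_text(json.dumps(cache))
    return passed, False
```

Failures are deliberately not cached, so a flaky or fixed-by-environment failure is always re-run.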
4.2 Local Development
DevContainers:
→ VS Code opens in container with all tools pre-installed
→ Everyone has SAME environment (no "works on my machine")
→ .devcontainer/devcontainer.json in repo
Docker Compose for local development:
→ docker-compose up → the whole stack runs locally
→ Hot reload for application code
→ Mock external services
Tilt / Skaffold (K8s local dev):
→ Code change → auto rebuild → auto deploy to local K8s
→ Hot reload inside Kubernetes
→ Bridge local ↔ remote cluster
5. Service Maturity Scorecards
Measure the maturity level of each service:
┌──────────────────┬────────┬────────┬────────┐
│ Category │ Bronze │ Silver │ Gold │
├──────────────────┼────────┼────────┼────────┤
│ Documentation │ README │ API doc│ Runbook│
│ Testing │ Unit │ +Integ │ +E2E │
│ Observability │ Logs │ +Metric│ +Traces│
│ Security │ HTTPS │ +Scan │ +mTLS │
│ Reliability │ Health │ +SLO │ +Chaos │
│ CI/CD │ Build │ +Test │ +Canary│
│ On-call │ Alert │ +Runbk │ +Escal │
└──────────────────┴────────┴────────┴────────┘
Target:
→ New services: Bronze at launch, Silver within 1 month
→ Critical services: Gold required
→ Visualize in Backstage: gamification effect 🏆
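One way to turn the table above into a score is sketched below. It rests on two assumptions that the table does not state: levels are cumulative within a category (Silver requires the Bronze check as well), and a service's overall level is its weakest category. The check identifiers such as `api_docs` and `oncall_runbook` are hypothetical names for the table cells.

```python
LEVELS = ["None", "Bronze", "Silver", "Gold"]

# Per category: the checks needed for Bronze, Silver, Gold, in that order.
SCORECARD = {
    "Documentation": ["readme", "api_docs", "runbook"],
    "Testing": ["unit_tests", "integration_tests", "e2e_tests"],
    "Observability": ["logs", "metrics", "traces"],
    "Security": ["https", "scanning", "mtls"],
    "Reliability": ["health_check", "slo", "chaos_testing"],
    "CI/CD": ["build", "test", "canary"],
    "On-call": ["alerts", "oncall_runbook", "escalation"],
}


def category_level(checks: list[str], passed: set[str]) -> int:
    """Count consecutive passed checks; a gap stops the climb (cumulative levels)."""
    level = 0
    for check in checks:
        if check not in passed:
            break
        level += 1
    return level


def service_level(passed: set[str]) -> tuple[str, dict[str, str]]:
    """Return (overall level, per-category levels); overall is the weakest category."""
    per_category = {cat: category_level(checks, passed)
                    for cat, checks in SCORECARD.items()}
    overall = min(per_category.values())
    return LEVELS[overall], {cat: LEVELS[lvl] for cat, lvl in per_category.items()}
```

Using the minimum means one neglected category holds the whole service back. That is a strict choice; a team might prefer a weighted score instead.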
6. Summary
Platform Engineering Checklist:
□ IDP: Service catalog, templates, self-service
□ Golden Paths: Recommended, not mandated
□ DORA Metrics: Measure DF, LT, CFR, MTTR
□ Build Speed: < 5 minutes target
□ Local Dev: DevContainers / Docker Compose
□ Scorecards: Maturity levels per service
□ Feedback Loop: Regular developer surveys
References
- Team Topologies — Skelton & Pais
- Accelerate — Forsgren, Humble, Kim (DORA)
- Backstage.io
- Platform Engineering on Kubernetes
- CNCF Platform White Paper
💡 Remember: the best platform is the one developers WANT to use. If you have to force it on them, you are building a cage, not a path. 🛤️