✍️ Khoa · 📅 19/04/2026 · ☕ 14 min read

Cloud Security & IAM: The Protective Wall You Cannot Afford to Skip

"Security is a process, not a product." (Bruce Schneier). In the cloud, that process starts on the day you create your first account, not after you have been breached.

Most cloud breaches do not come from zero-day exploits or sophisticated malware. They come from misconfigured IAM policies, exposed credentials, publicly open S3 buckets, and overly permissive security groups: things an engineer who understands them can prevent from the start.

1. IAM Deep Dive: The Foundation of Everything

1.1 Anatomy of IAM: User, Group, Role, Policy

These four concepts make up the AWS IAM permission system (GCP uses equivalent but slightly different terminology):

IAM Components:
┌─────────────────────────────────────────────────────────────┐
│                         AWS IAM                              │
│                                                              │
│  Principal (who is sending the request?)                     │
│  ├── IAM User     → Long-lived identity (for humans)         │
│  ├── IAM Role     → Assumed identity (for services)          │
│  ├── IAM Group    → Container for Users (not for Roles)      │
│  └── AWS Service  → e.g. EC2, Lambda authenticate via a Role │
│                                                              │
│  Policy (what is allowed?)                                   │
│  ├── Identity-based Policy  → Attached to User/Role/Group    │
│  ├── Resource-based Policy  → On the resource (S3, SQS...)   │
│  ├── Permission Boundary    → Ceiling on permissions         │
│  └── SCP (Org level)        → Guardrail for whole accounts   │
└─────────────────────────────────────────────────────────────┘

Golden rule: humans use an IAM User (or better, SSO/Identity Federation). Applications and services use an IAM Role. Never the other way around.

1.2 When to Use What?

Entity	Used for	Credentials	Rotation
IAM User	Humans with long-term access	Access Key + Secret	Manual (risky)
IAM Role	EC2, Lambda, ECS Task, cross-account	Temporary token (STS, 15 min to 12h, default 1h)	Automatic
Service Account (GCP)	Roughly equivalent to an IAM Role	JSON key file or Workload Identity	Key file = risky
OIDC/WebIdentity	GitHub Actions, K8s Pods	Token exchange via STS	Automatic

In practice: if you are creating an Access Key for a Lambda function or an EC2 instance, that is a sign you are doing it wrong. Attach an IAM Role directly to the resource.

1.3 Policy Evaluation Logic: Order of Precedence

AWS IAM evaluates policies in this order (worth knowing when you debug):

1. Explicit DENY  → Always wins, no exceptions
2. SCP (Org)      → If the SCP denies, no role or policy can override it
3. Resource Policy → S3 bucket policy, SQS policy...
4. Identity Policy → Policy attached to the role/user
5. Permission Boundary → A ceiling that cannot be exceeded
6. Session Policy → Extra limits applied when assuming a role
7. Implicit DENY  → Everything is denied by default unless an Allow exists
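
The ordering above can be sketched as a small decision function. This is a deliberately simplified, same-account model and not how AWS implements it: `PolicyLayer` and `evaluate` are hypothetical names, each layer is reduced to "has a matching Allow" and "has a matching Deny", and cross-account subtleties are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyLayer:
    """One policy type's verdict for a single request (hypothetical model)."""
    name: str
    has_allow: bool = False   # a statement in this layer allows the request
    has_deny: bool = False    # a statement in this layer explicitly denies it
    attached: bool = True     # SCP / boundary / session policy may be absent


def evaluate(scp, resource, identity, boundary, session):
    """Return (decision, reason) for a same-account request."""
    layers = [scp, resource, identity, boundary, session]

    # Step 1: an explicit Deny in any attached layer ends the evaluation.
    for layer in layers:
        if layer.attached and layer.has_deny:
            return "DENY", f"explicit deny in {layer.name}"

    # Steps 2, 5, 6: guardrail layers never grant; if attached they must allow.
    for layer in (scp, boundary, session):
        if layer.attached and not layer.has_allow:
            return "DENY", f"not allowed by {layer.name}"

    # Steps 3-4: within one account, an Allow in either the resource policy
    # or the identity policy is enough in this simplified model.
    if (resource.attached and resource.has_allow) or identity.has_allow:
        return "ALLOW", "allowed by resource or identity policy"

    # Step 7: nothing allowed the request.
    return "DENY", "implicit deny"
```

For instance, an identity policy that allows `s3:GetObject` still loses to a bucket policy containing an explicit Deny, which is the most common surprise when debugging access errors.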

A real debugging example:

Error: "User is not authorized to perform: s3:GetObject"
But the user's policy already allows s3:GetObject?

Possible causes:
1. The S3 bucket policy has an explicit Deny statement
2. An SCP at the Organization level blocks this action
3. The bucket is in another account and its resource policy does not allow cross-account access
4. The KMS key policy blocks decrypt (if the bucket is encrypted)

1.4 Assume Role and Cross-Account Access

The most important pattern in a multi-account architecture:

// Trust Policy on the Role in Account B (target)
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::ACCOUNT_A:role/DeploymentRole"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "unique-external-id-12345"
        }
      }
    }
  ]
}

ExternalId should be required whenever you delegate access to a third party (it guards against the Confused Deputy attack). For internal cross-account access it is optional, but still good practice.
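
On the caller's side (Account A), the matching step is an `sts:AssumeRole` call that passes the same ExternalId. The sketch below assumes a boto3 STS client is passed in; the function name `assume_cross_account_role` and the default session name are hypothetical.

```python
def assume_cross_account_role(sts_client, role_arn, external_id, session_name="deploy"):
    """Exchange the caller's identity for temporary credentials in Account B."""
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,
        ExternalId=external_id,   # must equal sts:ExternalId in the trust policy
        DurationSeconds=3600,     # one hour
    )
    creds = response["Credentials"]
    # Keys are named so the dict can be unpacked into boto3.client(...).
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }
```

A typical call would pass `boto3.client("sts")` as `sts_client` and then build a client for the target account with `boto3.client("s3", **credentials)`. If the ExternalId does not match the trust policy, STS rejects the call with an access-denied error.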

2. Principle of Least Privilege: From Theory to Practice

"Least privilege" is something everyone says but few implement correctly. Here is a practical checklist:

2.1 Start with Deny-All, then Allow What You Need

Wrong (common):

{
  "Effect": "Allow",
  "Action": "s3:*",       // A wildcard is a sign of laziness
  "Resource": "*"          // Double wildcard = disaster
}

Right:

{
  "Effect": "Allow",
  "Action": [
    "s3:GetObject",
    "s3:PutObject"
  ],
  "Resource": "arn:aws:s3:::my-specific-bucket/*"
}
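
A check like this is easy to automate in CI. The following is a minimal sketch, assuming policies are available as parsed JSON dictionaries; `find_wildcards` is a hypothetical helper, not part of any AWS tool, and it only looks at the two patterns shown above.

```python
def find_wildcards(policy):
    """Return human-readable findings for wildcard use in Allow statements."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):      # a single statement may be a bare object
        statements = [statements]

    findings = []
    for index, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        resources = stmt.get("Resource", [])
        # Both fields may be a single string or a list of strings.
        actions = [actions] if isinstance(actions, str) else actions
        resources = [resources] if isinstance(resources, str) else resources

        for action in actions:
            if action == "*" or action.endswith(":*"):
                findings.append(f"statement {index}: wildcard action {action!r}")
        if "*" in resources:
            findings.append(f"statement {index}: resource is '*'")
    return findings
```

Running it on the "wrong" policy reports both the `s3:*` action and the `*` resource, while the scoped policy produces no findings.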

2.2 Use IAM Access Analyzer to Find Overly Permissive Policies

AWS IAM Access Analyzer can:

Detect resources that are shared publicly or cross-account unintentionally
Generate a policy from CloudTrail logs (containing only the permissions you actually used)
Validate a policy before you deploy it

# Report which services the role last accessed (evidence for trimming its policy).
# Policy generation itself is started with: aws accessanalyzer start-policy-generation
aws iam generate-service-last-accessed-details \
  --arn arn:aws:iam::123456789012:role/MyLambdaRole

# Access Analyzer findings
aws accessanalyzer list-findings \
  --analyzer-arn arn:aws:accessanalyzer:us-east-1:123456789012:analyzer/my-analyzer

2.3 Least Privilege Checklist

☐ No policy uses Action: "*" without an explicit Deny to fence it in
☐ Every role is scoped to specific resource ARNs (not "*")
☐ Each Lambda/EC2 role has only the permissions that function needs; roles are not shared
☐ S3 buckets have no public access (Block Public Access = ON at the account level)
☐ RDS has no public endpoint (except in dev environments)
☐ No Access Keys for service identities (use Roles)
☐ Human Access Keys are rotated at least every 90 days (better: use SSO)
☐ The root account has no Access Key and has MFA enabled
☐ CloudTrail is enabled in all regions (including regions you do not use)

3. Secrets Management: Keep Secrets Out of Your Code

3.1 Comparing the Options

Tool	Pros	Cons	Best For
Hardcoded in code	Fast	Breached the moment the repo leaks	Never
Environment Variables	Simple	Visible in process listings and logs	Local dev only
AWS SSM Parameter Store	Free (standard tier), native, versioning	No built-in secret rotation (you write your own Lambda)	Config + non-critical secrets
AWS Secrets Manager	Auto rotation, audit trail, native RDS integration	$0.40/secret/month + API calls	Database passwords, API keys
HashiCorp Vault	Multi-cloud, dynamic secrets, rich auth	Self-managed complexity	Multi-cloud, advanced use cases
GCP Secret Manager	Native GCP, versioning, audit	GCP only	GCP workloads

3.2 The Right Pattern: Secrets Manager + IAM Role

# Wrong: secret in code
DATABASE_PASSWORD = "super_secret_123"  # Someone has already committed this to GitHub

# Right: fetch from Secrets Manager at runtime
import boto3
import json

def get_secret(secret_name):
    client = boto3.client("secretsmanager", region_name="ap-southeast-1")
    response = client.get_secret_value(SecretId=secret_name)
    return json.loads(response["SecretString"])

# The Lambda/EC2 role has secretsmanager:GetSecretValue
# on a specific resource ARN, so no access key is needed
db_config = get_secret("prod/myapp/database")
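
Because Secrets Manager charges per API call, fetching on every request can add up. One common mitigation is a small in-process cache with a time-to-live. The class below is a sketch under that assumption: `CachedSecret` is a hypothetical name, and the TTL of 300 seconds is an arbitrary illustrative default.

```python
import time


class CachedSecret:
    """Cache one secret value and refetch it after ttl_seconds."""

    def __init__(self, fetch, ttl_seconds=300, clock=time.monotonic):
        self._fetch = fetch          # zero-argument callable returning the secret
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the expiry logic can be tested
        self._value = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._value is None or now >= self._expires_at:
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value
```

It could wrap the function above as `CachedSecret(lambda: get_secret("prod/myapp/database"))`. The trade-off is that after a rotation the process may keep using the old value for up to one TTL, so the TTL should stay well below the rotation overlap window.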

3.3 Secret Rotation: Don't Set and Forget

# Terraform: RDS with automatic Secrets Manager rotation
resource "aws_secretsmanager_secret_rotation" "db_rotation" {
  secret_id           = aws_secretsmanager_secret.db_password.id
  rotation_lambda_arn = aws_lambda_function.rotation_lambda.arn

  rotation_rules {
    automatically_after_days = 30  # Rotate every 30 days
  }
}

3.4 Detecting Leaked Secrets

# Scan the working tree before you push
# (git-secrets from awslabs is a separate tool and is not installed via pip)
pip install detect-secrets

# Record a baseline of current findings, then review it
detect-secrets scan > .secrets.baseline
detect-secrets audit .secrets.baseline

# Or use GitHub/GitLab secret scanning (built-in)
# On AWS, Amazon Macie can scan S3 buckets for sensitive data
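
To make the idea concrete, here is a minimal sketch of what such scanners do for one credential type. It assumes the widely used pattern for AWS access key IDs (a four-letter prefix followed by 16 uppercase alphanumeric characters); `scan_text` is a hypothetical helper and is far less thorough than a real tool.

```python
import re

# Long-term access key IDs start with "AKIA"; temporary STS key IDs start
# with "ASIA". Both are followed by 16 uppercase letters or digits.
ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[A-Z0-9]{16}\b")


def scan_text(text):
    """Return (line_number, match) pairs for every access key ID found."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in ACCESS_KEY_ID.finditer(line):
            hits.append((number, match.group(0)))
    return hits
```

A pattern match only tells you a string looks like a key ID. Any real hit should be treated as compromised and rotated, since the secret half has usually leaked alongside it.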

4. Network Security — Defense in Depth

4.1 Security Groups vs NACLs: Which Does What?

VPC (10.0.0.0/16)
│
├── NACL (Subnet level, stateless)
│   ├── Rule 100: Allow HTTPS inbound from 0.0.0.0/0
│   ├── Rule 200: Allow HTTP inbound from 0.0.0.0/0
│   ├── Rule *: Deny ALL (implicit, evaluated last)
│   └── Must define BOTH inbound AND outbound (stateless!)
│
└── Security Group (Instance/ENI level, stateful)
    ├── Inbound: Allow 443 from ALB Security Group
    ├── Outbound: Allow ALL (default)
    └── Stateful: return traffic is allowed automatically

When should you use a NACL? Very rarely. A NACL is a last line of defense, used to block a specific IP range while you are under attack. Most security logic belongs in Security Groups.
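
NACL rules are evaluated in ascending rule-number order and the first match wins, which is why a block rule must carry a lower number than the allow rules. A small simulation of that behaviour, with a hypothetical rule set and only exact-port matching:

```python
from ipaddress import ip_address, ip_network

# (rule_number, action, cidr, port): hypothetical inbound rules
INBOUND_RULES = [
    (100, "allow", "0.0.0.0/0", 443),
    (200, "allow", "0.0.0.0/0", 80),
    (90, "deny", "203.0.113.0/24", 443),   # emergency block for an attacking range
]


def nacl_decision(rules, source_ip, port):
    """Evaluate rules in ascending rule-number order; first match wins."""
    for _number, action, cidr, rule_port in sorted(rules):
        if rule_port == port and ip_address(source_ip) in ip_network(cidr):
            return action
    return "deny"   # the implicit catch-all rule at the end
```

Had the deny rule been numbered 150 instead of 90, rule 100 would match first and the attacking range would still reach port 443.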

4.2 Security Group Best Practices

# Wrong: allow 0.0.0.0/0 to port 22/3389
resource "aws_security_group_rule" "bad_ssh" {
  type        = "ingress"
  from_port   = 22
  to_port     = 22
  protocol    = "tcp"
  cidr_blocks = ["0.0.0.0/0"]  # SSH open to the internet = disaster
}

# Right: use AWS Systems Manager Session Manager instead of SSH
# No need to open port 22, no bastion host required
# SSM Session Manager = shell access over the AWS API (encrypted, audited)

# If you still need SSH: only from the VPN/bastion range
resource "aws_security_group_rule" "restricted_ssh" {
  type        = "ingress"
  from_port   = 22
  to_port     = 22
  protocol    = "tcp"
  cidr_blocks = ["10.0.0.0/8"]  # Internal VPN only
}
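
Rules like the first one tend to creep back in, so it helps to audit for them. The sketch below assumes a security group dictionary shaped like one entry of the EC2 DescribeSecurityGroups response (`IpPermissions`, `IpRanges`, `Ipv6Ranges`); `open_admin_rules` is a hypothetical helper.

```python
ADMIN_PORTS = (22, 3389)
ANYWHERE = ("0.0.0.0/0", "::/0")


def open_admin_rules(security_group):
    """Return (port, cidr) pairs where SSH or RDP is reachable from anywhere."""
    findings = []
    for perm in security_group.get("IpPermissions", []):
        if perm.get("IpProtocol") not in ("tcp", "-1"):
            continue
        # Protocol "-1" means all traffic; FromPort/ToPort are then absent.
        low = perm.get("FromPort", 0)
        high = perm.get("ToPort", 65535)
        cidrs = [r["CidrIp"] for r in perm.get("IpRanges", [])]
        cidrs += [r["CidrIpv6"] for r in perm.get("Ipv6Ranges", [])]
        for cidr in cidrs:
            if cidr not in ANYWHERE:
                continue
            for port in ADMIN_PORTS:
                if low <= port <= high:
                    findings.append((port, cidr))
    return findings
```

In practice the input could come from `boto3.client("ec2").describe_security_groups()["SecurityGroups"]`, with the function applied to each group in a scheduled job.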

4.3 WAF: Application-Layer Protection

AWS WAF should sit in front of CloudFront or the ALB:

Internet → CloudFront (WAF rules) → ALB → ECS/EKS

Managed Rule Groups worth enabling right away:

AWSManagedRulesCommonRuleSet — OWASP Top 10 basics
AWSManagedRulesKnownBadInputsRuleSet — Log4Shell, Spring4Shell...
AWSManagedRulesSQLiRuleSet — SQL Injection
AWSManagedRulesAmazonIpReputationList — Known bad IPs

WAF cost: roughly $5/month per Web ACL, $1/month per rule, and $0.60 per million requests, which is very cheap compared with an incident.

5. Zero Trust Architecture: Not a Buzzword

5.1 The Zero Trust Mindset vs Perimeter Security

Perimeter Security (legacy):
  ┌─────────────────────────────────────┐
  │ "Trusted" Internal Network          │
  │  ┌────┐  ┌────┐  ┌────┐           │
  │  │ DB │  │ API│  │App │  ← Trust  │
  │  └────┘  └────┘  └────┘   everyone│
  └────────────────┬────────────────────┘
                   │ Firewall
                 Internet (Untrusted)

Zero Trust:
  Every request must be authenticated + authorized
  Whether it originates from the internal network or from outside
  
  Service A → [mTLS + JWT] → Service B
  Developer  → [SSO + MFA + Just-In-Time access] → Database
  CI/CD      → [OIDC token, not a static key] → AWS

5.2 Implementing Zero Trust in the Cloud

1. Service-to-service: mTLS or JWT verification

# Kubernetes: Istio sidecars handle mTLS automatically
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: production
spec:
  mtls:
    mode: STRICT  # All traffic must use mTLS

2. GitHub Actions → AWS: OIDC instead of Access Keys

# .github/workflows/deploy.yml
permissions:
  id-token: write   # Allows the job to request an OIDC token
  contents: read    # Needed for checkout once permissions are set explicitly

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/GitHubActionsRole
          aws-region: ap-southeast-1
          # No AWS_ACCESS_KEY_ID or AWS_SECRET_ACCESS_KEY needed!

# Terraform: IAM Role for GitHub Actions
resource "aws_iam_role" "github_actions" {
  name = "GitHubActionsRole"
  
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Principal = {
        Federated = aws_iam_openid_connect_provider.github.arn
      }
      Action = "sts:AssumeRoleWithWebIdentity"
      Condition = {
        StringEquals = {
          "token.actions.githubusercontent.com:aud" = "sts.amazonaws.com"
        }
        StringLike = {
          # A specific repo only, not the whole org
          "token.actions.githubusercontent.com:sub" = "repo:my-org/my-repo:*"
        }
      }
    }]
  })
}
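
To see what those two conditions accept, the sketch below mirrors them in plain Python. It is an illustration only: `string_like` approximates IAM's StringLike operator (`*` for any run of characters, `?` for exactly one), and `trust_policy_accepts` is a hypothetical name. GitHub's `sub` claim for a branch build has the form `repo:OWNER/REPO:ref:refs/heads/BRANCH`.

```python
import re


def string_like(pattern, value):
    """Approximate IAM StringLike: '*' matches any run, '?' exactly one char."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def trust_policy_accepts(claims):
    """Mirror the aud (StringEquals) and sub (StringLike) conditions."""
    return (
        claims.get("aud") == "sts.amazonaws.com"
        and string_like("repo:my-org/my-repo:*", claims.get("sub", ""))
    )
```

Note that the colon before the `*` matters: without it, a pattern such as `repo:my-org/my-repo*` would also accept a repository named `my-repo-fork`. Tightening the suffix to a specific branch or environment narrows the trust further.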

3. Just-In-Time (JIT) Database Access

Instead of giving developers persistent database access, use:

AWS IAM Database Authentication (RDS): tokens valid for 15 minutes
HashiCorp Vault Dynamic Secrets: creates temporary database users
AWS Systems Manager Session Manager + SSH tunnel: audit trail

6. Common IAM Mistakes: Real-World War Stories

Mistake 1: Wildcard Resources in Production Policies

// A startup's policy, as found after an incident
{
  "Effect": "Allow",
  "Action": "dynamodb:*",
  "Resource": "*"   // Lambda was compromised → every DynamoDB table was deleted
}

Fix: scope down to specific table ARNs. If you need several tables, list them all.

Mistake 2: Not Enabling Multi-Region CloudTrail

Attackers often operate in rarely used regions (us-east-2, eu-west-2) because CloudTrail is frequently enabled only in the primary region. Enable CloudTrail in all regions and centralize the logs in S3 with Object Lock.

Mistake 3: Using the Root Account for Automation

Root account credentials in a CI/CD pipeline are about the worst thing that can happen. Root can delete CloudTrail, change billing, close the account, and cancel the support plan.

Fix: root account → enable MFA → delete its access keys → never use it for automation.

Mistake 4: Overlooking `iam:PassRole`

// The developer has this permission:
{ "Action": "ec2:RunInstances", "Resource": "*" }

// But launching EC2 with a role attached also requires:
{ "Action": "iam:PassRole", "Resource": "arn:aws:iam::*:role/*" }

// If iam:PassRole is too broad → they can attach AdminRole to an EC2 instance → privilege escalation

iam:PassRole must be scoped down to specific role ARNs.
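
The escalation path becomes obvious once you test which roles a given `Resource` pattern covers. A minimal sketch, assuming simple `*` wildcards in the pattern; the account ID, role names, and `can_pass_role` helper are all hypothetical, and `fnmatchcase` is used here as a stand-in for IAM's ARN matching.

```python
from fnmatch import fnmatchcase

ADMIN_ROLE = "arn:aws:iam::111122223333:role/AdminRole"
APP_ROLE = "arn:aws:iam::111122223333:role/app-web-instance"


def can_pass_role(passrole_resources, role_arn):
    """True if any Resource pattern on the iam:PassRole statement covers role_arn."""
    return any(fnmatchcase(role_arn, pattern) for pattern in passrole_resources)


broad = ["arn:aws:iam::*:role/*"]
scoped = ["arn:aws:iam::111122223333:role/app-*"]

print(can_pass_role(broad, ADMIN_ROLE))    # True: AdminRole can be attached to EC2
print(can_pass_role(scoped, ADMIN_ROLE))   # False: escalation path closed
print(can_pass_role(scoped, APP_ROLE))     # True: the intended role still works
```

A naming convention such as an `app-` prefix for instance roles makes this kind of scoping practical without listing every role individually.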

Mistake 5: Not Monitoring IAM Changes

# CloudWatch Alarm on IAM policy changes
# (the metric must be published by a CloudTrail log metric filter you define)
resource "aws_cloudwatch_metric_alarm" "iam_changes" {
  alarm_name  = "iam-policy-changes"
  metric_name = "IAMPolicyChanges"
  namespace   = "CloudTrailMetrics"
  statistic   = "Sum"
  
  # Alert as soon as any IAM change occurs
  comparison_operator = "GreaterThanOrEqualToThreshold"
  threshold           = 1
  evaluation_periods  = 1
  period              = 300
}
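
The alarm only fires if something counts the relevant CloudTrail events. The sketch below shows the kind of predicate a metric filter would encode, written in Python for clarity. It assumes CloudTrail records as parsed dictionaries with `eventSource` and `eventName` fields; the function names are hypothetical and the event list covers IAM policy create, delete, put, attach, and detach calls.

```python
IAM_POLICY_CHANGE_EVENTS = frozenset({
    "CreatePolicy", "DeletePolicy", "CreatePolicyVersion", "DeletePolicyVersion",
    "PutUserPolicy", "PutGroupPolicy", "PutRolePolicy",
    "DeleteUserPolicy", "DeleteGroupPolicy", "DeleteRolePolicy",
    "AttachUserPolicy", "AttachGroupPolicy", "AttachRolePolicy",
    "DetachUserPolicy", "DetachGroupPolicy", "DetachRolePolicy",
})


def is_iam_policy_change(event):
    """True for CloudTrail records that create, modify, or attach IAM policies."""
    return (
        event.get("eventSource") == "iam.amazonaws.com"
        and event.get("eventName") in IAM_POLICY_CHANGE_EVENTS
    )


def count_policy_changes(events):
    """Number of matching records, i.e. the value the alarm would compare to 1."""
    return sum(1 for event in events if is_iam_policy_change(event))
```

With a threshold of 1 over a 300-second period, a single `AttachRolePolicy` call is enough to trigger the alert.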

7. Cloud Security Posture Management (CSPM)

7.1 Native Tools (Low Cost)

Tool	Platform	What it does
AWS Security Hub	AWS	Aggregates findings from GuardDuty, Inspector, Macie. CIS Benchmark scoring
AWS Config	AWS	Tracks configuration changes, compliance rules
Amazon GuardDuty	AWS	ML-based threat detection (unusual API calls, crypto mining, exfil)
GCP Security Command Center	GCP	Equivalent to AWS Security Hub
Microsoft Defender for Cloud	Azure	Multi-cloud CSPM

GuardDuty is a must-enable: around $3/month for a small account (the bill scales with log volume), and it detects cryptomining, data exfiltration, privilege escalation, and unusual S3 access patterns.

7.2 When Do You Need a Third-Party CSPM (Wiz, Prisma Cloud, Orca)?

You need one when:

Multi-cloud (AWS + GCP + Azure) → native tools do not cover cross-cloud
Compliance requirements (SOC2, ISO27001, PCI-DSS) → automated evidence collection
A small security team needs a unified dashboard and does not want to build one
Agentless scanning (Wiz scans cloud snapshots, no agent needed on the VM)

Cost: $100k+/year → only justifiable when the team exceeds 50 engineers or compliance mandates it.

8. Compliance as Code: SCPs and Organization Policies

8.1 AWS Service Control Policies (SCP)

An SCP is a guardrail at the Organization level. Even an Admin inside a member account cannot bypass it:

// SCP: protect S3 Block Public Access, forbid disabling CloudTrail, restrict regions
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyCloudTrailDisable",
      "Effect": "Deny",
      "Action": [
        "cloudtrail:StopLogging",
        "cloudtrail:DeleteTrail",
        "cloudtrail:UpdateTrail"
      ],
      "Resource": "*"
    },
    {
      "Sid": "DenyPublicS3",
      "Effect": "Deny",
      // Assumes Block Public Access is already ON; this freezes the setting
      "Action": [
        "s3:PutBucketPublicAccessBlock",
        "s3:PutAccountPublicAccessBlock"
      ],
      "Resource": "*"
    },
    {
      "Sid": "RequireRegion",
      "Effect": "Deny",
      "Action": "*",
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:RequestedRegion": ["ap-southeast-1", "us-east-1"]
        }
      }
    }
  ]
}

An SCP does not grant permissions, it only restricts them. A request succeeds only when both the SCP and the identity policy allow it.

8.2 GCP Organization Policies

# Terraform: GCP Organization Policy that forbids public IPs on VMs
resource "google_org_policy_policy" "no_public_ip" {
  name   = "organizations/123456789/policies/compute.vmExternalIpAccess"
  parent = "organizations/123456789"

  spec {
    rules {
      deny_all = "TRUE"
    }
  }
}

9. Audit Logging: Enable It from Day One

9.1 AWS CloudTrail Checklist

☐ Multi-region Trail enabled (not just the primary region)
☐ S3 data events enabled (who accessed which object in S3)
☐ Lambda data events enabled (who invoked which Lambda)
☐ Log file integrity validation ON
☐ CloudTrail logs → S3 bucket with:
    ☐ Bucket Policy: deny delete (even for account admins)
    ☐ Object Lock: Compliance mode, 1-year retention
    ☐ Access logging enabled
    ☐ Replication to another account (a dedicated security account)
☐ CloudWatch Logs integration (for querying and alerting)
☐ Athena table for long-term analysis

9.2 How Much Logging Is Enough?

Tier 1: Always ON (free or very cheap):
  - CloudTrail Management Events
  - VPC Flow Logs (subnet level)
  - ALB Access Logs
  - GuardDuty

Tier 2: Enable when compliance requires it:
  - CloudTrail Data Events (S3, Lambda): expensive at high volume
  - RDS/Aurora DB Activity Streams
  - CloudFront access logs

Tier 3: Enable during an incident or for advanced threat hunting:
  - AWS Config all resources
  - Route53 DNS query logs
  - Security Group change logs

10. Mental Model: A Security Checklist for Every New Service

Every time you deploy a new service to the cloud, run this self-check:

IDENTITY:
  ☐ Does the service use an IAM Role (not an Access Key)?
  ☐ Is the role least-privilege (no wildcards)?
  ☐ Is the trust relationship scoped correctly?

NETWORK:
  ☐ Is the service in a private subnet (no public IP)?
  ☐ Does the security group allow traffic only from specific sources?
  ☐ Are ports 22/3389 closed to the internet?

DATA:
  ☐ Is data at rest encrypted (KMS)?
  ☐ Is data in transit encrypted (TLS 1.2+)?
  ☐ Are backups enabled and tested?

SECRETS:
  ☐ No hardcoded credentials in the code?
  ☐ Are secrets stored in Secrets Manager / Parameter Store?
  ☐ Is rotation configured?

LOGGING:
  ☐ Application logs → CloudWatch Logs?
  ☐ Access logs enabled?
  ☐ Alerts for anomalies (GuardDuty, CloudWatch Alarm)?

COMPLIANCE:
  ☐ Resources tagged according to convention?
  ☐ In an allowed region (SCP)?
  ☐ Security Hub findings = 0 Critical, 0 High?

Security is not the Security team's job alone. It is the responsibility of the engineers who build and own the service.