
Cloud Cost Allocation: Understanding the Bill

"This month's AWS bill is $42,318. For what? I don't know. Which team burned it? I don't know. Where could we optimize? I don't know." Teams that can answer these questions within hours see their costs drop by 30-50% within 2 years.


📐 Goal

For every dollar (or TL/EUR): who spent it, and on what?

Three pillars:

  1. Showback: every team sees its own costs (peer pressure).
  2. Chargeback: finance issues each team an internal invoice (real accounting).
  3. Anomaly detection: catch surprises without waiting for month end.


🏷️ 1. Tagging Strategy (Foundation)

Without complete tagging, no allocation scheme works. Invest here first.

Required tag set

| Tag | Example | Why |
|---|---|---|
| Environment | prod, staging, dev | Cost separation |
| Team | payments, growth, platform | Ownership |
| Service | api, worker, db | Workload-level attribution |
| CostCenter | eng-1234 | Finance integration |
| ManagedBy | terraform, helm, manual | Drift detection |
| Owner | <TEAM_HANDLE> | Accountability |
| Project (optional) | mobile-revamp | Initiative tracking |
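
The required set can be checked mechanically before any enforcement layer exists. A minimal sketch, assuming a resource's tags arrive as a plain dict; `REQUIRED_TAGS` mirrors the table (without the optional Project) and `missing_tags` is a hypothetical helper name:

```python
# Required keys from the tag table; Project is optional and therefore omitted.
REQUIRED_TAGS = ("Environment", "Team", "Service", "CostCenter", "ManagedBy", "Owner")


def missing_tags(tags: dict[str, str]) -> list[str]:
    """Return required tag keys that are absent or blank, in table order."""
    return [key for key in REQUIRED_TAGS if not tags.get(key, "").strip()]


if __name__ == "__main__":
    resource = {"Environment": "prod", "Team": "payments", "CostCenter": "eng-1234"}
    print(missing_tags(resource))
```

An empty list means the resource is compliant; anything else can feed the weekly untagged report.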

Enforcement

1️⃣ AWS Service Control Policy (org-level)

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Deny",
    "Action": ["ec2:RunInstances"],
    "Resource": "arn:aws:ec2:*:*:instance/*",
    "Condition": {
      "Null": { "aws:RequestTag/Environment": "true" }
    }
  }]
}

2️⃣ Terraform validation

# modules/required-tags/main.tf
variable "tags" {
  type = map(string)
  validation {
    condition = alltrue([
      contains(keys(var.tags), "Environment"),
      contains(keys(var.tags), "Team"),
      contains(keys(var.tags), "CostCenter"),
    ])
    error_message = "Tags Environment, Team, and CostCenter are required."
  }
}

3️⃣ Kyverno (Kubernetes)

# In 17-Templates/kyverno-policies/require-labels.yaml

4️⃣ AWS Config Rules

Use the required-tags rule to report non-compliant resources automatically.

Retro-tagging existing resources

# Bulk tagging with the AWS Resource Groups Tagging API
aws resourcegroupstaggingapi tag-resources \
  --resource-arn-list arn:aws:s3:::bucket1 arn:aws:s3:::bucket2 \
  --tags Environment=prod,Team=platform,CostCenter=eng-1001

💡 Tip: Run the untagged-resource report weekly. Resources that stay untagged for 4 consecutive weeks are stopped automatically (scheduled Lambda).
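
The 4-week rule reduces to finding resources that appear in every one of the last four weekly reports. A sketch, assuming each report is a set of resource ARNs ordered oldest to newest; the function name and input shape are illustrative, and the stop action itself (the Lambda) is left out:

```python
def stop_candidates(weekly_reports: list[set[str]], weeks: int = 4) -> set[str]:
    """Resources present in each of the last `weeks` untagged reports.

    weekly_reports is ordered oldest to newest; each entry holds the ARNs
    that were untagged in that week's report.
    """
    if weeks < 1:
        raise ValueError("weeks must be at least 1")
    if len(weekly_reports) < weeks:
        return set()  # not enough history to act on yet
    recent = weekly_reports[-weeks:]
    return set.intersection(*recent)
```

A resource that gets tagged in any of those weeks drops out of the intersection, so only persistent offenders are returned.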


📊 2. Allocation Reports

A. AWS Cost Explorer (built-in)

# This month, by service
aws ce get-cost-and-usage \
  --time-period Start=$(date +%Y-%m-01),End=$(date +%F) \
  --granularity DAILY \
  --metrics UnblendedCost \
  --group-by Type=DIMENSION,Key=SERVICE

# By tag (Team = payments)
aws ce get-cost-and-usage \
  --time-period Start=$(date -d '30 days ago' +%F),End=$(date +%F) \
  --granularity MONTHLY \
  --metrics UnblendedCost \
  --filter '{"Tags":{"Key":"Team","Values":["payments"]}}'

# Breakdown per tag value via group-by
aws ce get-cost-and-usage \
  --time-period Start=$(date -d '30 days ago' +%F),End=$(date +%F) \
  --granularity MONTHLY \
  --metrics UnblendedCost \
  --group-by Type=TAG,Key=Team
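
Grouping by the `Team` tag returns one group per tag value, with keys of the form `Team$payments` and an empty value (`Team$`) for untagged spend. A sketch that folds such a response into per-team totals; the sample response is made up for illustration and shows only the fields read here:

```python
from collections import defaultdict


def cost_by_team(response: dict, tag_key: str = "Team") -> dict[str, float]:
    """Sum UnblendedCost per tag value across all periods in the response."""
    totals: dict[str, float] = defaultdict(float)
    prefix = f"{tag_key}$"
    for period in response.get("ResultsByTime", []):
        for group in period.get("Groups", []):
            key = group["Keys"][0]
            team = key[len(prefix):] if key.startswith(prefix) else key
            amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
            totals[team or "(untagged)"] += amount
    return dict(totals)


# Made-up sample in the shape of `aws ce get-cost-and-usage` JSON output.
sample = {
    "ResultsByTime": [
        {"Groups": [
            {"Keys": ["Team$payments"], "Metrics": {"UnblendedCost": {"Amount": "4820.0", "Unit": "USD"}}},
            {"Keys": ["Team$"], "Metrics": {"UnblendedCost": {"Amount": "310.5", "Unit": "USD"}}},
        ]},
        {"Groups": [
            {"Keys": ["Team$payments"], "Metrics": {"UnblendedCost": {"Amount": "100.0", "Unit": "USD"}}},
        ]},
    ]
}
print(cost_by_team(sample))
```

The `(untagged)` bucket is the number to watch: it is spend nobody owns yet.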

B. Cost & Usage Report (CUR) → Athena

CUR is detailed (every resource, every hour) but large. Query it with Athena:

-- Top 20 services by cost (last 7 days)
SELECT
  product_servicecode AS service,
  SUM(line_item_unblended_cost) AS cost
FROM cost_and_usage_report
WHERE line_item_usage_start_date >= date_add('day', -7, current_date)
GROUP BY product_servicecode
ORDER BY cost DESC
LIMIT 20;

-- Per-team breakdown
SELECT
  resource_tags_user_team AS team,
  SUM(line_item_unblended_cost) AS cost
FROM cost_and_usage_report
WHERE line_item_usage_start_date >= date_add('day', -30, current_date)
  AND resource_tags_user_team IS NOT NULL
GROUP BY resource_tags_user_team
ORDER BY cost DESC;

-- Untagged resources (unattributed spend)
SELECT
  product_servicecode,
  SUM(line_item_unblended_cost) AS cost,
  COUNT(DISTINCT line_item_resource_id) AS resource_count
FROM cost_and_usage_report
WHERE line_item_usage_start_date >= date_add('day', -7, current_date)
  AND resource_tags_user_team IS NULL
GROUP BY product_servicecode
ORDER BY cost DESC;

C. Kubernetes: OpenCost / Kubecost

For Kubernetes cost attribution (the cloud bill is not Kubernetes-aware):

# OpenCost (CNCF, OSS)
helm repo add opencost https://opencost.github.io/opencost-helm-chart
helm install opencost opencost/opencost -n opencost --create-namespace

# Kubecost (superset of OpenCost, includes a UI)
helm install kubecost \
  --repo https://kubecost.github.io/cost-analyzer \
  cost-analyzer \
  -n kubecost \
  --create-namespace

These tools report:

  • Compute/memory cost per pod
  • Namespace breakdown
  • Workload (deployment) breakdown
  • PVC cost
  • Idle resources (requested but never used): the hidden waste

# From the CLI (Kubecost API); -G sends the parameters as a GET query string
curl -sG http://kubecost.kubecost:9090/model/allocation \
  --data-urlencode 'window=7d' \
  --data-urlencode 'aggregate=namespace' \
  --data-urlencode 'accumulate=true' | jq

💸 3. Showback / Chargeback Model

Showback (recommended starting point)

Each team sees its own costs; no money changes hands.

Monthly dashboard / email:

┌────────────────────────────────────────────────┐
│  Team: payments                                 │
│  Period: March 2026                             │
├────────────────────────────────────────────────┤
│  Total: $4,820                                  │
│                                                  │
│  Compute (EKS)             $2,340  (49%)         │
│  RDS (Postgres)            $1,200  (25%)         │
│  S3 (snapshots)              $480  (10%)         │
│  Data transfer (egress)      $400   (8%)         │
│  CloudWatch logs             $180   (4%)         │
│  Other                       $220   (4%)         │
│                                                  │
│  vs last month: +$340 (+8%)                      │
│  vs budget:     ($5,000 budget, 96% of budget)   │
│                                                  │
│  ⚠️  Anomaly: S3 +$200 (snapshots 30→90 day)     │
│                                                  │
│  🔝 Top 5 cost drivers:                          │
│  1. eks-prod-cluster       $1,800                │
│  2. rds-payments-primary     $720                │
│  3. eks-staging-cluster      $540                │
│  4. rds-payments-replica     $480                │
│  5. s3-payment-receipts      $480                │
└────────────────────────────────────────────────┘
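
The comparison lines in the report are simple ratios. A sketch using the figures from the sample report (total $4,820, budget $5,000, and a previous month of $4,480 implied by the +$340 delta); the function name is illustrative:

```python
def showback_summary(total: float, last_month: float, budget: float) -> dict[str, float]:
    """Month-over-month delta and budget utilization, rounded for display."""
    delta = total - last_month
    return {
        "delta_usd": delta,
        "delta_pct": round(delta / last_month * 100) if last_month else 0,
        "budget_pct": round(total / budget * 100) if budget else 0,
    }


summary = showback_summary(total=4820, last_month=4480, budget=5000)
print(f"vs last month: {summary['delta_usd']:+,.0f} USD ({summary['delta_pct']:+d}%)")
print(f"vs budget:     {summary['budget_pct']}% of budget")
```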

Chargeback (large orgs)

Finance issues an internal invoice to every team, so each team's cost becomes a real line in the engineering budget.

Pro: maximum cost awareness.
Con: bureaucratic; overkill for small orgs.


🚨 4. Anomaly Detection

Prevents end-of-month bill shocks.

AWS Cost Anomaly Detection (built-in)

# Create a monitor (daily anomaly tracking per service)
aws ce create-anomaly-monitor --anomaly-monitor '{
  "MonitorName": "Daily-Service-Anomaly",
  "MonitorType": "DIMENSIONAL",
  "MonitorDimension": "SERVICE"
}'

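# Subscription (SNS topic, which can forward to Slack or email)
# SNS subscribers need IMMEDIATE frequency; DAILY/WEEKLY summaries are email-only.
aws ce create-anomaly-subscription --anomaly-subscription '{
  "SubscriptionName": "FinOps-Slack",
  "Threshold": 100,
  "Frequency": "IMMEDIATE",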
  "MonitorArnList": ["arn:aws:ce::<ACCOUNT_ID>:anomalymonitor/<ID>"],
  "Subscribers": [{"Type":"SNS","Address":"arn:aws:sns:<REGION>:<ACCOUNT_ID>:cost-alerts"}]
}'

Custom (more sensitive): Athena + cron

-- Yesterday vs 7-day average, deviation > 30%
WITH daily AS (
  SELECT
    DATE(line_item_usage_start_date) AS day,
    resource_tags_user_team AS team,
    product_servicecode AS service,
    SUM(line_item_unblended_cost) AS cost
  FROM cost_and_usage_report
  WHERE line_item_usage_start_date >= date_add('day', -8, current_date)
  GROUP BY 1, 2, 3
)
SELECT
  team, service,
  yesterday_cost,
  weekly_avg,
  ROUND(((yesterday_cost - weekly_avg) / NULLIF(weekly_avg, 0)) * 100, 1) AS pct_change
FROM (
  SELECT
    team, service,
    SUM(CASE WHEN day = date_add('day', -1, current_date) THEN cost END) AS yesterday_cost,
    AVG(CASE WHEN day BETWEEN date_add('day', -8, current_date) AND date_add('day', -2, current_date) THEN cost END) AS weekly_avg
  FROM daily
  GROUP BY 1, 2
)
WHERE yesterday_cost > 50  -- noise filter
  AND yesterday_cost > weekly_avg * 1.30
ORDER BY pct_change DESC;

Post the result to Slack:

🚨 Cost anomaly detected:
- payments / RDS: $250 yesterday (avg $80, +212%)
- growth / DataTransfer: $890 yesterday (avg $300, +197%)
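
The same rule (yesterday above $50 and more than 30% over the trailing average) and the message formatting can be sketched in a few lines. The row shape and function names are assumptions, a guard against a zero average is added, and delivery to Slack is omitted:

```python
def is_anomaly(yesterday: float, weekly_avg: float,
               min_cost: float = 50.0, ratio: float = 1.30) -> bool:
    """Noise floor plus a 30% deviation threshold; a zero average is skipped."""
    return yesterday > min_cost and weekly_avg > 0 and yesterday > weekly_avg * ratio


def format_alert(rows: list[tuple[str, str, float, float]]) -> str:
    """rows: (team, service, yesterday_cost, weekly_avg)."""
    lines = ["🚨 Cost anomaly detected:"]
    for team, service, yesterday, avg in rows:
        if not is_anomaly(yesterday, avg):
            continue
        pct = (yesterday - avg) / avg * 100
        lines.append(f"- {team} / {service}: ${yesterday:,.0f} yesterday "
                     f"(avg ${avg:,.0f}, {pct:+.0f}%)")
    return "\n".join(lines)


print(format_alert([
    ("payments", "RDS", 250, 80),
    ("growth", "DataTransfer", 890, 300),
    ("platform", "S3", 60, 55),  # within 30% of the average, not reported
]))
```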


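🎯 5. Quick Wins (15-30% savings in the first 30 days)

# 1. Idle EC2 (stopped for more than 30 days)
# StateTransitionReason looks like "User initiated (YYYY-MM-DD HH:MM:SS GMT)",
# so extract the date and compare it in awk rather than in the JMESPath query.
aws ec2 describe-instances \
  --filters "Name=instance-state-name,Values=stopped" \
  --query 'Reservations[].Instances[].[InstanceId,StateTransitionReason]' --output text \
  | awk -F'\t' -v cutoff="$(date -d '30 days ago' +%F)" \
      'match($2, /[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/) && substr($2, RSTART, 10) < cutoff { print $1, substr($2, RSTART, 10) }'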

# 2. Unattached Elastic IPs (~$3.6/month each)
aws ec2 describe-addresses \
  --query 'Addresses[?AssociationId==null].[PublicIp,AllocationId]'

# 3. Unattached EBS volumes
aws ec2 describe-volumes --filters Name=status,Values=available

# 4. EBS gp2 → gp3 (same baseline performance for most volumes, ~20% cheaper)
aws ec2 describe-volumes \
  --filters Name=volume-type,Values=gp2 \
  --query 'Volumes[].VolumeId' --output text \
  | xargs -n 1 aws ec2 modify-volume --volume-type gp3 --volume-id

# 5. Old snapshots (older than 90 days). Destructive: review the list before piping it to delete-snapshot.
aws ec2 describe-snapshots --owner-ids self \
  --query "Snapshots[?StartTime<='$(date -d '90 days ago' +%F)'].SnapshotId" \
  --output text | xargs -n1 aws ec2 delete-snapshot --snapshot-id

# 6. RDS public access (misconfiguration + cost)
aws rds describe-db-instances \
  --query 'DBInstances[?PubliclyAccessible==`true`].[DBInstanceIdentifier]'

# 7. Idle load balancers (ALBs with no requests in the last 7 days)
# The CloudWatch LoadBalancer dimension is the ARN suffix "app/<name>/<id>".
aws elbv2 describe-load-balancers \
  --query 'LoadBalancers[?Type==`application`].LoadBalancerArn' \
  --output text | tr '\t' '\n' | while read -r arn; do
  count=$(aws cloudwatch get-metric-statistics \
    --namespace AWS/ApplicationELB --metric-name RequestCount \
    --dimensions Name=LoadBalancer,Value="${arn#*:loadbalancer/}" \
    --start-time "$(date -u -d '7 days ago' +%FT%TZ)" \
    --end-time "$(date -u +%FT%TZ)" --period 86400 --statistics Sum \
    --query 'sum(Datapoints[].Sum)' --output text)
  # No datapoints at all also means no traffic
  case "$count" in None|0|0.0) echo "Idle: $arn" ;; esac
done

Egress (the biggest hidden cost)

AWS data transfer OUT to the internet costs about $0.09/GB: 1 TB/month ≈ $90, 50 TB/month ≈ $4,500. Left unmonitored, it grows unchecked.

  • ✅ S3 → EC2 within the same region: free
  • ✅ VPC endpoints (S3/DynamoDB): take that traffic off the NAT Gateway
  • ✅ CloudFront / CDN: cache close to the user
  • ✅ Cloudflare R2: no egress fees
  • ❌ Cross-AZ within the same region (easy to miss, but it adds $0.01/GB)
  • ❌ Cross-region (most expensive)
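
The arithmetic behind these figures, as a sketch. The $0.09/GB internet rate and the $0.01/GB cross-AZ rate are the ones quoted in this section; treating 1 TB as 1,000 GB and ignoring volume tiers are simplifying assumptions:

```python
INTERNET_EGRESS_USD_PER_GB = 0.09  # rate quoted in the text
CROSS_AZ_USD_PER_GB = 0.01         # rate quoted in the text


def monthly_transfer_cost(tb_per_month: float, usd_per_gb: float) -> float:
    """Flat-rate estimate: terabytes per month times a per-GB price."""
    return round(tb_per_month * 1000 * usd_per_gb, 2)


for tb in (1, 50):
    cost = monthly_transfer_cost(tb, INTERNET_EGRESS_USD_PER_GB)
    print(f"{tb} TB/month to the internet: ${cost:,.2f}")
print(f"50 TB/month cross-AZ: ${monthly_transfer_cost(50, CROSS_AZ_USD_PER_GB):,.2f}")
```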

📈 6. Reserved Instances / Savings Plans

Cover steadily used baseline capacity with a commitment:

| Strategy | Discount | Risk |
|---|---|---|
| 3-year all-upfront RI | up to 72% | High (no flexibility) |
| 1-year SP (Compute) | 30-50% | Medium (instance type can change) |
| 3-year SP (Compute) | 50-65% | High |
| Spot | 50-90% | High (interruption) |
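
One way to read the risk of a commitment: it is billed whether or not the capacity is used, so it only beats on-demand while utilization stays above 1 − discount. A sketch of that arithmetic under a deliberately simple model (one flat discount, no upfront financing cost); the function names are illustrative:

```python
def break_even_utilization(discount: float) -> float:
    """Fraction of committed capacity that must be used to match on-demand cost."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return 1 - discount


def net_savings(on_demand_cost: float, discount: float, utilization: float) -> float:
    """Savings vs on-demand when only `utilization` of the commitment is used.

    on_demand_cost is what the full committed capacity would cost on-demand.
    """
    committed = on_demand_cost * (1 - discount)  # paid regardless of usage
    avoided = on_demand_cost * utilization       # on-demand spend replaced
    return avoided - committed
```

With a 40% discount, for example, the commitment breaks even at 60% utilization; below that it costs more than paying on-demand.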

Recommended strategy

Baseline (24/7 steady)           → SP 1-year compute
Stable, type won't change        → RI 1-year
Burst / batch / fault-tolerant   → Spot
Dev/test                         → Spot (auto-pause overnight)

⚠️ Commit cliff: set an alarm 60 days before a 1-year SP expires and plan the renewal. Hitting the cliff and watching the bill jump back to on-demand rates is a common mistake.
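
The 60-day rule as date arithmetic. A sketch; the expiry date is hypothetical and would come from the Savings Plans inventory in practice:

```python
from datetime import date, timedelta


def renewal_alert_date(expiry: date, lead_days: int = 60) -> date:
    """Date on which the renewal alarm should first fire."""
    return expiry - timedelta(days=lead_days)


def needs_renewal_alert(today: date, expiry: date, lead_days: int = 60) -> bool:
    """True from the alert date until the commitment expires."""
    return renewal_alert_date(expiry, lead_days) <= today <= expiry


expiry = date(2026, 12, 31)  # hypothetical Savings Plan end date
print(renewal_alert_date(expiry))
```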


🛠️ 7. PR-time Cost Diff (Infracost)

For PRs that change Terraform, show the cost diff before merge:

# .github/workflows/infracost.yml
name: Infracost
on: [pull_request]

permissions:
  contents: read
  pull-requests: write   # needed to post the PR comment

jobs:
  diff:
    runs-on: ubuntu-latest
    steps:
      - uses: infracost/actions/setup@v3
        with:
          api-key: ${{ secrets.INFRACOST_API_KEY }}
      # Check out the base branch first to price the current state
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.event.pull_request.base.ref }}
      - name: Generate baseline
        run: infracost breakdown --path terraform --format json --out-file /tmp/baseline.json
      # Then check out the PR itself and diff against the baseline
      - uses: actions/checkout@v4
      - name: Generate diff
        run: infracost diff --path terraform --compare-to /tmp/baseline.json --format json --out-file /tmp/diff.json
      - name: PR comment
        run: |
          infracost comment github --path /tmp/diff.json --behavior update \
            --repo "$GITHUB_REPOSITORY" --pull-request ${{ github.event.pull_request.number }} \
            --github-token ${{ secrets.GITHUB_TOKEN }}

The PR then gets an automatic comment:

### 💰 Infracost estimate

Project       baseline  PR        diff
my-infra      $4,820    $5,140    +$320 (+6.6%)

Top changes:
+ aws_db_instance.replica          +$240/mo
+ aws_eks_node_group.gpu-pool       +$180/mo
- aws_instance.legacy-bastion      -$100/mo

Monthly cost change: +$320
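
A diff like this can also gate the merge. A sketch that reads the diff JSON as a dict and flags the PR when the monthly increase exceeds a threshold; the `diffTotalMonthlyCost` and `pastTotalMonthlyCost` field names reflect my understanding of Infracost's JSON output and should be checked against the file your version produces, and the thresholds are arbitrary:

```python
def exceeds_budget(diff: dict, max_increase_usd: float = 500.0,
                   max_increase_pct: float = 10.0) -> bool:
    """True when the PR raises monthly cost beyond either threshold."""
    # Field names are assumptions about the Infracost JSON layout.
    delta = float(diff.get("diffTotalMonthlyCost") or 0)
    past = float(diff.get("pastTotalMonthlyCost") or 0)
    pct = delta / past * 100 if past else 0.0
    return delta > max_increase_usd or pct > max_increase_pct


# Values taken from the sample comment: $4,820 baseline, +$320.
sample_diff = {"pastTotalMonthlyCost": "4820", "diffTotalMonthlyCost": "320"}
print(exceeds_budget(sample_diff))                        # within both limits
print(exceeds_budget(sample_diff, max_increase_usd=300))  # over the dollar limit
```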


📚 Further reading