Kubernetes Cost Optimization: FinOps Strategies (2025)

Oct 26, 2025
kubernetes, finops, cost, autoscaling

Kubernetes is powerful—and notoriously easy to overspend on. This guide provides practical controls to cut costs 20–50% while maintaining reliability.

Executive summary

  • Rightsize requests/limits; enforce via policies
  • Use HPA/VPA where appropriate; treat limits as SLOs
  • Prefer spot/preemptible for stateless; bin-pack with topology spread
  • Track unit economics: $/req, $/tenant, $/job

Requests/limits hygiene

apiVersion: v1
kind: Pod
metadata:
  name: api                                # illustrative
spec:
  containers:
    - name: api
      image: registry.example.com/api:1.0  # illustrative
      resources:
        requests:
          cpu: "200m"
          memory: "256Mi"
        limits:
          cpu: "500m"
          memory: "512Mi"
  • Establish default requests; deny pods without requests; monitor throttling and OOMKills

Autoscaling

  • HPA: CPU/Memory + custom metrics (QPS, latency)
  • VPA: recommendation mode first; avoid fighting HPA
  • Cluster Autoscaler: scale nodes based on pending pods

Spot/preemptible

  • Use PDBs, checkpoints, idempotent jobs; multiple spot pools; surge buffer

Bin-packing and topology

  • Node sizing; taints/tolerations; topologySpreadConstraints; overprovision pods for spikes
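
A common way to hold a spike buffer is a low-priority placeholder Deployment of pause pods that real workloads preempt on demand; a minimal sketch of the pattern (name, replica count, and sizes are illustrative):

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata: { name: overprovisioning }
value: -10                      # below the default of 0, so regular pods preempt the buffer
globalDefault: false
---
apiVersion: apps/v1
kind: Deployment
metadata: { name: overprovisioning-buffer }
spec:
  replicas: 3
  selector: { matchLabels: { app: overprovisioning-buffer } }
  template:
    metadata: { labels: { app: overprovisioning-buffer } }
    spec:
      priorityClassName: overprovisioning
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests: { cpu: "500m", memory: 512Mi }   # the headroom each buffer pod reserves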

FinOps dashboards

  • Kubecost + Prometheus; cost allocation by namespace/label; anomaly alerts

Policy enforcement

  • Gatekeeper/OPA: deny no‑request pods; enforce limits; restrict instance types

FAQ

Q: HPA vs VPA?
A: HPA for horizontal scaling; VPA for rightsizing. Avoid conflicting signals by running VPA in recommendation mode with HPA.

Related posts

  • Terraform Best Practices: /blog/terraform-best-practices-infrastructure-as-code-2025
  • OpenTelemetry Guide: /blog/observability-opentelemetry-complete-implementation-guide
  • Platform Engineering: /blog/platform-engineering-internal-developer-platforms-2025
  • GitOps Strategies: /blog/gitops-argocd-flux-kubernetes-deployment-strategies
  • Service Mesh Comparison: /blog/service-mesh-istio-linkerd-comparison-guide-2025

Call to action

Want a K8s cost review and savings plan? Get a free FinOps consult.
Contact: /contact • Newsletter: /newsletter


Kubernetes Cost Optimization: FinOps Strategies (2025)

A pragmatic guide to reduce Kubernetes spend without sacrificing reliability.


1) Objectives and Guardrails

  • Reduce $/unit (req, order, job) while meeting SLOs
  • Rightsize continuously; prevent regressions with policy and CI gates
  • Observe costs per team/workload; align budgets with ownership

2) Cost Drivers (High Level)

- Compute: requests/limits, binpacking, idle capacity, overprovisioning
- Storage: class (gp2/gp3/premium), IOPS, snapshots, orphan PVCs
- Network: cross-AZ/region egress, NAT, LBs, service mesh overhead
- Platform overhead: control plane, observability stack, service mesh

3) Cluster Autoscaler (CA) Tuning

# CA flags (sketch)
- --balance-similar-node-groups=true
- --expander=least-waste
- --scale-down-utilization-threshold=0.5
- --scale-down-unneeded-time=10m
  • Separate node groups by workload type (general, memory, CPU, GPU)
  • Prefer fewer larger nodes for binpacking unless pod anti-affinity requires spread

4) HPA and Custom Metrics

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata: { name: api }                    # illustrative
spec:
  scaleTargetRef: { apiVersion: apps/v1, kind: Deployment, name: api }
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource: { name: cpu, target: { type: Utilization, averageUtilization: 65 } }
    - type: Pods
      pods:
        metric: { name: rq_per_pod }
        target: { type: AverageValue, averageValue: "40" }
  • Use RPS/queue depth as the primary signal; scaling on CPU alone leads to waste

5) Vertical Pod Autoscaler (VPA)

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata: { name: api }                    # illustrative
spec:
  targetRef: { apiVersion: apps/v1, kind: Deployment, name: api }
  updatePolicy: { updateMode: "Off" } # recommend-first
  • Run in recommend-only mode to guide request updates; avoid flapping

6) Karpenter for Fast Binpacking

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
spec:
  consolidation: { enabled: true }
  requirements:
    - key: kubernetes.io/arch
      operator: In
      values: [amd64, arm64]
  • Enables rapid scale and consolidation; pair with budgets and taints

7) Spot/Preemptible and Mixed Pools

- Use spot for stateless, retryable, non-prod jobs
- Mix on-demand and spot with PDBs and priorities
- Budget-aware fallbacks to on-demand during spot scarcity

8) Reservations and Savings Plans

- Baseline steady load with RIs/SPs; leave burst to on-demand/spot
- Right-size terms (1y vs 3y), regional vs zonal, convertible RIs for flexibility

9) Requests, Limits, and Quotas

apiVersion: v1
kind: ResourceQuota
metadata: { name: team-quota, namespace: team-a }   # illustrative
spec:
  hard:
    requests.cpu: "200"
    requests.memory: 400Gi
    limits.cpu: "400"
    limits.memory: 800Gi
  • Enforce LimitRange defaults (sketch below); forbid pods without requests/limits
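
A namespace-level LimitRange supplies defaults for containers that omit requests or limits; a minimal sketch (values and namespace are illustrative):

apiVersion: v1
kind: LimitRange
metadata: { name: default-requests, namespace: team-a }
spec:
  limits:
    - type: Container
      defaultRequest: { cpu: 100m, memory: 128Mi }   # applied when requests are omitted
      default: { cpu: 500m, memory: 512Mi }          # applied when limits are omitted
      max: { cpu: "2", memory: 2Gi }                 # hard per-container ceiling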

10) Policy-as-Code (OPA/Kyverno)

# Kyverno: require requests/limits
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata: { name: require-requests-limits }
spec:
  validationFailureAction: enforce
  rules:
    - name: require-requests-limits
      # match Pods; Kyverno's autogen creates equivalent rules for Deployments/StatefulSets
      match: { resources: { kinds: ["Pod"] } }
      validate:
        message: "Requests and limits required"
        pattern:
          spec:
            containers:
              - resources:
                  requests: { cpu: "?*", memory: "?*" }
                  limits: { cpu: "?*", memory: "?*" }

11) Storage Strategies

- Prefer gp3/premiumV2 with tuned IOPS; avoid default gp2
- Use Retain only where needed; clean orphan PVCs and snapshots
- Move logs/temp to ephemeral; compress and tier to cheap storage

12) Network and Egress Controls

- Keep traffic in-zone/region; use private endpoints; reduce NAT
- Collapse noisy sidecars; prefer node-local caching
- CDN offload for egress-heavy paths

13) Service Mesh Overhead

- Sidecarless/ambient when feasible; reduce mTLS costs with HW offload
- Sample traces; reduce metrics cardinality; cap retries

14) Image and Registry Costs

- Slim base images; multi-arch; layer reuse; registry caching
- Avoid cross-region pulls; colocate registry with clusters

15) GPU Workloads

- Fractional GPUs (MIG/MPS) where supported; binpack DL jobs
- Spot GPUs for training; on-demand for inference SLOs

16) Scheduling and Affinity

spec:
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector: { matchLabels: { app: web } }
  • Taints for spot; tolerations for opt-in workloads; priorities for eviction
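
For example, spot nodes can carry a taint so only workloads that explicitly opt in land there; a sketch where the workload-class taint and best-effort-batch PriorityClass are assumed names and karpenter.sh/capacity-type is the label Karpenter sets on its nodes:

apiVersion: v1
kind: Pod
metadata: { name: batch-worker }             # illustrative
spec:
  priorityClassName: best-effort-batch       # assumed low-priority class for evictable work
  tolerations:
    - { key: workload-class, operator: Equal, value: spot, effect: NoSchedule }  # assumed taint on spot nodes
  nodeSelector:
    karpenter.sh/capacity-type: spot          # or your provider's spot/preemptible label
  containers:
    - name: worker
      image: registry.example.com/worker:1.0  # illustrative
      resources:
        requests: { cpu: 250m, memory: 256Mi }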

17) Descheduler and Consolidation

apiVersion: descheduler/v1alpha2
kind: DeschedulerPolicy
profiles:
  - name: balance
    pluginConfig:
      - name: RemoveDuplicates
      - name: LowNodeUtilization
        args:
          thresholds: { cpu: 20, memory: 20, pods: 20 }
          targetThresholds: { cpu: 50, memory: 50, pods: 50 }
    plugins:
      balance:
        enabled: [RemoveDuplicates, LowNodeUtilization]
  • Periodically rebalance to free nodes for scale-down

18) Observability for Cost

# CPU requested vs actually used (overrequest ratio)
sum(kube_pod_container_resource_requests{resource="cpu"})
  / sum(rate(container_cpu_usage_seconds_total[5m]))

# Idle CPU cores (allocatable minus actual usage)
sum(kube_node_status_allocatable{resource="cpu"})
  - sum(rate(container_cpu_usage_seconds_total[5m]))
  • Track $ per unit: cost per request/order/session/build
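
One way to operationalize $/unit is a recording rule dividing allocated cost by request volume; a minimal sketch, assuming OpenCost's node_total_hourly_cost metric and an application counter named http_requests_total (swap in your own metrics):

# Prometheus recording rule (sketch): cost per 1000 requests
groups:
  - name: unit-economics
    rules:
      - record: cost:per_thousand_requests:hourly
        expr: |
          sum(node_total_hourly_cost)                            # $/hour across nodes (OpenCost)
          / (sum(rate(http_requests_total[1h])) * 3600 / 1000)   # thousands of requests per hour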

19) Allocation, Showback, Chargeback

- Label/annotate: team, service, env, cost-center
- Cost tools (OpenCost/Kubecost) map cloud bills to k8s resources
- Monthly reports; budgets with alerts; corrective actions

20) FinOps Workflow

- Detect: dashboards, alerts, anomaly detection
- Diagnose: who/what/why; unit cost changed?
- Decide: trade-offs with SLOs; approval matrix
- Deliver: PRs, policy updates, autoscaler tuning
- Document: runbooks, owners, SLAs

21) CI/CD Cost Controls

- Parallelism caps; ephemeral preview TTLs; cache artifacts; small runners
- Build on cheaper instance types; ARM64 where supported

22) Runbooks

- Cost Spike: check deploys, HPA settings, traffic, egress, storage snapshots
- Idle Spend: identify idle nodes, orphan PVC/LBs, zombie namespaces
- Spot Flaps: increase PDB, diversify instance types, fallback policy

23) Dashboards (Sketch JSON)

{
  "title": "Kubernetes Cost Overview",
  "panels": [
    {"type":"stat","title":"$/1000 req"},
    {"type":"graph","title":"CPU Req vs Usage"},
    {"type":"table","title":"Top Namespaces by Cost"}
  ]
}

24) FAQ (1–200)

  1. How much headroom is healthy?
    Target 20–30% for burst; lower with fast scaling.

  2. Should I remove limits?
    Keep limits to prevent noisy neighbors; set sane ceilings.

  3. Is spot safe for prod?
    Yes for stateless and resilient services; protect critical paths.

  4. Do I need Kubecost?
    Use OpenCost or cloud-native + labels; pick one and act on the data.


25) Workload Profiles and Sizing

- Latency-sensitive web: low CPU request, modest burst, HPA on RPS + CPU
- Batch/ETL: high CPU/mem, tolerant to preemption; schedule nightly windows
- Streaming: steady CPU, strict p99; avoid spot if stateful; tune JVM
- ML inference: GPU/CPU bound; autoscale on QPS and queue depth
- Build CI: short-lived, use cheap nodes; aggressive binpacking

26) Node Pools and Instance Selection

- Separate pools: general, memory-optimized, compute-optimized, GPU, ARM64
- Prefer latest-gen instances with better perf/$ (e.g., C7g vs C5)
- Use fewer, larger nodes for binpacking (watch PDB and max pods per node)
- Cap max pods per node to avoid ENI/IP exhaustion

27) ARM64 (Graviton/Neoverse) Strategy

- 20–40% perf/$ gains typical; rebuild images multi-arch; test perf
- Pin critical workloads to proven arch; dual-arch pools during migration
- Cache warming: pre-pull images; registry in-region

28) Requests Right-Sizing Method

- Collect 7–14 days of usage per container at stable traffic
- Pick p90–p95 usage as request; add 10–20% headroom; set limits 1.5–2×
- Revisit quarterly or after major releases; automate via recommender

29) Requests Estimator (Pseudo)

// Recommend request/limit from observed usage samples (e.g., CPU millicores or MiB)
function recommendRequests(samples: number[]): { req: number; limit: number } {
  if (samples.length === 0) throw new Error("no usage samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const p = (k: number) => sorted[Math.min(sorted.length - 1, Math.floor(k * sorted.length))];
  const req = p(0.9) * 1.1; // p90 usage + 10% headroom
  const limit = req * 1.8;  // ceiling ~1.8x request to contain noisy neighbors
  return { req, limit };
}

30) HPA + VPA Interplay

- Avoid VPA live-updating with HPA; use VPA recommend-only for HPA targets
- Tune stabilization windows; prevent oscillations with target tracking
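
Stabilization lives under the HPA behavior field; a sketch that scales up immediately but damps scale-down to prevent oscillation (names and values are illustrative):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata: { name: api }
spec:
  scaleTargetRef: { apiVersion: apps/v1, kind: Deployment, name: api }
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource: { name: cpu, target: { type: Utilization, averageUtilization: 65 } }
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0        # react to spikes immediately
    scaleDown:
      stabilizationWindowSeconds: 300      # require 5m of lower demand before shrinking
      policies:
        - { type: Percent, value: 50, periodSeconds: 60 }   # shed at most 50% of replicas per minute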

31) Karpenter Consolidation Tuning

spec:
  consolidation: { enabled: true }
  limits:
    resources:
      cpu: "2000"
      memory: 4Ti
  providerRef: { name: default }
- Use consolidation to replace underutilized nodes with denser placements
- Protect critical pods with PDB and priorities

32) Spot Best Practices

- Diversify 3–5 instance types per pool; use capacity-optimized allocation
- PDB minAvailable >= 1 for HA; checkpoint batch jobs
- Fallback to on-demand when interruption rates spike
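
Building on the Provisioner sketch in section 6, diversification and on-demand fallback can be declared as requirements; Karpenter generally prefers spot when both capacity types are allowed and falls back to on-demand when spot capacity is scarce (instance types below are illustrative):

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata: { name: spot-batch }               # illustrative
spec:
  consolidation: { enabled: true }
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: [spot, on-demand]              # spot preferred, on-demand as fallback
    - key: node.kubernetes.io/instance-type
      operator: In
      values: [m6i.xlarge, m6a.xlarge, m5.xlarge, c6i.xlarge]   # diversify across families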

33) Savings Plans / RIs Strategy

- Cover 60–80% baseline with SP/RIs; convertibles for flexibility
- Model demand across regions; avoid over-commit; monitor coverage/unused

34) Namespace Budgets and Quotas

apiVersion: v1
kind: ResourceQuota
metadata: { name: team-a-quota, namespace: team-a }
spec:
  hard:
    requests.cpu: "400"
    requests.memory: 800Gi
    pods: "600"
- Review monthly; alert at 80/90/100%; support burst approvals via PR
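
kube-state-metrics exposes quota usage directly, so the 80/90/100% alerts are plain threshold rules; a sketch for the 80% tier, assuming the Prometheus Operator's PrometheusRule CRD:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata: { name: quota-alerts, namespace: monitoring }   # illustrative
spec:
  groups:
    - name: resource-quotas
      rules:
        - alert: NamespaceQuota80Percent
          expr: |
            kube_resourcequota{type="used"}
              / ignoring(type) kube_resourcequota{type="hard"} > 0.8
          for: 15m
          labels: { severity: warning }
          annotations:
            summary: "{{ $labels.namespace }} is above 80% of its {{ $labels.resource }} quota"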

35) Policy Pack (Kyverno)

# Disallow :latest images, forbid NoLimits, require topologySpread
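
A sketch of the :latest rule, simplified from the Kyverno policy library (the full library policy also requires an explicit tag; the other rules in this pack follow the same shape):

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata: { name: disallow-latest-tag }
spec:
  validationFailureAction: enforce
  rules:
    - name: require-pinned-tag
      match: { resources: { kinds: ["Pod"] } }
      validate:
        message: "Images must use a pinned tag, not :latest"
        pattern:
          spec:
            containers:
              - image: "!*:latest"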

36) Storage Optimization (Extended)

- gp3 with tuned IOPS for DB; gp2 only for legacy; cold snapshots lifecycle
- Delete Completed PVCs for Jobs; set retention policies; prune orphans
- Compress logs; move to object storage; TTL temp volumes
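
On AWS with the EBS CSI driver, a tuned gp3 class looks roughly like this; the IOPS/throughput values are illustrative and only worth paying for where a workload actually needs them:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata: { name: gp3-tuned }
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "6000"              # above the 3000 baseline; only for volumes that need it
  throughput: "250"         # MiB/s
allowVolumeExpansion: true
reclaimPolicy: Delete        # Retain only where data must outlive the claim
volumeBindingMode: WaitForFirstConsumer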

37) Network/Egress Patterns

- PrivateLinks/Service Endpoints; VPC endpoints for S3/Blob/GCR
- NAT instance sizing; reduce east-west via locality; co-locate DBs

38) Telemetry Slimming

- Metrics: drop high-cardinality labels; 1–5m scrape; histograms limited
- Traces: sample 1–5%; tail-based for errors; drop internal noisy paths
- Logs: structure; drop debug in prod; TTL cold storage
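
Cardinality can be trimmed at scrape time with metric_relabel_configs; a sketch that drops one expensive histogram family and strips a high-cardinality label (metric and label names are illustrative):

# prometheus.yml (fragment)
scrape_configs:
  - job_name: app
    kubernetes_sd_configs:
      - role: pod
    metric_relabel_configs:
      - source_labels: [__name__]                       # drop a costly histogram entirely
        regex: "http_request_duration_seconds_bucket"
        action: drop
      - regex: "session_id"                             # strip a high-cardinality label, keep the series
        action: labeldrop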

39) Image Slimming

- Distroless; multi-stage builds; prune dev deps; compress layers
- Registry cache in-cluster; avoid cross-region pulls

40) GPU Queues and Binpacking

- Use queues for DL jobs; pack multi-GPU nodes; preemption for low-priority
- MIG for sharing; pin inference to small GPUs; autoscale per queue depth
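
With the NVIDIA device plugin in the mixed MIG strategy, a fractional GPU is requested like any other extended resource; the exact resource name depends on the MIG profile and plugin configuration (sketch, image illustrative):

apiVersion: v1
kind: Pod
metadata: { name: small-inference }          # illustrative
spec:
  containers:
    - name: inference
      image: registry.example.com/inference:1.0   # illustrative
      resources:
        limits:
          nvidia.com/mig-1g.5gb: 1           # one 1g.5gb MIG slice; name varies by profile/strategy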

41) CronJobs Windows and Batching

apiVersion: batch/v1
kind: CronJob
spec:
  schedule: "*/10 * * * *"
  successfulJobsHistoryLimit: 1
  failedJobsHistoryLimit: 1
  concurrencyPolicy: Forbid
- Run batch at off-peak; aggregate small jobs; prefer queues for spikes

42) Anti-Affinity and Spreading

topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
- Balance resiliency with binpacking; avoid excessive spreading that blocks CA

43) Descheduler Profiles

profiles:
  - name: costs
    pluginConfig:
      - name: LowNodeUtilization
      - name: RemoveDuplicates
    plugins:
      balance:
        enabled: [LowNodeUtilization, RemoveDuplicates]

44) OpenCost/Kubecost Setup

- Install with cloud billing export; labels for team/service/env
- Verify allocation aligns with budgets; reconcile monthly

45) Dashboards and Alerts

{
  "title": "Cost Signals",
  "panels": [
    {"type":"graph","title":"CPU Req vs Usage"},
    {"type":"graph","title":"Node Utilization"},
    {"type":"table","title":"Top Workloads by Idle $"}
  ]
}
# Alerts: idle nodes > N, request/usage ratio > X, egress spike

46) Runbooks (Extended)

Spike in $/req
- Check deploys, HPA/VPA, traffic mix, cache hit, egress, storage IOPS
Idle node surge
- Descheduler; consolidate; CA scale down limits; blocked PDBs
Egress anomaly
- Identify destination; switch to private links; CDN offload; caches

47) Case Studies (Brief)

- ARM64 migration: 28% perf/$ gain; image rebuild; mixed pools rollout
- Spot adoption: 45% savings on CI/batch; PDB tuning; fallbacks
- Storage cleanup: 30% cut via orphan PVC prune and snapshot TTL

48) Infra Reservations via IaC (Sketch)

# Terraform Savings Plans example (pseudo, illustrative only)
resource "aws_savingsplans_plan" "compute" {
  commitment     = "10"
  term           = 1
  payment_option = "NO_UPFRONT"
}

49) PromQL Library

# Requests/Usage ratio by namespace
sum(kube_pod_container_resource_requests{resource="cpu"}) by (namespace)
/ clamp_min(sum(rate(container_cpu_usage_seconds_total[5m])) by (namespace), 0.001)

# Node idle CPU cores (allocatable minus usage)
sum(kube_node_status_allocatable{resource="cpu"}) - sum(rate(container_cpu_usage_seconds_total[5m]))

50) FAQ (201–600)

  1. How low can requests go?
    Enough to avoid throttling; prefer target tracking HPA to handle spikes.

  2. Do I disable limits to improve binpacking?
    Keep sensible limits; avoid unlimited memory which risks OOM killer.

  3. Is descheduler safe?
    Use conservatively; off-hours; monitor disruption and PDB respect.

  4. Can I move observability off-cluster?
    Yes—managed backends cut node/IO costs; weigh egress.

  5. Are DaemonSets expensive?
    Yes, each adds per-node overhead; consolidate agents and trim exporters.


JSON-LD

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Kubernetes Cost Optimization: FinOps Strategies (2025)",
  "description": "Comprehensive guide to reducing Kubernetes spend: autoscaling, binpacking, spot, storage/network, OpenCost, and runbooks.",
  "datePublished": "2025-10-28",
  "dateModified": "2025-10-28",
  "author": {"@type":"Person","name":"Elysiate"}
}
</script>


CTA

Need to cut Kubernetes spend without risk? We implement autoscaling, binpacking, and showback with strong guardrails.


Appendix AA — Kubelet and Node OS Tuning

- Cgroups v2; proper CPU manager policy (static/none) by workload
- Eviction thresholds tuned (memory.available, nodefs.available)
- image-gc-high-threshold, image-gc-low-threshold to reduce churn
- ReadOnlyRootFilesystem where possible; tmpfs for temp
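
On modern clusters most of these knobs live in KubeletConfiguration rather than command-line flags; a sketch with starting-point thresholds (tune per node size and workload mix):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cpuManagerPolicy: static            # only for pinned, Guaranteed-QoS workloads; otherwise "none"
evictionHard:
  memory.available: "500Mi"
  nodefs.available: "10%"
imageGCHighThresholdPercent: 80     # start image GC at 80% disk usage
imageGCLowThresholdPercent: 60      # collect down to 60%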

Appendix AB — Scheduler and Priorities

- priorityClass for critical vs best-effort workloads
- preemptionPolicy on low-priority jobs; ensure PDB compliance
- Binpacking scoring plugins: balance cost vs spread

Appendix AC — Pod Disruption Budgets (PDB)

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata: { name: web-pdb }                # illustrative
spec:
  minAvailable: 1
  selector: { matchLabels: { app: web } }
- Right-size PDBs to allow scale-down; avoid blocking consolidation

Appendix AD — Consolidation Strategies

- Descheduler + Karpenter consolidation
- Controlled drain windows; respect workload SLOs

Appendix AE — Multi-Cluster Cost Patterns

- Split prod vs batch clusters; separate noisy tenants
- Regional clusters to reduce cross-region egress
- Centralized shared services (CI, registry) sized appropriately

Appendix AF — AWS-Specific Tips

- Use Graviton (ARM64) where possible (C7g/M7g/R7g)
- gp3 with tuned IOPS; EBS volume attachment limits planning
- PrivateLink for SaaS; save NAT costs; S3 Gateway endpoints
- Savings Plans coverage monitoring (Compute SP)

Appendix AG — GCP-Specific Tips

- Preemptible VMs for batch; N2D (AMD) for perf/$
- Filestore tiers; Cloud NAT per-subnet tuning; VPC-SC for egress control
- Committed use discounts; regionalize GCR/Artifact Registry

Appendix AH — Azure-Specific Tips

- Dv5/Ev5 for perf/$; Premium SSD v2 tuning
- Private Endpoints; Azure NAT Gateway sizing; Savings Plans/Reserved
- Container Registry geo-replication in-region

Appendix AI — NAT/Egress Cost Controls

- Minimize NAT with VPC endpoints; collapse egress via proxies/CDN
- Audit egress-heavy services; cache and co-locate backends

Appendix AJ — Observability Stack Costs

- Metrics: limit cardinality; remote-write to managed stores
- Logs: sampling and TTL; structure; centralize parsing
- Traces: head/tail sampling; route only errors to persistent storage

Appendix AK — Showback/Chargeback

- Labels: team, service, env, cost-center mandatory
- Allocation via OpenCost; monthly reviews with owners
- Budget thresholds with alerts; corrective SLAs

Appendix AL — CI/CD Costs

- Reuse caches; keep runners small; ARM64 builders; limit parallelism
- Build avoidance, test selection, artifact TTLs

Appendix AM — Policy Pack (Kyverno/OPA)

# Require requests/limits; disallow hostNetwork; forbid :latest; enforce spread

Appendix AN — FinOps Governance

- Monthly savings reviews; anomaly response; budget vs actuals
- Owner accountability; executive dashboards; unit-cost goals

Appendix AO — Performance Engineering Tie-In

- Improve cache hit; reduce cold starts; faster startup reduces overprovision
- Profile hotspots; lower CPU per request; smaller images

Appendix AP — Dashboards (PromQL Sketch)

# $/req proxy metric = infra_cost_per_min / req_per_min
# CPU overrequest ratio
sum(kube_pod_container_resource_requests{resource="cpu"}) / sum(rate(container_cpu_usage_seconds_total[5m]))

# Node utilization
sum(rate(container_cpu_usage_seconds_total[5m])) / sum(kube_node_status_allocatable{resource="cpu"})

Appendix AQ — Runbooks (Quick)

- CA not scaling down: PDB blocking, DaemonSet density, min nodes set
- High egress: identify dests; add endpoints; compress; CDN
- Storage spike: snapshots growth; orphan PVC; logs in volumes

Mega FAQ (601–1000)

  1. Best HPA signal?
    Business metrics (RPS/queue); CPU as secondary.

  2. VPA live updates safe?
    Generally not with HPA; recommend-only for sizing.

  3. Are bigger nodes always cheaper?
    Often, but watch max pods/IPs and failure blast radius.

  4. Spot for stateful?
    Avoid; if used, replicate and checkpoint aggressively.

  5. How to curb observability cost?
    Cardinality controls, sampling, TTL, managed backends.

  6. Is ARM migration worth it?
    Usually 20–40% perf/$; test first.

  7. Do quotas hinder agility?
    They enforce budgets; allow burst via PR with owner approval.

  8. Can we eliminate limits?
    Keep sane limits to avoid noisy neighbors and OOM risk.

  9. Do we need OpenCost?
    Any allocation tool is fine; act on insights monthly.

  10. Final: measure, rightsize, automate, and review.


Appendix AR — Queue-Based Autoscaling (KEDA)

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
spec:
  scaleTargetRef: { name: worker }
  pollingInterval: 5
  cooldownPeriod: 60
  triggers:
    - type: aws-sqs-queue
      metadata: { queueURL: https://sqs..., queueLength: "200" }
- Scale on backlogs: SQS, Kafka lag, Redis lists, HTTP queue depth
- Protect backends: max replicas; rate limits; shed load on saturation

Appendix AS — Unit Economics

- Define unit (req, order, job, minute of stream)
- Track $/unit over time; attribute to services via allocation
- Tie goals to product SLAs and margins; review quarterly

Appendix AT — Pod QoS and Overcommit

- QoS: Guaranteed (req=limit), Burstable (req<limit), BestEffort (none)
- Overcommit CPU safely; avoid memory overcommit on stateful
- Node-level overcommit targets: CPU 150–250% for Burstable; Memory near 100%
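
The QoS class falls out of the resources stanza alone; a side-by-side sketch of the three shapes (container fragments, not full manifests):

# Guaranteed: requests == limits for every resource
resources:
  requests: { cpu: 500m, memory: 512Mi }
  limits:   { cpu: 500m, memory: 512Mi }

# Burstable: requests below limits (or only requests set); safe target for CPU overcommit
resources:
  requests: { cpu: 200m, memory: 256Mi }
  limits:   { cpu: "1", memory: 512Mi }

# BestEffort: nothing set; avoid in shared clusters, first to be evicted
resources: {}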

Appendix AU — Evictions and OOM Tuning

- Eviction thresholds: memory.available, nodefs.available; tune for headroom
- Prefer cgroup OOM kill in low-priority workloads; PDB to protect HA
- Alert on eviction spikes; adjust requests or binpacking

Appendix AV — Multi-Tenant Fairness

- Quotas per namespace; priority classes; fairness in ingress/egress
- Weighted fair sharing for batch queues; protect critical tenants

Appendix AW — Anomaly Detection

- Daily seasonality models for cost signals; alert on >3σ deviations
- Segment by namespace/team; annotate deploys to explain shifts

Appendix AX — Budgeting and Alerts

- Monthly budgets per team; 70/85/100% alerts
- Freeze risky changes on burn >2×; require owner approval to proceed

Appendix AY — Image Distribution

- Regional caches; pre-pull on new node types; deduplicate layers
- Track image size budgets; PR checks for images > 300MB

Appendix AZ — Prewarming and Scale-To-Zero

- Keep N small warm pods for latency SLO; scale rest to zero for batch
- Use warm pools for nodes to cut cold-start penalties

Appendix BA — Cost-Aware Routing

- Send non-latency-critical traffic to cheaper regions; respect data residency
- Prefer server-side caching to reduce egress and CPU

Appendix BB — Data Locality

- Co-locate services with databases; avoid cross-zone chatter
- Cache read-heavy flows near app pods; invalidate on writes

Appendix BC — JVM and Runtime Tuning

- Right-size heap; GC profiles for latency vs throughput
- Avoid CPU throttling by aligning limits to observed peaks

Appendix BD — Language-Specific Tips

- Node.js: cluster workers, UV_THREADPOOL_SIZE, avoid blocking I/O
- Go: GOMAXPROCS tune; pprof hotspots; avoid large allocs
- Python: uvloop, asyncio, gunicorn workers; C extensions for hotspots

Appendix BE — CDN and Edge Offload

- Cache static and semi-static; signed URLs; compress; image transforms at edge

Appendix BF — Data Pipelines

- Batch windows; compact files; combine small writes; choose cheaper storage tiers
- Spot for transform; on-demand for stateful stores

Appendix BG — Rightsizing Workflow (Detailed)

- Discover → Recommend → Review → Rollout → Verify → Lock → Revisit
- Canary reduced requests; monitor throttling and p99 latency

Appendix BH — Binpacking Score Tuning

- PreferredDuringScheduling for nodeSelector/affinity; tune weights
- Balance cost with HPA freedom; avoid fragmentation by resource shape

Appendix BI — Cost KPIs

- $/unit, $/namespace, $/service, RI/SP coverage, spot %
- Request/usage ratio, node utilization, egress %, storage growth

Appendix BJ — SLAs and Error Budgets

- Freeze risky cost changes when SLO burn > 2×; prioritize reliability

Appendix BK — Platform Overhead Budgets

- Caps for mesh, observability, controllers; report monthly
- Optimize with sampling and ambient modes

Appendix BL — Security vs Cost

- Keep mTLS and policies; optimize implementation; push heavy checks to edge

Appendix BM — ARM and GPU Mix

- Mixed arch pools; schedule tolerant services; keep critical on proven arch
- GPU sharing for inference; binpack with MIG/MPS; monitor memory use

Appendix BN — CI Workload Placement

- Separate cluster or cheaper region; spot; ARM64 builders; caching

Appendix BO — SaaS vs Self-Hosted Tradeoffs

- Compare TCO: infra, ops, feature velocity; move heavy state to managed

Appendix BP — GreenOps

- Carbon-aware scheduling; lower-carbon regions; efficiency metrics

Appendix BQ — Cost Reviews Cadence

- Weekly quick wins; monthly governance; quarterly roadmap rebase

Appendix BR — Education and Ownership

- Teams own their spend; publish guides; office hours; champions program

Appendix BS — Dashboards (JSON Sketch)

{
  "title": "FinOps Overview",
  "panels": [
    {"type":"graph","title":"$/unit (rolling)"},
    {"type":"graph","title":"Node Utilization"},
    {"type":"table","title":"Top Overprovisioned Services"}
  ]
}

Appendix BT — Alert Rules

- alert: OverprovisionedCPU
  expr: (sum(kube_pod_container_resource_requests{resource="cpu"}) - sum(rate(container_cpu_usage_seconds_total[5m]))) / sum(kube_pod_container_resource_requests{resource="cpu"}) > 0.5
  for: 2h
- alert: EgressSpike
  expr: sum by (instance) (rate(node_network_transmit_bytes_total{device!="lo"}[5m])) > 5e7
  for: 15m

Mega FAQ (1001–1400)

  1. Is KEDA better than HPA?
    It complements HPA with queue metrics; use both.

  2. What request percentile to choose?
    Start p90 + 10–20% headroom; adjust per SLO.

  3. Does ARM hurt latency?
    Usually not if optimized; measure; keep critical on proven arch initially.

  4. Are larger nodes riskier?
    Larger blast radius per node; mitigate with PDB and spread.

  5. How to track $/unit?
    Allocate cost → divide by business metric; automate pipeline.


Mega FAQ (1401–1600)

  1. Should I run mesh ambient mode?
    If supported and it meets security needs; saves sidecar overhead.

  2. Drop traces to save cost?
    Sample smartly; keep errors; drop low-value.

  3. Is descheduler mandatory?
    Helpful; not required; Karpenter consolidation can cover many cases.

  4. Can I disable logs?
    Reduce verbosity; keep security/audit; TTL cold storage.

  5. Final: measure, rightsize, automate, and revisit.


Appendix CA — API Server and Controller Overhead

- Trim watch cardinality; prefer fewer CRDs or shard by namespace
- Cache informers; reduce resync periods where safe
- Keep controllers lean; defer heavy work to jobs/queues

Appendix CB — Admission Webhooks Performance

- Batch or cache decisions; avoid per-pod high-latency calls
- Fail-open vs fail-closed policy by risk tier; monitor latency

Appendix CC — Cluster Add-ons Budget

- Cap mesh/obs/add-ons to <15–25% of cluster cost
- Consolidate agents; avoid duplicate collectors; sample aggressively

Appendix CD — Namespace Standards

- Mandatory labels: team, service, env, cost-center
- Default LimitRanges; NetworkPolicies; Quotas
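
The mandatory labels can be enforced at namespace creation; a Kyverno sketch using the label keys listed above:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata: { name: require-cost-labels }
spec:
  validationFailureAction: enforce
  rules:
    - name: require-cost-labels
      match: { resources: { kinds: ["Namespace"] } }
      validate:
        message: "Namespaces need team, service, env, and cost-center labels"
        pattern:
          metadata:
            labels:
              team: "?*"
              service: "?*"
              env: "?*"
              cost-center: "?*"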

Appendix CE — Node Feature Discovery and Placement

- Discover HW features; schedule to best-fit nodes; avoid mismatches

Appendix CF — Build and Artifact Strategy

- Deduplicate artifacts; use build manifests; prune old tags
- TTL caches; content-addressable storage; compress artifacts

Appendix CG — Event-Driven Scale-down Windows

- Define low-traffic windows; more aggressive consolidation during off-peak

Appendix CH — Safe Overcommit Playbook

- Overcommit CPU on Burstable; avoid memory overcommit for stateful
- Monitor throttling and p99 lat; rollback if SLOs regress

Appendix CI — Throttling and p99 Latency

- Correlate CPU throttling with p99; increase limits or optimize code

Appendix CJ — Language/Runtime Efficiency

- Profile hotspots; reduce allocations; async I/O; vectorize where possible

Appendix CK — DB and Cache Cost Tie-in

- Reduce chatty calls; batch; cache near app; measure $/query

Appendix CL — CDN and Image Optimization

- AVIF/WebP; responsive images; cache-control headers; ETag/If-None-Match

Appendix CM — Data Gravity and Residency

- Keep compute near data; avoid cross-region; pin jobs to data nodes

Appendix CN — Security Posture with Low Overhead

- eBPF where possible; selective deep inspection; periodic scans

Appendix CO — Cost-Aware SLOs

- Define p95 targets with budget; explore slight relaxations for big savings

Appendix CP — Green Schedules

- Shift batch to low-carbon windows/regions; track grams CO2e/unit

Appendix CQ — Platform Change Management

- CAB for major cost-impacting changes; evidence dashboards; rollback plan

Appendix CR — Drift Detection for Costs

- Policy to flag replicas, requests, storage, and egress drifts

Appendix CS — Reserved Capacity Planner

- Estimate baseline; simulate coverage; track unused reservations

Appendix CT — ARM64 Migration Runbook

- Build multi-arch; conformance tests; perf benchmarks; phased rollout

Appendix CU — GPU Scheduling Policies

- Queue priorities; preemption; MIG profiles; binpack with memory headroom

Appendix CV — ML Serving Optimizations

- Model quantization; batch small requests; CPU offload; token limits

Appendix CW — ETL and Lakehouse Costs

- File size targets; compaction; partition pruning; z-ordering

Appendix CX — StatefulSets and PVC Compaction

- Defragment PVC usage; reclaim space; migrate to smaller volumes (in-place shrink is not supported); move to shared storage where viable

Appendix CY — Multi-Cluster Control Plane Spend

- Consider managed control planes; aggregate small clusters; fleets

Appendix CZ — Final Cost Principles

- Measure → Right-size → Automate → Govern → Repeat

Dashboards (Extended JSON Sketch)

{
  "title": "K8s FinOps Deep Dive",
  "panels": [
    {"type":"graph","title":"CPU Throttling vs p99"},
    {"type":"graph","title":"Overcommit Ratio by Namespace"},
    {"type":"table","title":"Top Egress Services"},
    {"type":"table","title":"Storage Growth by PVC"}
  ]
}

Policies (Examples)

# OPA/Kyverno pseudo: forbid no-requests, forbid :latest, enforce labels

Runbooks (More)

- Planner: RI underutilized → adjust coverage; swap families; sell back (where possible)
- Spot instability → diversify types/zones; lower target; fallback policy
- Mesh overhead → ambient mode; reduce mTLS cost; tune retries/timeouts

Mega FAQ (1601–2000)

  1. Best single lever for savings?
    Rightsizing requests + consolidation.

  2. Why p95 not average?
    Tail latency drives user experience; averages mask spikes.

  3. Can we pre-warm nodes?
    Yes via warm pools; balance against idle waste.

  4. Does ARM help IO-bound?
    Less; focus on CPU-bound workloads for big wins.

  5. How to price $/unit?
    Allocate infra + platform overhead; divide by business units; include egress/storage.


Mega FAQ (2001–2200)

  1. Is single large cluster cheaper?
    Often, but fault domains and noisy-neighbor risk grow; consider multi-cluster for isolation.

  2. Should we run everything on spot?
    No; critical paths on on-demand with spot for elastic/batch.

  3. How often to review?
    Weekly quick wins; monthly governance; quarterly roadmap.

  4. Final: cost is a product—own it with data and discipline.

Related posts