Kubernetes Cost Optimization: FinOps Strategies (2025)
Kubernetes is powerful—and notoriously easy to overspend on. This guide provides practical controls to cut costs 20–50% while maintaining reliability.
Executive summary
- Rightsize requests/limits; enforce via policies
- Use HPA/VPA where appropriate; validate request/limit changes against SLOs
- Prefer spot/preemptible for stateless; bin-pack with topology spread
- Track unit economics: $/req, $/tenant, $/job
Requests/limits hygiene
apiVersion: v1
kind: Pod
metadata: { name: api }                  # example name
spec:
  containers:
    - name: api
      resources:
        requests:
          cpu: "200m"
          memory: "256Mi"
        limits:
          cpu: "500m"
          memory: "512Mi"
- Establish default requests via LimitRange (sketch below); deny pods without requests at admission; monitor CPU throttling and OOMKills
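A minimal LimitRange sketch for namespace defaults (names and values are illustrative):
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults             # illustrative name
  namespace: team-a                    # illustrative namespace
spec:
  limits:
    - type: Container
      defaultRequest:                  # applied when a container omits requests
        cpu: "100m"
        memory: "128Mi"
      default:                         # applied when a container omits limits
        cpu: "500m"
        memory: "512Mi"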
Autoscaling
- HPA: CPU/Memory + custom metrics (QPS, latency)
- VPA: recommendation mode first; avoid fighting HPA
- Cluster Autoscaler: scale nodes based on pending pods
Spot/preemptible
- Use PDBs, checkpoints, idempotent jobs; multiple spot pools; surge buffer
Bin-packing and topology
- Node sizing; taints/tolerations; topologySpreadConstraints; overprovision pods for spikes
FinOps dashboards
- Kubecost + Prometheus; cost allocation by namespace/label; anomaly alerts
Policy enforcement
- Gatekeeper/OPA: deny no‑request pods; enforce limits; restrict instance types
FAQ
Q: HPA vs VPA?
A: HPA for horizontal scaling; VPA for rightsizing. Avoid conflicting signals by running VPA in recommendation mode with HPA.
Related posts
- Terraform Best Practices: /blog/terraform-best-practices-infrastructure-as-code-2025
- OpenTelemetry Guide: /blog/observability-opentelemetry-complete-implementation-guide
- Platform Engineering: /blog/platform-engineering-internal-developer-platforms-2025
- GitOps Strategies: /blog/gitops-argocd-flux-kubernetes-deployment-strategies
- Service Mesh Comparison: /blog/service-mesh-istio-linkerd-comparison-guide-2025
Call to action
Want a K8s cost review and savings plan? Get a free FinOps consult.
Contact: /contact • Newsletter: /newsletter
Kubernetes Cost Optimization: FinOps Strategies (2025)
A pragmatic guide to reduce Kubernetes spend without sacrificing reliability.
1) Objectives and Guardrails
- Reduce $/unit (req, order, job) while meeting SLOs
- Rightsize continuously; prevent regressions with policy and CI gates
- Observe costs per team/workload; align budgets with ownership
2) Cost Drivers (High Level)
- Compute: requests/limits, binpacking, idle capacity, overprovisioning
- Storage: class (gp2/gp3/premium), IOPS, snapshots, orphan PVCs
- Network: cross-AZ/region egress, NAT, LBs, service mesh overhead
- Platform overhead: control plane, observability stack, service mesh
3) Cluster Autoscaler (CA) Tuning
# CA flags (sketch)
- --balance-similar-node-groups=true
- --expander=least-waste
- --scale-down-utilization-threshold=0.5
- --scale-down-unneeded-time=10m
- Separate node groups by workload type (general, memory, CPU, GPU)
- Prefer fewer larger nodes for binpacking unless pod anti-affinity requires spread
4) HPA and Custom Metrics
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata: { name: api }                  # example name
spec:
  scaleTargetRef: { apiVersion: apps/v1, kind: Deployment, name: api }   # example target
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource: { name: cpu, target: { type: Utilization, averageUtilization: 65 } }
    - type: Pods
      pods:
        metric: { name: rq_per_pod }     # custom per-pod metric, e.g. requests per second
        target: { type: AverageValue, averageValue: "40" }
- Use RPS/queue depth as primary signals; CPU-only leads to waste
5) Vertical Pod Autoscaler (VPA)
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata: { name: api }                  # example name
spec:
  targetRef: { apiVersion: apps/v1, kind: Deployment, name: api }        # example target
  updatePolicy: { updateMode: "Off" }    # recommend-first: surface suggestions without evicting pods
- Run in recommend-only mode to guide request updates; avoid flapping between VPA and HPA
6) Karpenter for Fast Binpacking
# Legacy v1alpha5 Provisioner API; newer Karpenter releases use the karpenter.sh/v1 NodePool equivalent
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata: { name: default }              # example name
spec:
  consolidation: { enabled: true }
  requirements:
    - key: kubernetes.io/arch
      operator: In
      values: ["amd64", "arm64"]
- Enables rapid scale and consolidation; pair with budgets and taints
7) Spot/Preemptible and Mixed Pools
- Use spot for stateless, retryable, non-prod jobs
- Mix on-demand and spot with PDBs and priorities
- Budget-aware fallbacks to on-demand during spot scarcity
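One way to implement the fallback with the Cluster Autoscaler is the priority expander (enabled via --expander=priority): a ConfigMap that prefers spot node groups and falls back to on-demand. A sketch, with node-group name patterns as placeholders:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander   # name the priority expander looks up
  namespace: kube-system
data:
  priorities: |-
    50:
      - .*spot.*                               # try spot node groups first
    10:
      - .*on-demand.*                          # fall back to on-demand when spot is scarce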
8) Reservations and Savings Plans
- Baseline steady load with RIs/SPs; leave burst to on-demand/spot
- Right-size terms (1y vs 3y), regional vs zonal, convertible RIs for flexibility
9) Requests, Limits, and Quotas
apiVersion: v1
kind: ResourceQuota
metadata: { name: cluster-baseline }     # example name
spec:
  hard:
    requests.cpu: "200"
    requests.memory: 400Gi
    limits.cpu: "400"
    limits.memory: 800Gi
- Enforce LimitRange defaults; forbid pods without requests/limits at admission
10) Policy-as-Code (OPA/Kyverno)
# Kyverno: require requests/limits (sketch; Kyverno auto-generates equivalent rules for Deployments/StatefulSets when a rule matches Pods)
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata: { name: require-requests-limits }   # example name
spec:
  validationFailureAction: Enforce
  rules:
    - name: require-requests-limits
      match: { resources: { kinds: ["Pod"] } }
      validate:
        message: "CPU and memory requests and limits are required"
        pattern:
          spec:
            containers:
              - resources:
                  requests: { cpu: "?*", memory: "?*" }
                  limits: { cpu: "?*", memory: "?*" }
11) Storage Strategies
- Prefer gp3/Premium SSD v2 with tuned IOPS (StorageClass sketch after this list); avoid default gp2
- Use Retain only where needed; clean orphan PVCs and snapshots
- Move logs/temp to ephemeral; compress and tier to cheap storage
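A gp3 StorageClass sketch for the AWS EBS CSI driver; the IOPS/throughput values are illustrative and should be tuned per workload:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-tuned                        # illustrative name
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "4000"                           # illustrative; gp3 baseline is 3000
  throughput: "250"                      # MiB/s, illustrative
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete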
12) Network and Egress Controls
- Keep traffic in-zone/region; use private endpoints; reduce NAT
- Collapse noisy sidecars; prefer node-local caching
- CDN offload for egress-heavy paths
13) Service Mesh Overhead
- Sidecarless/ambient when feasible; reduce mTLS costs with HW offload
- Sample traces; reduce metrics cardinality; cap retries
14) Image and Registry Costs
- Slim base images; multi-arch; layer reuse; registry caching
- Avoid cross-region pulls; colocate registry with clusters
15) GPU Workloads
- Fractional GPUs (MIG/MPS) where supported; binpack DL jobs
- Spot GPUs for training; on-demand for inference SLOs
16) Scheduling and Affinity
spec:
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector: { matchLabels: { app: web } }
- Taints for spot; tolerations for opt-in workloads; priorities for eviction
17) Descheduler and Consolidation
# Descheduler policy (sketch): v1alpha1 uses a flat strategies map; newer v1alpha2 uses profiles with plugins/pluginConfig
apiVersion: descheduler/v1alpha1
kind: DeschedulerPolicy
strategies:
  RemoveDuplicates:
    enabled: true
  LowNodeUtilization:
    enabled: true                        # also requires nodeResourceUtilizationThresholds params in practice
- Periodically rebalance to free nodes for scale-down
18) Observability for Cost
# CPU requested vs used (a ratio well above 1 means overprovisioned requests)
sum(kube_pod_container_resource_requests{resource="cpu"})
/ sum(rate(container_cpu_usage_seconds_total[5m]))
# Idle CPU cores (allocatable minus used)
sum(kube_node_status_allocatable{resource="cpu"}) - sum(rate(container_cpu_usage_seconds_total[5m]))
- Track $ per unit: cost per request/order/session/build
19) Allocation, Showback, Chargeback
- Label/annotate: team, service, env, cost-center
- Cost tools (OpenCost/Kubecost) map cloud bills to k8s resources
- Monthly reports; budgets with alerts; corrective actions
20) FinOps Workflow
- Detect: dashboards, alerts, anomaly detection
- Diagnose: who/what/why; unit cost changed?
- Decide: trade-offs with SLOs; approval matrix
- Deliver: PRs, policy updates, autoscaler tuning
- Document: runbooks, owners, SLAs
21) CI/CD Cost Controls
- Parallelism caps; ephemeral preview TTLs; cache artifacts; small runners
- Build on cheaper instance types; ARM64 where supported
22) Runbooks
- Cost Spike: check deploys, HPA settings, traffic, egress, storage snapshots
- Idle Spend: identify idle nodes, orphan PVC/LBs, zombie namespaces
- Spot Flaps: increase PDB, diversify instance types, fallback policy
23) Dashboards (Sketch JSON)
{
"title": "Kubernetes Cost Overview",
"panels": [
{"type":"stat","title":"$/1000 req"},
{"type":"graph","title":"CPU Req vs Usage"},
{"type":"table","title":"Top Namespaces by Cost"}
]
}
24) FAQ (1–200)
- How much headroom is healthy? Target 20–30% for burst; lower with fast scaling.
- Should I remove limits? Keep limits to prevent noisy neighbors; set sane ceilings.
- Is spot safe for prod? Yes, for stateless and resilient services; protect critical paths.
- Do I need Kubecost? OpenCost or cloud-native billing plus labels also works; pick one and act on the data.
25) Workload Profiles and Sizing
- Latency-sensitive web: low CPU request, modest burst, HPA on RPS + CPU
- Batch/ETL: high CPU/mem, tolerant to preemption; schedule nightly windows
- Streaming: steady CPU, strict p99; avoid spot if stateful; tune JVM
- ML inference: GPU/CPU bound; autoscale on QPS and queue depth
- Build CI: short-lived, use cheap nodes; aggressive binpacking
26) Node Pools and Instance Selection
- Separate pools: general, memory-optimized, compute-optimized, GPU, ARM64
- Prefer latest-gen instances with better perf/$ (e.g., C7g vs C5)
- Use fewer, larger nodes for binpacking (watch PDB and max pods per node)
- Cap max pods per node to avoid ENI/IP exhaustion
27) ARM64 (Graviton/Neoverse) Strategy
- 20–40% perf/$ gains typical; rebuild images multi-arch; test perf
- Pin critical workloads to proven arch; dual-arch pools during migration
- Cache warming: pre-pull images; registry in-region
28) Requests Right-Sizing Method
- Collect 7–14 days of usage per container at stable traffic
- Pick p90–p95 usage as request; add 10–20% headroom; set limits 1.5–2×
- Revisit quarterly or after major releases; automate via recommender
29) Requests Estimator (Pseudo)
// Recommend a request from usage samples: p90 plus 10% headroom, limit ≈ 1.8× request
function recommendRequests(samples: number[]): { req: number; limit: number } {
  const sorted = [...samples].sort((a, b) => a - b);
  // Nearest-rank percentile over the sorted samples
  const p = (k: number) => sorted[Math.min(sorted.length - 1, Math.floor(k * sorted.length))];
  const req = p(0.9) * 1.1;   // p90 usage plus 10% headroom
  const limit = req * 1.8;    // ceiling within the 1.5–2× guidance above
  return { req, limit };
}
30) HPA + VPA Interplay
- Avoid VPA live-updating with HPA; use VPA recommend-only for HPA targets
- Tune stabilization windows; prevent oscillations with target tracking
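A sketch of the autoscaling/v2 behavior block that damps oscillations (values illustrative; attach to the HPA spec above):
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300      # require ~5 minutes of sustained low load before scaling in
    policies:
      - type: Percent
        value: 20                        # remove at most 20% of replicas per minute
        periodSeconds: 60
  scaleUp:
    stabilizationWindowSeconds: 0        # react to sustained pressure immediately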
31) Karpenter Consolidation Tuning
spec:
  consolidation: { enabled: true }
  limits:
    resources:
      cpu: "2000"
      memory: 4Ti
  providerRef: { name: default }
- Use consolidation to replace underutilized nodes with denser placements
- Protect critical pods with PDB and priorities
32) Spot Best Practices
- Diversify 3–5 instance types per pool; use capacity-optimized allocation
- PDB minAvailable >= 1 for HA; checkpoint batch jobs
- Fallback to on-demand when interruption rates spike
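A sketch for opting a stateless workload into spot capacity; the nodeSelector uses Karpenter's capacity-type label, and the taint/toleration pair is an assumed cluster convention:
# Pod template fragment
spec:
  nodeSelector:
    karpenter.sh/capacity-type: spot     # or eks.amazonaws.com/capacityType: SPOT on managed node groups
  tolerations:
    - key: "spot"                        # assumed cluster-specific taint applied to spot nodes
      operator: "Exists"
      effect: "NoSchedule"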
33) Savings Plans / RIs Strategy
- Cover 60–80% baseline with SP/RIs; convertibles for flexibility
- Model demand across regions; avoid over-commit; monitor coverage/unused
34) Namespace Budgets and Quotas
apiVersion: v1
kind: ResourceQuota
metadata: { name: team-a-quota, namespace: team-a }
spec:
  hard:
    requests.cpu: "400"
    requests.memory: 800Gi
    pods: "600"
- Review monthly; alert at 80/90/100%; support burst approvals via PR
35) Policy Pack (Kyverno)
# Disallow :latest images, forbid NoLimits, require topologySpread
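One policy from such a pack, as a sketch: a Kyverno rule rejecting mutable :latest tags (name and message are illustrative).
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata: { name: disallow-latest-tag }      # illustrative name
spec:
  validationFailureAction: Enforce
  rules:
    - name: disallow-latest
      match: { resources: { kinds: ["Pod"] } }
      validate:
        message: "Use an immutable image tag instead of :latest"
        pattern:
          spec:
            containers:
              - image: "!*:latest"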
36) Storage Optimization (Extended)
- gp3 with tuned IOPS for DB; gp2 only for legacy; cold snapshots lifecycle
- Delete Completed PVCs for Jobs; set retention policies; prune orphans
- Compress logs; move to object storage; TTL temp volumes
37) Network/Egress Patterns
- PrivateLinks/Service Endpoints; VPC endpoints for S3/Blob/GCR
- NAT instance sizing; reduce east-west via locality; co-locate DBs
38) Telemetry Slimming
- Metrics: drop high-cardinality labels; 1–5m scrape; histograms limited
- Traces: sample 1–5%; tail-based for errors; drop internal noisy paths
- Logs: structure; drop debug in prod; TTL cold storage
39) Image Slimming
- Distroless; multi-stage builds; prune dev deps; compress layers
- Registry cache in-cluster; avoid cross-region pulls
40) GPU Queues and Binpacking
- Use queues for DL jobs; pack multi-GPU nodes; preemption for low-priority
- MIG for sharing; pin inference to small GPUs; autoscale per queue depth
41) CronJobs Windows and Batching
apiVersion: batch/v1
kind: CronJob
metadata: { name: batch-job }            # example name
spec:
  schedule: "*/10 * * * *"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 1
  failedJobsHistoryLimit: 1
- Run batch at off-peak; aggregate small jobs; prefer queues for spikes
42) Anti-Affinity and Spreading
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
- Balance resiliency with binpacking; avoid excessive spreading that blocks CA
43) Descheduler Profiles
# Sketch only; the v1alpha2 API nests "plugins" and "pluginConfig" under each profile rather than a strategies map
profiles:
  - name: costs
    strategies:
      LowNodeUtilization: { enabled: true }
      RemoveDuplicates: { enabled: true }
44) OpenCost/Kubecost Setup
- Install with cloud billing export; labels for team/service/env
- Verify allocation aligns with budgets; reconcile monthly
45) Dashboards and Alerts
{
"title": "Cost Signals",
"panels": [
{"type":"graph","title":"CPU Req vs Usage"},
{"type":"graph","title":"Node Utilization"},
{"type":"table","title":"Top Workloads by Idle $"}
]
}
# Alerts: idle nodes > N, request/usage ratio > X, egress spike
46) Runbooks (Extended)
Spike in $/req
- Check deploys, HPA/VPA, traffic mix, cache hit, egress, storage IOPS
Idle node surge
- Descheduler; consolidate; CA scale down limits; blocked PDBs
Egress anomaly
- Identify destination; switch to private links; CDN offload; caches
47) Case Studies (Brief)
- ARM64 migration: 28% perf/$ gain; image rebuild; mixed pools rollout
- Spot adoption: 45% savings on CI/batch; PDB tuning; fallbacks
- Storage cleanup: 30% cut via orphan PVC prune and snapshot TTL
48) Infra Reservations via IaC (Sketch)
# Terraform Savings Plans example (pseudo; illustrative only)
resource "aws_savingsplans_plan" "compute" {
  commitment     = "10"
  term           = 1
  payment_option = "NO_UPFRONT"
}
49) PromQL Library
# Requests/usage ratio by namespace (well above 1 means overprovisioned requests)
sum(kube_pod_container_resource_requests{resource="cpu"}) by (namespace)
/ clamp_min(sum(rate(container_cpu_usage_seconds_total[5m])) by (namespace), 0.001)
# Node idle CPU cores (allocatable minus used)
sum(kube_node_status_allocatable{resource="cpu"}) - sum(rate(container_cpu_usage_seconds_total[5m]))
50) FAQ (201–600)
- How low can requests go? Low enough to avoid throttling; prefer target-tracking HPA to absorb spikes.
- Do I disable limits to improve binpacking? Keep sensible limits; unlimited memory risks the OOM killer.
- Is the descheduler safe? Use it conservatively and off-hours; monitor disruption and PDB respect.
- Can I move observability off-cluster? Yes; managed backends cut node/IO costs; weigh egress.
- Are DaemonSets expensive? They add per-node overhead; consolidate agents and trim what they export.
JSON-LD
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "Article",
"headline": "Kubernetes Cost Optimization: FinOps Strategies (2025)",
"description": "Comprehensive guide to reducing Kubernetes spend: autoscaling, binpacking, spot, storage/network, OpenCost, and runbooks.",
"datePublished": "2025-10-28",
"dateModified": "2025-10-28",
"author": {"@type":"Person","name":"Elysiate"}
}
</script>
Related Posts
- GitOps: Argo CD and Flux Deployment Strategies
- Observability with OpenTelemetry: Complete Implementation Guide
CTA
Need to cut Kubernetes spend without risk? We implement autoscaling, binpacking, and showback with strong guardrails.
Appendix AA — Kubelet and Node OS Tuning
- Cgroups v2; proper CPU manager policy (static/none) by workload
- Eviction thresholds tuned (memory.available, nodefs.available)
- image-gc-high-threshold, image-gc-low-threshold to reduce churn
- ReadOnlyRootFilesystem where possible; tmpfs for temp
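A KubeletConfiguration sketch covering the knobs above (thresholds are illustrative):
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cpuManagerPolicy: static                 # pin CPUs for latency-critical Guaranteed pods; use "none" elsewhere
evictionHard:
  memory.available: "500Mi"
  nodefs.available: "10%"
imageGCHighThresholdPercent: 80          # start image GC at 80% disk usage
imageGCLowThresholdPercent: 60           # stop once usage drops below 60%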
Appendix AB — Scheduler and Priorities
- priorityClass for critical vs best-effort workloads
- preemptionPolicy on low-priority jobs; ensure PDB compliance
- Binpacking scoring plugins: balance cost vs spread
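A PriorityClass sketch separating critical from preemptible work (names and values are illustrative):
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata: { name: critical-services }    # illustrative name
value: 100000
globalDefault: false
description: "Latency-critical, on-demand-backed workloads"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata: { name: best-effort-batch }    # illustrative name
value: 1000
preemptionPolicy: Never                  # batch jobs wait for capacity rather than preempting others
description: "Preemptible, cost-optimized batch"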
Appendix AC — Pod Disruption Budgets (PDB)
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata: { name: web-pdb }              # example name
spec:
  minAvailable: 1
  selector: { matchLabels: { app: web } }   # example selector
- Right-size PDBs to allow scale-down; avoid blocking consolidation
Appendix AD — Consolidation Strategies
- Descheduler + Karpenter consolidation
- Controlled drain windows; respect workload SLOs
Appendix AE — Multi-Cluster Cost Patterns
- Split prod vs batch clusters; separate noisy tenants
- Regional clusters to reduce cross-region egress
- Centralized shared services (CI, registry) sized appropriately
Appendix AF — AWS-Specific Tips
- Use Graviton (ARM64) where possible (C7g/M7g/R7g)
- gp3 with tuned IOPS; EBS volume attachment limits planning
- PrivateLink for SaaS; save NAT costs; S3 Gateway endpoints
- Savings Plans coverage monitoring (Compute SP)
Appendix AG — GCP-Specific Tips
- Preemptible VMs for batch; N2D (AMD) for perf/$
- Filestore tiers; Cloud NAT per-subnet tuning; VPC-SC for egress control
- Committed use discounts; regionalize GCR/Artifact Registry
Appendix AH — Azure-Specific Tips
- Dv5/Ev5 for perf/$; Premium SSD v2 tuning
- Private Endpoints; Azure NAT Gateway sizing; Savings Plans/Reserved
- Container Registry geo-replication in-region
Appendix AI — NAT/Egress Cost Controls
- Minimize NAT with VPC endpoints; collapse egress via proxies/CDN
- Audit egress-heavy services; cache and co-locate backends
Appendix AJ — Observability Stack Costs
- Metrics: limit cardinality; remote-write to managed stores
- Logs: sampling and TTL; structure; centralize parsing
- Traces: head/tail sampling; route only errors to persistent storage
Appendix AK — Showback/Chargeback
- Labels: team, service, env, cost-center mandatory
- Allocation via OpenCost; monthly reviews with owners
- Budget thresholds with alerts; corrective SLAs
Appendix AL — CI/CD Costs
- Reuse caches; keep runners small; ARM64 builders; limit parallelism
- Build avoidance, test selection, artifact TTLs
Appendix AM — Policy Pack (Kyverno/OPA)
# Require requests/limits; disallow hostNetwork; forbid :latest; enforce spread
Appendix AN — FinOps Governance
- Monthly savings reviews; anomaly response; budget vs actuals
- Owner accountability; executive dashboards; unit-cost goals
Appendix AO — Performance Engineering Tie-In
- Improve cache hit; reduce cold starts; faster startup reduces overprovision
- Profile hotspots; lower CPU per request; smaller images
Appendix AP — Dashboards (PromQL Sketch)
# $/req proxy metric = infra_cost_per_min / req_per_min
# CPU overrequest ratio
sum(kube_pod_container_resource_requests{resource="cpu"}) / sum(rate(container_cpu_usage_seconds_total[5m]))
# Node CPU utilization
sum(rate(container_cpu_usage_seconds_total[5m])) / sum(kube_node_status_allocatable{resource="cpu"})
Appendix AQ — Runbooks (Quick)
- CA not scaling down: PDB blocking, DaemonSet density, min nodes set
- High egress: identify dests; add endpoints; compress; CDN
- Storage spike: snapshots growth; orphan PVC; logs in volumes
Mega FAQ (601–1000)
- Best HPA signal? Business metrics (RPS/queue depth); CPU as a secondary signal.
- Are VPA live updates safe? Generally not with HPA; recommend-only for sizing.
- Are bigger nodes always cheaper? Often, but watch max pods/IPs per node and the failure blast radius.
- Spot for stateful? Avoid it; if used, replicate and checkpoint aggressively.
- How to curb observability cost? Cardinality controls, sampling, TTLs, managed backends.
- Is ARM migration worth it? Usually 20–40% better perf/$; test first.
- Do quotas hinder agility? They enforce budgets; allow bursts via PR with owner approval.
- Can we eliminate limits? Keep sane limits to avoid noisy neighbors and OOM risk.
- Do we need OpenCost? Any allocation tool is fine; act on insights monthly.
Final: measure, rightsize, automate, and review.
Appendix AR — Queue-Based Autoscaling (KEDA)
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata: { name: worker-scaler }        # example name
spec:
  scaleTargetRef: { name: worker }
  pollingInterval: 5
  cooldownPeriod: 60
  triggers:
    - type: aws-sqs-queue
      metadata: { queueURL: "https://sqs...", queueLength: "200" }
- Scale on backlogs: SQS, Kafka lag, Redis lists, HTTP queue depth
- Protect backends: max replicas; rate limits; shed load on saturation
Appendix AS — Unit Economics
- Define unit (req, order, job, minute of stream)
- Track $/unit over time; attribute to services via allocation
- Tie goals to product SLAs and margins; review quarterly
Appendix AT — Pod QoS and Overcommit
- QoS: Guaranteed (req=limit), Burstable (req<limit), BestEffort (none)
- Overcommit CPU safely; avoid memory overcommit on stateful
- Node-level overcommit targets: CPU 150–250% for Burstable; Memory near 100%
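For reference, a sketch of the resources blocks that yield each QoS class (Guaranteed requires requests to equal limits for every container):
# Guaranteed QoS: requests equal limits
resources:
  requests: { cpu: "500m", memory: "512Mi" }
  limits: { cpu: "500m", memory: "512Mi" }
# Burstable QoS: requests below limits (enables safe CPU overcommit)
resources:
  requests: { cpu: "200m", memory: "256Mi" }
  limits: { cpu: "1", memory: "512Mi" }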
Appendix AU — Evictions and OOM Tuning
- Eviction thresholds: memory.available, nodefs.available; tune for headroom
- Prefer cgroup OOM kill in low-priority workloads; PDB to protect HA
- Alert on eviction spikes; adjust requests or binpacking
Appendix AV — Multi-Tenant Fairness
- Quotas per namespace; priority classes; fairness in ingress/egress
- Weighted fair sharing for batch queues; protect critical tenants
Appendix AW — Anomaly Detection
- Daily seasonality models for cost signals; alert on >3σ deviations
- Segment by namespace/team; annotate deploys to explain shifts
Appendix AX — Budgeting and Alerts
- Monthly budgets per team; 70/85/100% alerts
- Freeze risky changes on burn >2×; require owner approval to proceed
Appendix AY — Image Distribution
- Regional caches; pre-pull on new node types; deduplicate layers
- Track image size budgets; PR checks for images > 300MB
Appendix AZ — Prewarming and Scale-To-Zero
- Keep N small warm pods for latency SLO; scale rest to zero for batch
- Use warm pools for nodes to cut cold-start penalties
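A common warm-capacity pattern is a low-priority placeholder deployment of pause pods: it reserves headroom and is preempted the moment real workloads need the space. A minimal sketch (names, sizes, and replica count are illustrative):
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata: { name: overprovisioning }     # illustrative name
value: -10                               # lower than any real workload, so placeholders are evicted first
globalDefault: false
---
apiVersion: apps/v1
kind: Deployment
metadata: { name: capacity-placeholder } # illustrative name
spec:
  replicas: 3
  selector: { matchLabels: { app: capacity-placeholder } }
  template:
    metadata: { labels: { app: capacity-placeholder } }
    spec:
      priorityClassName: overprovisioning
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests: { cpu: "500m", memory: "512Mi" }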
Appendix BA — Cost-Aware Routing
- Send non-latency-critical traffic to cheaper regions; respect data residency
- Prefer server-side caching to reduce egress and CPU
Appendix BB — Data Locality
- Co-locate services with databases; avoid cross-zone chatter
- Cache read-heavy flows near app pods; invalidate on writes
Appendix BC — JVM and Runtime Tuning
- Right-size heap; GC profiles for latency vs throughput
- Avoid CPU throttling by aligning limits to observed peaks
Appendix BD — Language-Specific Tips
- Node.js: cluster workers, UV_THREADPOOL_SIZE, avoid blocking I/O
- Go: GOMAXPROCS tune; pprof hotspots; avoid large allocs
- Python: uvloop, asyncio, gunicorn workers; C extensions for hotspots
Appendix BE — CDN and Edge Offload
- Cache static and semi-static; signed URLs; compress; image transforms at edge
Appendix BF — Data Pipelines
- Batch windows; compact files; combine small writes; choose cheaper storage tiers
- Spot for transform; on-demand for stateful stores
Appendix BG — Rightsizing Workflow (Detailed)
- Discover → Recommend → Review → Rollout → Verify → Lock → Revisit
- Canary reduced requests; monitor throttling and p99 latency
Appendix BH — Binpacking Score Tuning
- PreferredDuringScheduling for nodeSelector/affinity; tune weights
- Balance cost with HPA freedom; avoid fragmentation by resource shape
Appendix BI — Cost KPIs
- $/unit, $/namespace, $/service, RI/SP coverage, spot %
- Request/usage ratio, node utilization, egress %, storage growth
Appendix BJ — SLAs and Error Budgets
- Freeze risky cost changes when SLO burn > 2×; prioritize reliability
Appendix BK — Platform Overhead Budgets
- Caps for mesh, observability, controllers; report monthly
- Optimize with sampling and ambient modes
Appendix BL — Security vs Cost
- Keep mTLS and policies; optimize implementation; push heavy checks to edge
Appendix BM — ARM and GPU Mix
- Mixed arch pools; schedule tolerant services; keep critical on proven arch
- GPU sharing for inference; binpack with MIG/MPS; monitor memory use
Appendix BN — CI Workload Placement
- Separate cluster or cheaper region; spot; ARM64 builders; caching
Appendix BO — SaaS vs Self-Hosted Tradeoffs
- Compare TCO: infra, ops, feature velocity; move heavy state to managed
Appendix BP — GreenOps
- Carbon-aware scheduling; lower-carbon regions; efficiency metrics
Appendix BQ — Cost Reviews Cadence
- Weekly quick wins; monthly governance; quarterly roadmap rebase
Appendix BR — Education and Ownership
- Teams own their spend; publish guides; office hours; champions program
Appendix BS — Dashboards (JSON Sketch)
{
"title": "FinOps Overview",
"panels": [
{"type":"graph","title":"$/unit (rolling)"},
{"type":"graph","title":"Node Utilization"},
{"type":"table","title":"Top Overprovisioned Services"}
]
}
Appendix BT — Alert Rules
- alert: OverprovisionedCPU
  expr: (sum(kube_pod_container_resource_requests{resource="cpu"}) - sum(rate(container_cpu_usage_seconds_total[5m]))) / sum(kube_pod_container_resource_requests{resource="cpu"}) > 0.5
  for: 2h
- alert: EgressSpike
  expr: sum(rate(node_network_transmit_bytes_total[5m])) > 5e7
  for: 15m
Mega FAQ (1001–1400)
- Is KEDA better than HPA? It complements HPA with queue metrics; use both.
- What request percentile to choose? Start at p90 plus 10–20% headroom; adjust per SLO.
- Does ARM hurt latency? Usually not if optimized; measure, and keep critical paths on a proven arch initially.
- Are larger nodes riskier? Larger blast radius per node; mitigate with PDBs and spread.
- How to track $/unit? Allocate cost, then divide by the business metric; automate the pipeline.
Mega FAQ (1401–1600)
- Should I run mesh ambient mode? If supported and it meets security needs; it saves sidecar overhead.
- Drop traces to save cost? Sample smartly; keep errors; drop low-value spans.
- Is the descheduler mandatory? Helpful but not required; Karpenter consolidation can cover many cases.
- Can I disable logs? Reduce verbosity; keep security/audit logs; TTL cold storage.
Final: measure, rightsize, automate, and revisit.
Appendix CA — API Server and Controller Overhead
- Trim watch cardinality; prefer fewer CRDs or shard by namespace
- Cache informers; reduce resync periods where safe
- Keep controllers lean; defer heavy work to jobs/queues
Appendix CB — Admission Webhooks Performance
- Batch or cache decisions; avoid per-pod high-latency calls
- Fail-open vs fail-closed policy by risk tier; monitor latency
Appendix CC — Cluster Add-ons Budget
- Cap mesh/obs/add-ons to <15–25% of cluster cost
- Consolidate agents; avoid duplicate collectors; sample aggressively
Appendix CD — Namespace Standards
- Mandatory labels: team, service, env, cost-center
- Default LimitRanges; NetworkPolicies; Quotas
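A namespace sketch that bakes in these standards (label values are illustrative); LimitRange, ResourceQuota, and NetworkPolicy objects are then applied per namespace:
apiVersion: v1
kind: Namespace
metadata:
  name: team-a                           # illustrative
  labels:
    team: team-a
    service: checkout
    env: prod
    cost-center: cc-1234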
Appendix CE — Node Feature Discovery and Placement
- Discover HW features; schedule to best-fit nodes; avoid mismatches
Appendix CF — Build and Artifact Strategy
- Deduplicate artifacts; use build manifests; prune old tags
- TTL caches; content-addressable storage; compress artifacts
Appendix CG — Event-Driven Scale-down Windows
- Define low-traffic windows; more aggressive consolidation during off-peak
Appendix CH — Safe Overcommit Playbook
- Overcommit CPU on Burstable; avoid memory overcommit for stateful
- Monitor throttling and p99 lat; rollback if SLOs regress
Appendix CI — Throttling and p99 Latency
- Correlate CPU throttling with p99; increase limits or optimize code
Appendix CJ — Language/Runtime Efficiency
- Profile hotspots; reduce allocations; async I/O; vectorize where possible
Appendix CK — DB and Cache Cost Tie-in
- Reduce chatty calls; batch; cache near app; measure $/query
Appendix CL — CDN and Image Optimization
- AVIF/WebP; responsive images; cache-control headers; ETag/If-None-Match
Appendix CM — Data Gravity and Residency
- Keep compute near data; avoid cross-region; pin jobs to data nodes
Appendix CN — Security Posture with Low Overhead
- eBPF where possible; selective deep inspection; periodic scans
Appendix CO — Cost-Aware SLOs
- Define p95 targets with budget; explore slight relaxations for big savings
Appendix CP — Green Schedules
- Shift batch to low-carbon windows/regions; track grams CO2e/unit
Appendix CQ — Platform Change Management
- CAB for major cost-impacting changes; evidence dashboards; rollback plan
Appendix CR — Drift Detection for Costs
- Policy to flag replicas, requests, storage, and egress drifts
Appendix CS — Reserved Capacity Planner
- Estimate baseline; simulate coverage; track unused reservations
Appendix CT — ARM64 Migration Runbook
- Build multi-arch; conformance tests; perf benchmarks; phased rollout
Appendix CU — GPU Scheduling Policies
- Queue priorities; preemption; MIG profiles; binpack with memory headroom
Appendix CV — ML Serving Optimizations
- Model quantization; batch small requests; CPU offload; token limits
Appendix CW — ETL and Lakehouse Costs
- File size targets; compaction; partition pruning; z-ordering
Appendix CX — StatefulSets and PVC Compaction
- Defragment PVC usage; reclaim; shrink volumes; switch to shared where viable
Appendix CY — Multi-Cluster Control Plane Spend
- Consider managed control planes; aggregate small clusters; fleets
Appendix CZ — Final Cost Principles
- Measure → Right-size → Automate → Govern → Repeat
Dashboards (Extended JSON Sketch)
{
"title": "K8s FinOps Deep Dive",
"panels": [
{"type":"graph","title":"CPU Throttling vs p99"},
{"type":"graph","title":"Overcommit Ratio by Namespace"},
{"type":"table","title":"Top Egress Services"},
{"type":"table","title":"Storage Growth by PVC"}
]
}
Policies (Examples)
# OPA/Kyverno pseudo: forbid no-requests, forbid :latest, enforce labels
Runbooks (More)
- Planner: RI underutilized → adjust coverage; swap families; sell back (where possible)
- Spot instability → diversify types/zones; lower target; fallback policy
- Mesh overhead → ambient mode; reduce mTLS cost; tune retries/timeouts
Mega FAQ (1601–2000)
- Best single lever for savings? Rightsizing requests plus consolidation.
- Why p95 not average? Tail latency drives user experience; averages mask spikes.
- Can we pre-warm nodes? Yes, via warm pools; balance against idle waste.
- Does ARM help IO-bound workloads? Less; focus on CPU-bound workloads for the big wins.
- How to price $/unit? Allocate infra plus platform overhead; divide by business units; include egress/storage.
Mega FAQ (2001–2200)
- Is a single large cluster cheaper? Often, but fault domains and noisy-neighbor risk grow; consider multi-cluster.
- Should we run everything on spot? No; keep critical paths on on-demand, with spot for elastic/batch work.
- How often to review? Weekly quick wins; monthly governance; quarterly roadmap.
Final: cost is a product; own it with data and discipline.