Most enterprises are overspending on cloud infrastructure by 30–60%. Not because the cloud is inherently expensive, but because the decisions that drive cloud costs — instance sizing, storage tiers, data transfer patterns, architectural topology — were made during initial deployment and never revisited.
Over the past two years, our platform engineering team has conducted deep cost optimization engagements for clients ranging from Series B startups to Fortune 500 enterprises. The results are remarkably consistent: a 40–65% reduction in monthly cloud spend without performance degradation, achieved through a systematic methodology rather than one-off fixes.
This article details the exact strategies, patterns, and tools we use. No theory, no vendor marketing — just the engineering work that moves the needle on real infrastructure bills.
Phase 1: The Infrastructure Cost Audit
Every engagement starts with a comprehensive audit. We connect to the client’s AWS Cost Explorer, GCP Billing, or Azure Cost Management APIs and pull 90 days of granular billing data. Then we correlate that data with actual utilization metrics from CloudWatch, Stackdriver, or Prometheus. The gap between what you are paying for and what you are actually using is where the savings live.
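The correlation step can be sketched as a small ranking function, assuming the billing and monitoring exports have already been joined into per-resource records (the field names here are hypothetical, not any vendor's schema):

```python
def rank_savings_opportunities(resources):
    """Rank resources by estimated monthly waste: the gap between
    what is paid for and what is actually used.

    `resources` is a list of dicts with illustrative fields
    `monthly_cost` and `avg_utilization` (0.0-1.0).
    """
    ranked = []
    for r in resources:
        # Waste is the fraction of paid capacity sitting idle.
        waste = r["monthly_cost"] * (1.0 - r["avg_utilization"])
        ranked.append({**r, "est_monthly_waste": round(waste, 2)})
    return sorted(ranked, key=lambda r: r["est_monthly_waste"], reverse=True)

fleet = [
    {"id": "i-0aaa", "monthly_cost": 560.0, "avg_utilization": 0.15},
    {"id": "i-0bbb", "monthly_cost": 280.0, "avg_utilization": 0.70},
]
print(rank_savings_opportunities(fleet)[0]["id"])  # → i-0aaa
```

The 15%-utilized instance tops the list even though a 70%-utilized one costs half as much, which is exactly the ordering the audit needs: fix the biggest gaps first.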
The most common finding is compute over-provisioning. Teams select instance sizes during initial deployment based on peak load estimates that are often wildly optimistic. Six months later, the application is running at 15% average CPU utilization on instances sized for a peak that never materialized. This pattern alone accounts for 20–30% of wasted spend in a typical engagement.
# Quick audit: find over-provisioned EC2 instances
# Instances with <20% avg CPU over 14 days
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --period 86400 \
  --statistics Average \
  --start-time $(date -d '14 days ago' --iso-8601) \
  --end-time $(date --iso-8601) \
  --dimensions Name=InstanceId,Value=i-0abc123def456

# Cross-reference with instance cost
aws ce get-cost-and-usage \
  --time-period Start=2026-01-01,End=2026-02-01 \
  --granularity MONTHLY \
  --filter '{"Dimensions":{"Key":"SERVICE","Values":["Amazon EC2"]}}' \
  --metrics BlendedCost

The second most common finding is storage accumulation. EBS volumes, S3 buckets, and RDS snapshots grow monotonically because nobody implements lifecycle policies. We routinely find terabytes of obsolete snapshots, orphaned volumes from terminated instances, and S3 buckets full of logs that nobody will ever read. Cleaning up storage waste is boring work, but it often saves $5,000–$20,000 per month.
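The cleanup sweep can be sketched as a pure function over simplified volume and snapshot records; in practice the inputs would come from `describe-volumes` and `describe-snapshots`, but the field names below are illustrative, not the full AWS response schema:

```python
from datetime import datetime, timedelta, timezone

def find_storage_waste(volumes, snapshots, snapshot_max_age_days=90):
    """Flag unattached EBS volumes and snapshots older than a cutoff."""
    # A volume in the 'available' state is not attached to any instance.
    orphaned = [v for v in volumes if v["state"] == "available"]
    cutoff = datetime.now(timezone.utc) - timedelta(days=snapshot_max_age_days)
    stale = [s for s in snapshots if s["start_time"] < cutoff]
    return orphaned, stale

vols = [{"id": "vol-1", "state": "in-use"}, {"id": "vol-2", "state": "available"}]
snaps = [{"id": "snap-1", "start_time": datetime.now(timezone.utc) - timedelta(days=400)}]
orphaned, stale = find_storage_waste(vols, snaps)
print([v["id"] for v in orphaned], [s["id"] for s in stale])  # → ['vol-2'] ['snap-1']
```

A real sweep should report candidates for human review rather than delete automatically; the 90-day snapshot cutoff here is an assumption, and retention requirements vary by workload and compliance regime.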
Phase 2: Right-Sizing Compute
Right-sizing is the process of matching instance types and sizes to actual workload requirements. It sounds simple, but in practice it requires careful analysis of CPU, memory, network, and disk I/O patterns across different time windows. An instance that looks idle on a daily average might spike to 90% CPU for 30 minutes every morning during batch processing.
Our methodology: collect P50, P95, and P99 utilization metrics for every dimension (CPU, memory, network, IOPS) over a 30-day window. Then select the smallest instance family and size that provides headroom above P99 for the binding resource. If an application is memory-bound, switch from a general-purpose m-class to a memory-optimized r-class at a smaller size. If it is compute-bound but bursty, consider a t-class with unlimited credits rather than a fixed-performance c-class.
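A minimal sketch of that selection logic, using abstract capacity "units" and an assumed 20% headroom factor rather than a real instance catalog:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile of utilization samples (0.0-1.0)."""
    s = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(s)) - 1)
    return s[k]

def pick_size(p99_util, current_capacity, headroom=1.2, sizes=(1, 2, 4, 8, 16, 32)):
    """Smallest capacity that keeps P99 demand under capacity with headroom.
    The size ladder and headroom factor are illustrative, not a vendor catalog."""
    demand = p99_util * current_capacity * headroom
    for size in sizes:
        if size >= demand:
            return size
    return sizes[-1]

# Mostly idle with a brief daily spike: the daily average hides the
# spike, but the P99 still captures it.
cpu_samples = [0.10] * 95 + [0.40] * 5
p99 = percentile(cpu_samples, 99)
print(pick_size(p99, current_capacity=16))  # → 8
```

Sizing to the daily average (10%) would have suggested a 4-unit instance and caused the morning batch spike to saturate it; sizing to P99 halves the instance while preserving spike headroom.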
For Kubernetes workloads, right-sizing happens at two levels: the pod resource requests and limits, and the underlying node instance types. We use vertical pod autoscaler (VPA) in recommend mode to generate right-sized resource requests, then configure cluster autoscaler with a diverse instance type pool to bin-pack pods efficiently. The combination of accurate pod sizing and efficient node packing typically saves 30–40% on Kubernetes compute costs.
# Kubernetes VPA recommendation example
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Off"  # Recommend only, don't auto-apply
  resourcePolicy:
    containerPolicies:
      - containerName: api
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 2000m
          memory: 4Gi

A critical caveat: never right-size and change purchasing models simultaneously. Right-size first with on-demand instances, validate the new sizing under production load for at least two weeks, then layer on reserved capacity or savings plans. Doing both at once makes it impossible to diagnose performance issues if they arise.
Phase 3: Spot Instances and Reserved Capacity
Once workloads are right-sized, the next lever is purchasing model optimization. On-demand pricing is the most expensive way to consume cloud compute. The same instance costs 30–40% less with a 1-year reservation and 50–60% less with a 3-year commitment. Spot instances offer 70–90% savings for workloads that can tolerate interruption.
Our approach is to categorize every workload into three tiers. The baseline tier — workloads that run 24/7 with predictable demand — gets reserved instances or compute savings plans. The elastic tier — workloads with variable demand like web servers behind an autoscaler — uses a mix of reserved capacity for the floor and spot instances for the headroom. The batch tier — data pipelines, CI/CD, ML training — runs entirely on spot instances with graceful interruption handling.
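The tiering above can be approximated with a back-of-the-envelope blended-cost model; the discount rates here are illustrative placeholders within the ranges quoted earlier, not quoted prices:

```python
# Illustrative multipliers on on-demand price; actual rates vary by
# region, term, payment option, and instance family.
PRICING = {"on_demand": 1.00, "reserved_1yr": 0.65, "spot": 0.25}

def blended_monthly_cost(tiers):
    """Estimate blended spend for workload tiers, each expressed as
    its on-demand-equivalent monthly cost plus a purchasing model."""
    return sum(t["on_demand_cost"] * PRICING[t["model"]] for t in tiers)

tiers = [
    {"name": "baseline",      "on_demand_cost": 30_000, "model": "reserved_1yr"},
    {"name": "elastic-floor", "on_demand_cost": 10_000, "model": "reserved_1yr"},
    {"name": "elastic-burst", "on_demand_cost": 5_000,  "model": "spot"},
    {"name": "batch",         "on_demand_cost": 15_000, "model": "spot"},
]
total = blended_monthly_cost(tiers)
print(round(total))  # → 31000
```

Against a $60,000 all-on-demand baseline, this hypothetical mix lands around $31,000, roughly a 48% reduction from purchasing model changes alone.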
Spot instances deserve special attention because they offer the largest savings but require architectural preparation. Your application must handle termination with a 2-minute warning gracefully. We implement this with a combination of instance metadata polling, SIGTERM handlers, and connection draining. For Kubernetes, Karpenter handles spot lifecycle automatically, launching replacement nodes from diversified instance pools to minimize interruption frequency.
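A minimal sketch of the SIGTERM half of that handling in Python; the draining logic itself is application-specific and omitted here:

```python
import signal
import threading

class GracefulShutdown:
    """Track spot-interruption shutdown state.

    On SIGTERM (delivered ahead of a spot reclaim), stop accepting
    new work so in-flight work can drain within the warning window.
    """
    def __init__(self):
        self.shutting_down = threading.Event()
        signal.signal(signal.SIGTERM, self._handle)

    def _handle(self, signum, frame):
        self.shutting_down.set()  # checked by workers between tasks

    def should_accept_work(self):
        return not self.shutting_down.is_set()

shutdown = GracefulShutdown()
# Worker loop sketch:
#   while shutdown.should_accept_work():
#       process_next_task()
#   drain_connections()
```

In production this is typically paired with polling the instance metadata spot interruption notice so draining can begin as soon as the reclaim is scheduled, and with load balancer connection draining; on Kubernetes, Karpenter handles the node-level lifecycle as described above.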
One client running a large data processing platform moved 80% of their batch workloads to spot instances and reduced that portion of their compute bill from $47,000 to $8,200 per month. The total engineering investment was three weeks of work — adding graceful shutdown handlers and configuring Karpenter provisioners with diversified instance pools.
Phase 4: Architectural Cost Optimization
The deepest savings come from architectural changes that fundamentally reduce the amount of infrastructure a workload requires. These take longer to implement but often yield the most dramatic results.
Move Compute to the Edge
A significant portion of compute spend in many applications is serving content that does not need to be dynamically rendered. Migrating API responses to edge caching via CloudFront or Cloudflare Workers, pre-rendering pages at build time, and pushing computation to the client where appropriate can reduce origin server load by 60–80%. One e-commerce client reduced their API server fleet from 24 instances to 6 by implementing aggressive edge caching with stale-while-revalidate semantics.
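The origin-load reduction comes from the cache answering requests itself whenever it legally can. The stale-while-revalidate decision (per RFC 5861 semantics, simplified) can be sketched as a small function over a cached response's age and its `Cache-Control` lifetimes:

```python
def cache_decision(age_s, max_age_s, swr_window_s):
    """How an edge cache serves a cached response under
    Cache-Control: max-age=<max_age_s>, stale-while-revalidate=<swr_window_s>."""
    if age_s <= max_age_s:
        return "serve-fresh"                 # within freshness lifetime
    if age_s <= max_age_s + swr_window_s:
        return "serve-stale-and-revalidate"  # serve from cache, refresh origin in background
    return "fetch-from-origin"               # too stale: synchronous origin fetch

print(cache_decision(30, 60, 300))   # → serve-fresh
print(cache_decision(90, 60, 300))   # → serve-stale-and-revalidate
print(cache_decision(600, 60, 300))  # → fetch-from-origin
```

The middle case is what keeps origin load flat under traffic spikes: users get an instant cached response while at most one background request per cache node refreshes the origin.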
Replace Always-On with Event-Driven
Many workloads run continuously but only do meaningful work intermittently. Cron jobs, webhook processors, notification systems, and report generators are classic candidates for serverless migration. A Lambda function or Cloud Run service that scales to zero when idle costs nothing when there is no work to do. We moved a client’s entire notification pipeline from a fleet of always-on ECS tasks to Lambda behind SQS and reduced that system’s cost from $3,400/month to $180/month.
Optimize Data Transfer
Data transfer costs are the hidden killer in many cloud bills. Cross-AZ traffic, NAT Gateway charges, and egress fees add up quickly in distributed architectures. We use VPC endpoints for AWS service traffic to eliminate NAT Gateway costs, co-locate tightly coupled services in the same availability zone, implement response compression at every layer, and use S3 Transfer Acceleration for large object uploads. These changes typically save $2,000–$10,000/month depending on traffic volume.
Phase 5: Sustaining the Savings
One-time cost optimization is a project. Sustained cost efficiency is a practice. Without ongoing governance, cloud costs creep back to their previous levels within 6–12 months as new services are deployed, teams forget the optimizations, and usage patterns change.
We implement three mechanisms for sustained savings. First, automated cost anomaly alerts that fire when any service or team exceeds its trailing 30-day average by more than 20%. Second, infrastructure-as-code policies that enforce tagging, instance size limits, and storage lifecycle rules at deployment time — you cannot deploy an untagged resource or a bare m5.4xlarge without explicit approval. Third, monthly cost reviews where engineering and finance jointly examine the bill, identify new optimization opportunities, and track the savings pipeline.
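The first mechanism, the trailing-average anomaly check, can be sketched in a few lines, assuming a daily cost series already pulled from the billing API:

```python
def cost_anomaly(daily_costs, threshold=0.20):
    """Alert when the latest day's spend exceeds the trailing 30-day
    average by more than `threshold` (20%, per the policy above)."""
    baseline = daily_costs[-31:-1]      # trailing 30 days, excluding today
    avg = sum(baseline) / len(baseline)
    today = daily_costs[-1]
    return today > avg * (1 + threshold), avg, today

history = [1000.0] * 30 + [1350.0]      # a 35% jump on the latest day
alert, avg, today = cost_anomaly(history)
print(alert)  # → True
```

Running this per service and per team tag, rather than on the total bill, is what makes the alert actionable: the owner of the spike gets paged, not a central platform team.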
The most effective organizations embed cost awareness into engineering culture. Every pull request that adds infrastructure shows the estimated monthly cost impact in the PR description. Every sprint review includes a cloud cost dashboard. Every architecture decision document includes a cost section. When engineers think about cost as naturally as they think about latency and availability, the savings sustain themselves.
The Bottom Line
A 60% reduction in cloud spend is not a theoretical ceiling — it is the median result across our engagements. The savings come from compounding multiple strategies: 25% from right-sizing, 20% from purchasing model optimization, 15% from architectural changes, and 10–15% from eliminating waste. None of these are individually revolutionary, but applied systematically, they transform cloud infrastructure from a growing cost center into a well-optimized operational asset.
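Note that each layer applies to the bill left over after the previous layers, so the percentages compound multiplicatively rather than summing. A quick illustration, taking the low end of the waste range (the exact total depends on which base each layer is measured against):

```python
# Rates mirror the breakdown above; waste uses the low end (10%).
layers = {"right_sizing": 0.25, "purchasing": 0.20, "architecture": 0.15, "waste": 0.10}

remaining = 1.0
for pct in layers.values():
    remaining *= 1 - pct  # each layer shrinks what is left of the bill

print(f"total reduction: {1 - remaining:.0%}")  # → total reduction: 54%
```

With the high end of the waste range and stronger individual results, the compounded total lands in the upper half of the 40–65% range reported above.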
At LockedIn Labs, we approach cloud cost optimization as an engineering discipline, not a procurement exercise. The best savings come from engineers who understand both the application architecture and the pricing models deeply enough to find the intersections where technical decisions drive financial outcomes.