Every cloud migration starts with a spreadsheet that proves it will save money. The numbers are clean. The comparison is simple: here's what you pay for on-premises infrastructure today, here's what the cloud will cost. Someone has run the calculator on the AWS or Azure website, applied a modest growth assumption, and produced a figure that's lower than the current run rate. The CFO nods. The board approves. Six months later, the first invoice arrives, and it's nothing like the spreadsheet.
This pattern repeats across organizations of every size. Not because cloud computing is a bad idea (for most workloads, it's the right direction) but because the migration itself is treated as a technology project when it's actually a business transformation. The technical lift is the easy part. The hard part is the operating model change that nobody planned for.
Across dozens of these migrations, the failure modes are remarkably consistent. Here are the seven that do the most damage.
1. Treating lift-and-shift as a strategy
Lift-and-shift (taking existing workloads and moving them to cloud virtual machines with minimal modification) is a legitimate migration approach. For some workloads, it's the right one. Legacy applications that can't be refactored, systems with complex dependencies that would take months to untangle, or workloads that just need to get out of a data center before a lease expires. In these cases, lift-and-shift is a pragmatic choice.
The problem is when lift-and-shift becomes the default for everything. When the migration plan is essentially "move all our servers to the cloud and figure out the optimization later." This approach guarantees higher costs, because on-premises architecture doesn't map cleanly to cloud pricing models.
A physical server running at 15% average utilization is wasteful on-premises, but the cost is already sunk. That same workload running on a cloud VM bills you for 100% of the provisioned capacity, every hour. The economics only work if you right-size the instance, implement auto-scaling, or refactor the application to use cloud-native services that scale to zero when idle.
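The arithmetic is worth making explicit. A rough sketch, using invented per-vCPU rates rather than any provider's actual pricing:

```python
# Rough illustration of why low utilization is costly in the cloud.
# The $/vCPU-hour rate is a hypothetical placeholder, not a real
# AWS/Azure/GCP price.

HOURS_PER_MONTH = 730

def monthly_cost(vcpus: int, rate_per_vcpu_hour: float) -> float:
    """Cost of a VM provisioned with `vcpus`, billed whether used or not."""
    return vcpus * rate_per_vcpu_hour * HOURS_PER_MONTH

# On-prem server sized for peak: 16 vCPUs, averaging 15% utilization.
lifted = monthly_cost(16, rate_per_vcpu_hour=0.05)

# Right-sized to the observed load (~4 vCPUs covers actual demand).
right_sized = monthly_cost(4, rate_per_vcpu_hour=0.05)

print(f"Lift-and-shift: ${lifted:,.0f}/month")
print(f"Right-sized:    ${right_sized:,.0f}/month")
print(f"Waste:          ${lifted - right_sized:,.0f}/month")
```

Even at these made-up rates, the lifted instance costs four times the right-sized one, and the difference recurs every month forever.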
"We'll optimize after the migration" is the most expensive sentence in cloud computing. Once workloads are running in production, the urgency to optimize vanishes. Teams are busy with the next project. The bloated instances become the new normal. Twelve months later, someone realizes the cloud is costing twice what the data center did, and the conversation shifts from optimization to repatriation.
Lift-and-shift works when it's a deliberate choice for specific workloads with a documented plan for subsequent optimization. It fails when it's a shortcut that avoids the harder work of understanding what each workload actually needs.
2. Ignoring egress charges until the first bill
Cloud providers are generous with ingress. Getting data into the cloud is free or nearly free. Getting it out is where the economics change dramatically. AWS charges per gigabyte for data leaving a region. Azure charges per gigabyte for data leaving a region. Google Cloud charges per gigabyte for data leaving a region. The rates vary, but the principle is universal: data gravity is a real force, and the cloud providers have designed their pricing to reinforce it.
For organizations that move data between cloud regions, between cloud and on-premises, or between cloud and end users, egress charges can dwarf compute and storage costs. Video processing pipelines, data analytics platforms that pull data from cloud storage to on-premises tools, backup strategies that replicate to a different region. All of these generate egress that rarely appears in the initial business case.
The fix isn't to avoid egress. It's to model it honestly before the migration. Map your data flows. Understand how much data moves between systems, where it goes, and how often. Then price the egress into the business case. If the numbers still work, proceed. If they don't, redesign the architecture to minimize cross-boundary data movement. Or acknowledge that some workloads are cheaper on-premises.
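Modeling egress honestly can be as simple as a table of mapped flows and a rate. A sketch, where the flow volumes and the blended $/GB rate are illustrative assumptions, not published provider pricing:

```python
# Sketch of pricing egress into a business case from mapped data flows.
# Volumes and the $/GB rate below are invented for illustration.

EGRESS_RATE_PER_GB = 0.09  # hypothetical blended internet-egress rate

# Each flow: (description, GB crossing a billable boundary per month)
data_flows = [
    ("analytics pulls to on-prem BI tools", 12_000),
    ("cross-region backup replication",      8_000),
    ("API responses to end users",           3_500),
]

def monthly_egress_cost(flows, rate_per_gb):
    """Total monthly egress charge across all mapped flows."""
    return sum(gb for _, gb in flows) * rate_per_gb

total_gb = sum(gb for _, gb in data_flows)
cost = monthly_egress_cost(data_flows, EGRESS_RATE_PER_GB)
print(f"{total_gb:,} GB/month across billable boundaries -> ${cost:,.2f}/month")
```

The point of the exercise isn't the precision of the rate; it's forcing every flow in the architecture diagram to appear as a line item before the migration is approved.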
One pattern that catches organizations repeatedly: hybrid architectures where the cloud is the primary compute platform but on-premises systems still need frequent access to cloud-stored data. Every API call that returns a payload, every database query from an on-premises application to a cloud database, every file download from cloud storage. It all counts as egress. These small transactions accumulate into significant monthly charges.
3. Reserved instance mismanagement
Cloud providers offer significant discounts for commitment. AWS Reserved Instances, Azure Reserved VM Instances, Google Committed Use Discounts. They all follow the same principle: commit to a certain level of usage for one or three years, and the per-unit cost drops substantially. The discounts are real and material, often 30-60% compared to on-demand pricing.
The mistake is in how organizations manage these commitments. There are two failure modes, and they're equally expensive.
Under-commitment: Running everything on-demand because "we need flexibility." This is the cloud equivalent of renting an apartment month-to-month for five years. Flexibility has a price, and for stable, predictable workloads, that price is absurd. If a workload has been running at consistent utilization for six months, it should be reserved. The data is right there in the billing console.
Over-commitment: Buying reserved capacity based on current usage without accounting for optimization, migration, or business change. A three-year reservation for an instance type that gets deprecated eighteen months in. Reserved capacity for a workload that moves to containers six months later. Over-commitment locks you into paying for resources you no longer need, and the early termination options are either nonexistent or punitive.
The right approach is a continuous reservation management process, not a one-time purchasing decision. Review utilization quarterly. Reserve stable workloads for one year initially, extending to three years only for workloads with proven stability. Use savings plans (AWS) or flexible reservations where available to hedge against architecture changes. And assign someone (a FinOps role, a cloud architect, or at minimum the finance team working with IT) to own the reservation portfolio.
Most organizations do none of this. They either run everything on-demand and overpay by 40%, or they make a bulk reservation purchase at migration time and then forget about it until the commitments expire and the bill spikes.
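The break-even logic behind a commitment is straightforward enough to sanity-check on paper. A sketch, using an invented on-demand rate and a hypothetical 40% reserved discount:

```python
# Break-even sketch for a 12-month reserved commitment vs on-demand.
# The price and discount are illustrative, not quoted provider rates.

def breakeven_months(discount: float) -> float:
    """Months of sustained usage after which a 12-month commitment wins.

    A reservation bills all 12 months at the discounted rate; on-demand
    bills only the months actually used. The commitment pays off once
    actual usage exceeds 12 * (1 - discount) months.
    """
    return 12 * (1 - discount)

on_demand = 500.0   # hypothetical $/month at on-demand rates
discount = 0.40     # hypothetical reserved discount

committed_total = 12 * on_demand * (1 - discount)  # fixed 12-month spend
months = breakeven_months(discount)
print(f"12-month reserved spend: ${committed_total:,.0f}")
print(f"Break-even at {months:.1f} months of sustained usage")
```

At a 40% discount the commitment wins after about seven months of sustained use, which is why a workload with six months of flat utilization history is a reasonable one-year candidate but a weak three-year one.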
4. Multi-cloud as a default instead of a decision
Multi-cloud (running workloads across AWS, Azure, and GCP simultaneously) has become an article of faith in enterprise IT. The reasoning sounds solid: avoid vendor lock-in, use best-of-breed services from each provider, maintain negotiating power. In practice, multi-cloud is expensive, complex, and rarely delivers the benefits it promises.
Each cloud provider has its own networking model, identity system, monitoring tools, security controls, and operational patterns. Running workloads across two or more providers means your team needs expertise in all of them. Your monitoring needs to span all of them. Your security policies need to be implemented and maintained in all of them. Your deployment pipelines need to target all of them. The operational overhead is not additive. It's multiplicative.
The lock-in argument deserves scrutiny. Yes, using provider-specific services creates dependency. But the alternative (using only lowest-common-denominator services that work across all clouds) means you're paying cloud prices while deliberately avoiding the services that make the cloud valuable. You end up running VMs and Kubernetes clusters that could run anywhere, paying a premium for infrastructure that doesn't use the platform's strengths.
Multi-cloud makes sense in specific situations: post-acquisition integration where different business units are on different platforms, regulatory requirements that mandate geographic or provider diversity, or genuine best-of-breed requirements where one provider's specific service is materially better for a specific workload. These are deliberate architectural decisions with clear justification.
What doesn't make sense is multi-cloud as a default posture, driven by a vague desire to avoid lock-in. The cost of running two clouds poorly is always higher than the cost of running one cloud well. If vendor diversification is a genuine strategic priority, invest in portable application architectures (containers, infrastructure as code) rather than maintaining parallel cloud estates.
5. The security model change nobody planned for
On-premises security is perimeter-based. You have a firewall. Traffic from inside is relatively trusted. Traffic from outside is filtered. The model is simple, well-understood, and has decades of tooling and expertise behind it.
Cloud security is fundamentally different. There is no perimeter. Every service is accessible via API. Identity is the new perimeter. Network controls exist but they're one layer among many. The attack surface is not the network boundary. It's the configuration of every service, every IAM role, every storage bucket, every API endpoint.
Organizations that migrate to the cloud without rethinking their security model end up with one of two problems. Either they try to recreate the on-premises perimeter in the cloud (VPN tunnels back to the corporate network, all traffic routed through a virtual firewall appliance, cloud treated as a remote data center) which is expensive and slow, or they leave cloud services with default configurations that are far more permissive than their on-premises equivalent.
The misconfigured S3 bucket has become the cliché of cloud security incidents, but the root cause isn't carelessness. It's a mismatch between the security model the team understands (perimeter-based) and the security model the platform requires (identity-based, policy-driven, configuration-as-code). A storage admin who has spent fifteen years managing file servers behind a firewall doesn't instinctively think about bucket policies and access control lists when provisioning cloud storage.
This isn't a training problem that's solved by sending people on a course. It's an organizational change that requires new processes, new tooling, and new roles. Cloud security posture management tools (CSPM), infrastructure-as-code with security policies embedded, automated compliance scanning, identity governance. These aren't optional extras. They're the minimum viable security model for cloud operations.
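The shape of that policy-driven model can be shown in miniature. A toy sketch of the kind of check a CSPM tool automates; the bucket records and field names are invented for the example, and a real tool would pull live configuration through provider APIs:

```python
# Minimal policy-as-code illustration: scan storage configurations for
# the permissive defaults described above. Inventory data is fabricated
# for the example.

def violations(buckets):
    """Return names of buckets that are public or unencrypted."""
    bad = []
    for b in buckets:
        if b["public_access"] or not b["encryption_at_rest"]:
            bad.append(b["name"])
    return bad

inventory = [
    {"name": "app-logs",      "public_access": False, "encryption_at_rest": True},
    {"name": "static-assets", "public_access": True,  "encryption_at_rest": True},
    {"name": "db-exports",    "public_access": False, "encryption_at_rest": False},
]

for name in violations(inventory):
    print(f"POLICY VIOLATION: {name}")
```

The shift this represents is the important part: security becomes a property you assert over configuration data, continuously and automatically, rather than a boundary you defend once.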
Budget for this from day one. If the migration business case doesn't include cloud security tooling and the time to implement it, the business case is incomplete.
6. The skills gap you discover in production
Migrating to the cloud doesn't just change where your workloads run. It changes the skills your team needs to manage them. On-premises infrastructure management is physical: servers, switches, cables, cooling, power. Cloud infrastructure management is software: APIs, configuration files, deployment pipelines, monitoring dashboards.
Some of the transition maps neatly. A network engineer who understands routing and switching can learn cloud networking. A sysadmin who understands Linux can manage cloud VMs. But much of cloud operations has no on-premises equivalent. Infrastructure as code. Serverless architectures. Container orchestration. Auto-scaling policies. Cost management. Cloud-native monitoring. These are new disciplines that require dedicated learning and practice.
The skills gap typically reveals itself after the migration, when the team that was focused on "getting to the cloud" now has to operate in the cloud. The migration project had vendor professional services, a system integrator, external consultants. The operational team is your existing IT staff, who now need to manage a platform they've never operated before.
The organizations that handle this well start the skills development before the migration. They invest in certifications, but more importantly in hands-on practice. They build non-production environments where the team can experiment. They pair internal staff with external specialists during the migration so knowledge transfers through doing, not through documentation. And they accept that some roles will change fundamentally. The hardware-focused infrastructure engineer may need to become a cloud platform engineer, which is a career change, not just a skills update.
The organizations that handle this poorly outsource the migration and then wonder why their internal team can't manage the result. Or they hire cloud engineers at market rate and then can't retain them because the rest of the IT organization still operates like it's 2010. Or they assume the cloud is "easier" than on-premises and reduce headcount, only to discover that cloud operations require different skills but not necessarily fewer people.
Include skills development in the migration plan and the migration budget. If the team isn't ready to operate the cloud environment on day one of production, you have a staffing gap that will express itself as incidents, outages, and runaway costs.
7. The business case that looked better in the proposal
This is the meta-mistake that encompasses all the others. The cloud migration business case is almost always optimistic, because the people building it are motivated to get the project approved.
The comparison methodology is usually flawed. On-premises costs include everything: hardware depreciation, data center lease, power, cooling, maintenance contracts, staff time. Cloud costs include the compute and storage estimate from the pricing calculator and not much else. Network costs are underestimated. Security tooling isn't included. Migration project costs are treated as one-time rather than having an ongoing operational impact. The learning curve for the team is ignored. The cost of running hybrid during the transition (paying for both environments simultaneously) is minimized.
Then there are the "soft" benefits that pad the business case: agility, speed to market, innovation. These are real, but they're not automatic. Moving a monolithic application to a cloud VM doesn't make you agile. It makes you someone running a monolithic application on rented hardware. Agility comes from refactoring, from adopting cloud-native patterns, from changing how the development team works. These are additional investments on top of the migration.
The honest cloud business case acknowledges several things. First, the migration itself will cost more than the initial estimate, because they always do. Second, cloud operating costs will be higher than the calculator suggests for the first 12-18 months, before optimization efforts take effect. Third, there will be unexpected costs that no one modeled. A service that doesn't exist on-premises, a compliance requirement that needs new tooling, a performance problem that requires a larger instance than planned. Fourth, the benefits are real but they accrue over time, not on day one.
A useful exercise: take the cloud business case and add 40% to the cost side. If it still makes sense, the migration is probably justified. If it only works at the original estimate, the business case is fragile and the migration will be seen as a failure even if it's technically successful.
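That stress test is a one-function calculation. A sketch with invented figures:

```python
# The +40% stress test from above, as arithmetic. All figures are
# fabricated for illustration.

def stressed_case(annual_cloud_cost, annual_onprem_cost, stress=0.40):
    """Annual savings at the estimated cost, and after a +stress cost bump."""
    base_savings = annual_onprem_cost - annual_cloud_cost
    stressed_savings = annual_onprem_cost - annual_cloud_cost * (1 + stress)
    return base_savings, stressed_savings

base, stressed = stressed_case(annual_cloud_cost=800_000,
                               annual_onprem_cost=1_000_000)
print(f"Savings at estimate:  ${base:,.0f}")
print(f"Savings at +40% cost: ${stressed:,.0f}")
```

In this example the case shows $200,000 of annual savings at the estimate but a $120,000 loss under stress: exactly the fragile business case that will read as a failure even if the migration goes technically well.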
Azure vs AWS vs GCP: what actually matters for enterprise
This is the question everyone asks and nobody answers honestly, because the answer depends on context rather than feature comparison matrices.
AWS has the broadest service catalog and the largest market share. If you need an obscure managed service, AWS probably has it. The ecosystem of third-party tools, trained engineers, and reference architectures is unmatched. The downside is complexity: the sheer number of services, pricing options, and configuration parameters creates cognitive overhead that smaller teams struggle with. AWS pricing is aggressive on compute but punishing on data transfer. Their enterprise support is competent but expensive.
Azure wins the enterprise account by default if the organization is a Microsoft shop, and most enterprises are. Active Directory integration, Office 365 integration, licensing bundles that make Azure credits part of the Enterprise Agreement. Microsoft's go-to-market in enterprise IT is formidable. Azure's networking capabilities are strong, and the hybrid story (Azure Arc, Azure Stack) is the most mature. The downside is that Azure's portal and documentation can be inconsistent, and the pace of change sometimes outstrips the stability that enterprise customers expect. Some services feel like they were shipped to match an AWS feature announcement rather than because they were ready.
GCP is the strongest platform for data and analytics workloads. BigQuery, Vertex AI, the Kubernetes pedigree (GKE is the best managed Kubernetes service by a meaningful margin). If your primary cloud use case is data processing, machine learning, or container-native applications, GCP deserves serious consideration. The downside is a smaller enterprise sales and support organization, fewer regions than AWS or Azure, and a reputation for deprecating services that makes enterprise customers nervous.
For most mid-market and enterprise organizations, the choice comes down to: Azure if you're a Microsoft-heavy environment and want the licensing and integration benefits, AWS if you need the broadest service catalog and the largest talent pool, GCP if data and analytics are the primary drivers. The "multi-cloud to avoid lock-in" answer is usually the most expensive option that satisfies nobody.
How to do it right
None of these mistakes are inevitable. Organizations that migrate to the cloud successfully tend to share some common characteristics.
They start with an honest assessment of their current environment. Not just what's running, but how it's used, what it costs, and what it would take to move. They categorize workloads by migration strategy (rehost, replatform, refactor, retire, retain) based on actual analysis rather than defaulting to lift-and-shift for everything.
They build the business case with realistic numbers, including migration costs, parallel running costs, new tooling, and skills development. They include a contingency that acknowledges the unknowns. They set expectations with the board that cost savings will take 18-24 months to materialize, not six.
They invest in their team before the migration starts, not after. They bring in external expertise for the migration itself but structure the engagement so that knowledge transfers to the internal team. They create cloud operating procedures, cost management processes, and security frameworks before the first production workload moves.
They migrate in waves, starting with lower-risk workloads to build confidence and expertise, then progressing to more complex and critical systems. Each wave incorporates lessons from the previous one. They resist the pressure to compress the timeline, because rushed migrations create the technical debt that makes the cloud more expensive than the data center.
And they measure honestly. Not just "are we in the cloud?" but "are we getting the value we expected?" If costs are higher than planned, they understand why and address it. If the team is struggling, they invest in support rather than hoping the problems resolve themselves.
Cloud migration is the right move for most organizations. But "right move" and "easy move" are different things. The organizations that acknowledge the complexity, invest in the preparation, and manage the transition as a business change rather than a technology project are the ones that end up with lower costs, better capabilities, and a platform that actually delivers on the promise. Everyone else ends up with an expensive lesson in why the spreadsheet didn't match reality.