SD-WAN was supposed to be the answer to everything. Replace your expensive MPLS circuits with cheap broadband, add an intelligent overlay, and watch costs plummet while performance improves. The pitch was compelling. Gartner put it in the Magic Quadrant. VMware bought VeloCloud. Fortinet built it into FortiGate. Cisco acquired Viptela and then merged it into their Catalyst family. Every networking vendor either built, bought, or partnered to get SD-WAN into their portfolio. The industry consensus was clear: MPLS was dead, SD-WAN was the future.
Having designed and deployed SD-WAN architectures across dozens of multi-site enterprises on six continents, we can say this with confidence: SD-WAN is genuinely useful. It solved real problems. The ability to use multiple WAN links simultaneously, steer traffic based on application requirements, and centralize policy management across hundreds of sites was a genuine step forward from the static, expensive, inflexible MPLS architectures that preceded it. But the industry sold it as a complete networking strategy, and it isn't one. Here's what gets missed.
The underlay still matters
SD-WAN is an overlay technology. It makes intelligent decisions about how to route traffic across your WAN links. But it doesn't create bandwidth, it doesn't fix congestion, and it doesn't improve the underlying transport. An SD-WAN overlay running across two bad internet circuits gives you an intelligent way to choose between two bad options. It doesn't make either option good.
If your two broadband circuits both traverse the same last-mile infrastructure (and they usually do, because most business parks and office buildings are served by a single duct route from the street) then your "redundant" WAN paths have a shared single point of failure. SD-WAN will happily fail over between them, but if the fiber in the street gets cut by a contractor with a backhoe, both links go down together. The SD-WAN dashboard will show two red links instead of one, which is more information than you had before, but the site is still offline.
This is not a theoretical concern. We've repeatedly seen organizations replace a single MPLS circuit with dual broadband and SD-WAN, believing they'd improved resilience, when they'd actually made it worse. At least the MPLS came with an SLA, a managed last mile, and a carrier whose revenue depended on keeping the link up. Two broadband circuits from different ISPs that share the same Openreach duct have the appearance of diversity without the substance.
True underlay diversity means physically different paths: fiber from one provider via one duct route, and a second connection that uses a completely different physical path. Maybe a different fiber provider entering the building from the opposite side, or a 4G/5G cellular backup that doesn't depend on the physical infrastructure at all. Achieving this requires understanding the physical network topology in the street, which most SD-WAN vendors never discuss because it's not their problem. But it's very much your problem.
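Checking for this is mundane but mechanical once you have the carriers' route data. A minimal sketch, assuming you can extract physical segment identifiers (duct routes, exchange buildings, masts; the names below are hypothetical) from each carrier's circuit records:

```python
from collections import defaultdict

def shared_path_risks(circuits):
    """Flag circuits at a site that traverse the same physical segment.

    `circuits` maps circuit name -> list of physical path segment IDs
    obtained from the carriers' route documentation. Any segment shared
    by two or more circuits is a single point of failure for all of them.
    """
    seen = defaultdict(list)
    for name, segments in circuits.items():
        for seg in segments:
            seen[seg].append(name)
    return {seg: names for seg, names in seen.items() if len(names) > 1}

# Two "diverse" ISPs that both ride the same street duct:
site = {
    "isp_a_fttp": ["duct-route-17", "exchange-north"],
    "isp_b_fttp": ["duct-route-17", "exchange-south"],
    "cellular_5g": ["mast-042"],
}
print(shared_path_risks(site))
# {'duct-route-17': ['isp_a_fttp', 'isp_b_fttp']}
```

The hard part isn't the comparison; it's getting honest segment-level path data out of the carriers in the first place.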
The first question we ask in any SD-WAN assessment isn't about the overlay. It's about the underlay. Show us your circuit diversity. Show us your last-mile paths. Show us the duct routes. That's where resilience lives or dies.
The carrier dependency nobody discusses
SD-WAN promised to free organizations from carrier lock-in. Use any ISP, any circuit type, any combination of links. That's true at the overlay level. But the underlay (the actual circuits) still comes from carriers, and those carrier relationships still matter.
When you had a single MPLS provider, you had one throat to choke. One escalation path. One SLA. One commercial relationship. When you replace that with dual broadband from two ISPs, a 4G backup from a mobile carrier, and an SD-WAN overlay from a fourth vendor, you now have four separate relationships, four separate SLAs (or no SLAs, in the case of consumer-grade broadband), and four separate fault reporting processes. When the site goes down, which vendor do you call first? How do you coordinate troubleshooting across multiple providers who have no relationship with each other and no visibility into each other's infrastructure?
The operational complexity of multi-carrier SD-WAN is routinely underestimated. Each carrier has its own provisioning timeline (anywhere from days to months), its own contract terms, its own support hours, and its own definition of "business broadband." Managing 50 sites with two circuits each means managing 100 circuits, each with a carrier relationship behind it. Without a dedicated team or a managed service wrapper, that operational burden overwhelms the cost savings that justified the SD-WAN deployment in the first place.
Managed SD-WAN services from carriers like BT, Colt, or AT&T attempt to address this by bundling the overlay and the underlay into a single managed service. This simplifies operations but reintroduces a form of the carrier dependency that SD-WAN was supposed to eliminate. The trade-off is real: do you want operational simplicity with vendor dependency, or vendor independence with operational complexity? Most organizations choose one without explicitly acknowledging they've rejected the other.
The visibility gap
Every SD-WAN vendor ships a dashboard. Most of them look impressive. Color-coded tunnel health, real-time throughput graphs, application flow charts. The operations team sees green across the board and assumes everything is fine. Then a site goes down and nobody can explain why.
The problem is fundamental: SD-WAN dashboards monitor the overlay. They tell you whether the tunnels are up, how much traffic is flowing through each, and which applications are using which path. What they don't tell you is what's happening in the underlay. The actual transport that carries your encrypted tunnels.
When a broadband circuit degrades, the SD-WAN detects increased latency or packet loss on the tunnel and either compensates with FEC (forward error correction) or shifts traffic to the other link. The dashboard shows the failover happened. It does not show why. Was it a fiber cut? A congested DSLAM? An ISP routing change that added four hops to your path? A DNS resolution failure that caused application timeouts that look like network issues? The SD-WAN doesn't know and can't tell you.
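The steering logic itself is conceptually simple. Every vendor's implementation is proprietary, but the core idea reduces to something like this sketch, where per-application SLA thresholds (illustrative numbers, not vendor defaults) decide which link carries the flow:

```python
def pick_path(app_sla, links):
    """Choose the first link meeting the app's SLA; else the least-bad one.

    app_sla: dict with max_latency_ms and max_loss_pct thresholds.
    links: dict of link name -> measured {'latency_ms', 'loss_pct'}.
    Loosely mirrors per-application steering; real implementations
    add hysteresis, brownout detection, and FEC decisions.
    """
    compliant = [
        name for name, m in links.items()
        if m["latency_ms"] <= app_sla["max_latency_ms"]
        and m["loss_pct"] <= app_sla["max_loss_pct"]
    ]
    if compliant:
        return compliant[0]
    # No link meets the SLA: fall back to lowest latency. The overlay is
    # choosing between bad options; it cannot make either option good.
    return min(links, key=lambda n: links[n]["latency_ms"])

voice_sla = {"max_latency_ms": 150, "max_loss_pct": 1.0}
links = {
    "broadband_a": {"latency_ms": 220, "loss_pct": 2.5},  # degraded
    "broadband_b": {"latency_ms": 35, "loss_pct": 0.1},
}
print(pick_path(voice_sla, links))  # broadband_b
```

Note what the fallback branch encodes: when both links are bad, the overlay still returns an answer. It just isn't a good one, and nothing in this logic explains *why* the links degraded.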
This visibility gap matters because troubleshooting becomes guesswork. Your NOC sees tunnel degradation and opens a ticket with the ISP. The ISP runs a line test, says the circuit is fine, and closes the ticket. The real issue was an upstream peering dispute that added 80ms of latency to traffic destined for your cloud provider. Neither the SD-WAN dashboard nor the ISP's line test revealed the root cause because neither was looking at the right layer.
Organizations that take this seriously deploy independent monitoring that sits below the SD-WAN overlay. Tools like ThousandEyes, Kentik, or even simple ICMP and traceroute probes running from endpoint agents give you underlay visibility that the SD-WAN vendor's dashboard never will. The cost is modest. The operational value during a major incident is enormous. Without it, you're flying the overlay blind to the terrain below.
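Even without a commercial tool, the principle of baselining the underlay is straightforward. A minimal sketch, assuming you feed it latency and hop-count samples from periodic probes (e.g. traceroute toward your cloud provider's edge, per circuit); the thresholds are illustrative, not tuned values:

```python
from statistics import mean

def detect_underlay_change(baseline, current, latency_factor=1.5, hop_delta=2):
    """Flag underlay anomalies the overlay dashboard won't explain.

    baseline: list of {'latency_ms', 'hops'} samples from earlier probes.
    current: the latest sample. Returns human-readable alerts.
    """
    alerts = []
    base_latency = mean(s["latency_ms"] for s in baseline)
    base_hops = round(mean(s["hops"] for s in baseline))
    if current["latency_ms"] > base_latency * latency_factor:
        alerts.append(f"latency {current['latency_ms']}ms vs baseline {base_latency:.0f}ms")
    if current["hops"] >= base_hops + hop_delta:
        alerts.append(f"path grew to {current['hops']} hops (baseline {base_hops}): possible routing change")
    return alerts

baseline = [{"latency_ms": 18, "hops": 9}, {"latency_ms": 21, "hops": 9}]
# An upstream peering change adds hops and latency; the tunnel stays "up":
print(detect_underlay_change(baseline, {"latency_ms": 95, "hops": 13}))
```

The hop-count check is the one the SD-WAN dashboard will never give you: a tunnel that failed over tells you nothing about whether the cause was a line fault or an ISP routing change.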
Security is bolted on, not built in
Early SD-WAN solutions had minimal security. Traffic inspection, threat prevention, and access control were afterthoughts. Understandably, because the original problem SD-WAN solved was WAN optimization, not security. Traffic left the branch, crossed the overlay, and arrived at the data center. Security was handled at the data center, same as before.
But SD-WAN's other big selling point (direct internet access from the branch, bypassing the data center backhaul) created a security gap. If traffic goes directly from the branch to the internet, it bypasses the centralized security stack. Every branch with DIA (Direct Internet Access) is now its own internet edge, and each one needs its own security controls.
The response was SASE (Secure Access Service Edge), which bundles networking and security into a single cloud-delivered service: route traffic from the branch through a cloud-hosted security stack that provides firewall, secure web gateway, CASB (Cloud Access Security Broker), and zero-trust network access. The concept is sound. The execution varies enormously between vendors.
Zscaler has a genuine cloud-native security platform but limited SD-WAN capabilities. Palo Alto Networks bought CloudGenix for SD-WAN and has Prisma Access for SASE, but integrating the two is an exercise in patience. Fortinet's approach bundles security and networking in a single appliance (FortiGate), which works well but means your branch security is only as good as the local appliance. And managing security policies across hundreds of FortiGates is a significant operational effort. Cisco's Catalyst SD-WAN (formerly Viptela) integrates with Umbrella for cloud security, but the integration points have historically been clunky. VMware's SD-WAN by VeloCloud was acquired by Broadcom, and the future of the product line is uncertain at best.
Key questions that don't get asked often enough during SASE evaluations:
- Where are the inspection points physically located? Latency to the nearest PoP matters for real-time applications. If the nearest SASE PoP is 200ms away, your voice and video traffic will suffer.
- What happens when the SASE cloud has an outage? Do your sites lose all connectivity, or is there a local breakout path? Zscaler has had significant outages that took down internet access for thousands of organizations simultaneously. Your DR plan needs to account for SASE platform failure.
- Can you inspect encrypted traffic without breaking certificate chains for your compliance-sensitive applications? TLS inspection is technically feasible but creates complications for certificate-pinned applications, certain banking applications, and healthcare systems.
- How does the vendor handle data sovereignty? Where does your traffic actually flow? Does inspection happen in-country, or does your UK traffic route through a US PoP for inspection?
- What's the vendor's track record on keeping their threat intelligence current? A cloud security service is only as good as its detection capability, and that capability degrades rapidly if the vendor isn't investing continuously in threat research.
SSE: when you don't need the WAN part
Security Service Edge (SSE) is SASE without the networking. It provides the cloud-delivered security stack (SWG, CASB, ZTNA, DLP) without the SD-WAN overlay. This is relevant because many organizations already have an SD-WAN deployment and don't want to rip it out to adopt a full SASE stack from a different vendor.
SSE makes sense when your SD-WAN is working well and your gap is specifically in security. You need better control over cloud application access, better visibility into shadow IT SaaS usage, or zero-trust network access for remote workers. Deploying Zscaler ZIA/ZPA alongside an existing Fortinet or Cisco SD-WAN is a common and legitimate architecture. The SD-WAN handles path selection and WAN optimization. The SSE platform handles security inspection and access control.
The challenge is integration. Two control planes means two policy sets, two management consoles, two vendor relationships, and the possibility that a change in one system creates an unintended consequence in the other. Routing traffic from the SD-WAN overlay through the SSE inspection cloud adds latency and introduces a dependency on the SSE platform's availability. These are manageable problems, but they need to be acknowledged and designed for, not discovered in production.
DNS and DHCP: the infrastructure nobody redesigns
Here is something that derails SD-WAN deployments with surprising regularity: nobody remembers to redesign DNS and DHCP.
In a traditional MPLS network, DNS resolution for branch sites typically points back to central DNS servers at the data center. DHCP is either served locally from the branch router or relayed to a central DHCP server. Both work fine when all traffic hairpins through the data center anyway. But SD-WAN changes the traffic flow. You're now doing local internet breakout at the branch. You're sending SaaS traffic directly to the cloud. And your DNS is still resolving through the data center, which means the cloud provider's anycast routing sees the DNS query coming from your data center's location, not the branch's location, and directs the user to a cloud PoP that's optimal for the data center. Not for the branch.
The result is measurable. A branch in Manchester doing local breakout to Microsoft 365 should resolve to Microsoft's London or Manchester PoPs. If DNS is still resolving through a data center in Frankfurt, Microsoft's traffic manager sends that user to a European PoP optimized for Frankfurt. The user gets 40ms of unnecessary latency on every single request. Multiply that across the hundreds of micro-transactions a modern SaaS application makes per session, and you've just negated a meaningful chunk of the performance benefit that local breakout was supposed to deliver.
The fix is straightforward but requires deliberate planning: deploy local DNS resolution at branch sites, or use the SD-WAN vendor's built-in DNS proxy (most have one, few deploy it). Configure conditional forwarding so that internal domains still resolve against your corporate DNS infrastructure while public domains resolve locally. Test that GeoDNS and anycast routing from major cloud providers actually return the correct endpoints for each branch location. This is unglamorous work. It's also the difference between an SD-WAN deployment that delivers on its performance promises and one that underwhelms for reasons nobody can pinpoint because they're looking at tunnel metrics instead of DNS behavior.
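The conditional forwarding decision itself is trivial; the work is in deciding the zone list and validating the results per branch. A minimal sketch of the resolver selection logic (zone and server names are illustrative):

```python
def choose_resolver(qname, internal_zones, corporate_dns, local_dns):
    """Route a DNS query the way a branch resolver should after the move
    to local breakout: internal zones go to corporate DNS, everything
    else resolves locally so GeoDNS sees the branch's real egress
    location instead of the data center's.
    """
    name = qname.rstrip(".").lower()
    for zone in internal_zones:
        if name == zone or name.endswith("." + zone):
            return corporate_dns
    return local_dns

internal = ["corp.example.com", "ad.example.com"]
print(choose_resolver("fileserver.corp.example.com", internal, "10.0.0.53", "1.1.1.1"))
# 10.0.0.53 -- internal name, resolved centrally
print(choose_resolver("outlook.office365.com", internal, "10.0.0.53", "1.1.1.1"))
# 1.1.1.1 -- public name, resolved locally so anycast picks a nearby PoP
```

Validation matters as much as configuration: resolve the major SaaS hostnames from each branch after cutover and confirm the returned endpoints are geographically sensible for that site.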
DHCP presents a different challenge. When you move from a centralized MPLS topology to SD-WAN with local breakout, you need DHCP to serve local DNS servers, local default gateways, and potentially different DHCP options for different VLANs at each site. Organizations that relied on a centralized DHCP model now need site-specific scopes, and those scopes need to be managed at scale across potentially hundreds of branches. The SD-WAN platform can usually handle this, but it needs to be configured per site, which means someone needs to audit every branch's DHCP requirements during the migration. In practice, this step gets skipped because it's tedious, and the result is six months of help desk tickets about "slow internet" at branches that are actually suffering from misconfigured DNS.
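The per-site audit can at least be made systematic. A sketch of deriving a branch scope from site facts, assuming an IPAM export feeds it (field names and addresses are illustrative, not any platform's schema):

```python
def build_site_scope(site):
    """Derive the DHCP options a branch scope needs after the move to
    local breakout: local gateway and local DNS, not the HQ values the
    centralized model handed out."""
    return {
        "subnet": site["subnet"],
        "routers": [site["local_gateway"]],        # option 3: local, not HQ
        "domain-name-servers": site["local_dns"],  # option 6: branch resolvers
        "lease-time": 86400,
    }

manchester = {
    "subnet": "10.42.8.0/24",
    "local_gateway": "10.42.8.1",
    "local_dns": ["10.42.8.53", "10.42.8.54"],
}
scope = build_site_scope(manchester)
print(scope["routers"])  # ['10.42.8.1']
```

Generating scopes from a site inventory, rather than hand-editing hundreds of them, is also what makes the audit step survivable at scale.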
Application performance needs more than path selection
SD-WAN vendors demonstrate impressive application-aware routing. Detect that Teams is performing poorly on link A, shift it to link B. Detect that the primary path to AWS has higher latency than the secondary, and switch. It works well in demos and it works well for some use cases.
But for organizations with genuinely performance-sensitive applications (live broadcast, real-time trading, industrial control systems, telemedicine) path selection isn't enough. You need QoS that extends beyond your own network edge. SD-WAN gives you control over the first and last mile. The middle (the internet) is still best-effort. No amount of intelligent path selection at your branch can fix congestion on a transit link in Frankfurt that your traffic happens to traverse.
For many organizations, this is fine. Microsoft 365 and general web browsing are tolerant of variable internet performance. But for organizations where application performance directly affects operations (where a 50ms increase in latency means a failed trade, a dropped broadcast feed, or a surgeon who can't see the video feed clearly) relying on best-effort internet transport with SD-WAN optimization on top is insufficient. These organizations need managed network services with end-to-end SLAs, or private connectivity to cloud providers (AWS Direct Connect, Azure ExpressRoute, Google Cloud Interconnect), or both.
Knowing which camp you're in before you architect the solution is the difference between a successful deployment and an expensive lesson. The SD-WAN vendor's presales team will demonstrate their solution working beautifully over broadband. What they won't demonstrate is what happens when the broadband degrades during peak hours, when the ISP's upstream is congested, or when an international routing change adds 80ms of latency to your critical application path.
WAN optimization vs SD-WAN: the persistent confusion
This conflation has cost organizations more wasted deployment cycles than almost any other misunderstanding in enterprise networking. SD-WAN vendors include "WAN optimization" as a checkbox feature, and procurement teams assume this means they can retire their Riverbed SteelHead or Silver Peak (now Aruba EdgeConnect) WAN optimization appliances. Often, they cannot.
Traditional WAN optimization does several things that SD-WAN's built-in acceleration does not. Byte-level deduplication is the big one. A Riverbed SteelHead maintains a data store of previously seen byte patterns and replaces repeated segments with references. When an engineer opens the same 50MB CAD file that a colleague opened yesterday, the WAN optimizer recognizes the byte patterns and transmits only the delta. Often reducing the transfer to a few hundred kilobytes. SD-WAN has no equivalent to this. It doesn't cache data. It doesn't deduplicate. It can't recognize that the same content has been sent before.
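The mechanism is worth seeing concretely. A toy sketch of byte-level deduplication; real appliances use variable-length, content-defined chunking and a persistent data store, but fixed-size chunks keep the idea short:

```python
import hashlib

def dedup_send(data, store, chunk=4096):
    """Split the stream into chunks; send raw bytes only for chunks the
    far side hasn't seen, and short references for the rest."""
    wire = []
    for i in range(0, len(data), chunk):
        piece = data[i:i + chunk]
        digest = hashlib.sha256(piece).hexdigest()
        if digest in store:
            wire.append(("ref", digest))   # a few bytes on the wire
        else:
            store[digest] = piece
            wire.append(("raw", piece))    # full chunk on the wire
    return wire

store = {}
payload = b"CAD" * 20000                 # stand-in for a repeatedly opened file
first = dedup_send(payload, store)       # first transfer: mostly raw chunks
second = dedup_send(payload, store)      # repeat transfer: all references
print(sum(1 for kind, _ in second if kind == "ref"))  # 15 -- every chunk deduplicated
```

SD-WAN has no equivalent of `store`: it forwards every byte every time, which is exactly why retiring the optimizer changes behavior for repetitive bulk traffic.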
Protocol-specific optimization is the other gap. WAN optimizers include application-layer proxies for protocols like CIFS/SMB, MAPI, and HTTP that understand the protocol's behavior and reduce the number of round trips required to complete operations. An SMB file browse over a 100ms WAN link might require dozens of round trips, each adding latency. The WAN optimizer intercepts the SMB conversation, completes it locally on behalf of the remote client, and synchronizes the result. Reducing what would be a 15-second directory listing to a sub-second operation. SD-WAN's TCP optimization manipulates window sizes and acknowledgment behavior, which helps with bulk throughput, but it doesn't understand the application protocol and can't eliminate the round trips.
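The arithmetic behind that claim is simple and worth internalizing: on a chatty protocol, total time is dominated by sequential round trips, not bandwidth. The round-trip counts below are illustrative:

```python
def transfer_time_ms(round_trips, rtt_ms):
    """Latency cost of a chatty protocol: each sequential round trip
    pays the full path RTT, regardless of link bandwidth."""
    return round_trips * rtt_ms

rtt = 100  # ms, branch to data center
chatty = transfer_time_ms(150, rtt)  # unproxied SMB browse: ~150 sequential round trips
proxied = transfer_time_ms(3, rtt)   # local proxy completes the chatter locally
print(chatty / 1000, "s vs", proxied / 1000, "s")  # 15.0 s vs 0.3 s
```

This is why a bigger pipe doesn't help: TCP window tuning improves bulk throughput, but only an application-layer proxy can remove the round trips themselves.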
The practical question is whether your application mix still benefits from traditional WAN optimization. If your users have moved entirely to SaaS applications that use their own optimized protocols and CDNs, the answer may be no. Those applications already minimize round trips and the WAN optimizer's proxy actually interferes with the application's own optimization. But if you still have file servers, legacy applications with chatty protocols, or significant SMB traffic between sites, removing the WAN optimizer during the SD-WAN migration is going to produce a noticeable performance regression that the SD-WAN's TCP acceleration can't compensate for. The only way to know is to audit your actual traffic before you decommission anything.
Zero Trust and SD-WAN: different problems, same slide deck
Somewhere around 2022, SD-WAN vendors started putting "Zero Trust" on their feature lists. ZTNA (Zero Trust Network Access) appeared as a toggle in management consoles. Marketing slides showed SD-WAN and Zero Trust converging into a single platform. This is misleading, and the confusion has real security consequences.
SD-WAN is a network connectivity technology. It builds encrypted tunnels between sites and makes routing decisions about traffic flows. Zero Trust is a security architecture philosophy that says no user, device, or application should be trusted by default, regardless of their network location. These are fundamentally different concerns. SD-WAN answers the question "how does traffic get from A to B?" Zero Trust answers the question "should this specific user on this specific device be allowed to access this specific application right now?"
What most SD-WAN vendors call "ZTNA" is typically a cloud-hosted reverse proxy that brokers access to internal applications without requiring a traditional VPN connection. This is useful. It eliminates the need for full tunnel VPN for remote users accessing specific applications. But it's a narrow implementation of one aspect of Zero Trust, not Zero Trust itself.
A genuine Zero Trust architecture involves continuous identity verification, device posture assessment, micro-segmentation at the application layer, least-privilege access policies that adapt based on context (location, time, device health, behavior anomalies), and comprehensive logging of every access decision. This requires integration between your identity provider (Entra ID, Okta), your endpoint management platform (Intune, Jamf), your SIEM, and your application layer. The SD-WAN provides the network transport underneath all of this. It is infrastructure for Zero Trust, not a substitute for it.
The danger of conflating SD-WAN's ZTNA feature with an actual Zero Trust strategy is that organizations check the box and stop. They enable the vendor's ZTNA toggle, declare themselves "Zero Trust," and skip the harder work of implementing proper identity governance, device compliance policies, and application-level access controls. The network is segmented but the access decisions are still binary. You're either on the network or you're not. That's the old model with a new coat of paint.
The integration testing problem
SD-WAN vendors test their platforms against common applications: Microsoft 365, Salesforce, SAP, Zoom, Teams. They publish compatibility matrices and optimization profiles for these well-known workloads. What they don't test is your bespoke inventory management system that was written in 2009 and communicates using a custom binary protocol over non-standard ports. They don't test the SCADA polling system that expects sub-10ms response times and interprets any timeout as a device failure. They don't test the mainframe terminal emulator that uses TN3270 and breaks if any network device modifies the TCP window size.
Integration testing failures in SD-WAN deployments tend to fall into predictable categories. The first is applications that use hardcoded IP addresses instead of DNS names. When the SD-WAN changes the traffic path, the application's assumption about which IP it should reach breaks. The SD-WAN vendor's application identification engine classifies the traffic as "unknown" and routes it via the default policy, which may not be the right path for that application's latency or bandwidth requirements.
The second category is applications that maintain persistent TCP connections and react poorly to path changes. When the SD-WAN shifts a flow between links, some implementations will reset the TCP session. Applications that maintain long-lived connections (database replication, message queues, persistent WebSocket connections) may not reconnect gracefully. The application logs show a disconnection and reconnection, but if the reconnection logic has a backoff timer, you get periodic data gaps that correlate with SD-WAN path changes but look like application bugs to the development team.
The third is applications sensitive to MTU changes. SD-WAN tunnels add encapsulation overhead, which reduces the effective MTU on the path. If an application sends packets at exactly 1500 bytes and the SD-WAN tunnel overhead reduces the path MTU to 1400 bytes, those packets need to be fragmented. Most modern TCP implementations handle this via path MTU discovery, but if ICMP "fragmentation needed" messages are blocked somewhere in the path (which happens more often than it should) you get silent packet drops and application timeouts that are nearly impossible to diagnose without packet captures.
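The MTU arithmetic is easy to sanity-check during design. A sketch, with an illustrative 100-byte encapsulation overhead (the real figure varies by tunnel type and cipher; check your vendor's numbers):

```python
def tunnel_mtu(link_mtu, overhead_bytes):
    """Effective payload MTU once tunnel encapsulation is added."""
    return link_mtu - overhead_bytes

def will_fragment(packet_size, link_mtu, overhead_bytes):
    """True if a packet of this size exceeds the post-encapsulation MTU."""
    return packet_size > tunnel_mtu(link_mtu, overhead_bytes)

print(tunnel_mtu(1500, 100))           # 1400
print(will_fragment(1500, 1500, 100))  # True: full-size packets now fragment
print(will_fragment(1380, 1500, 100))  # False: clamping MSS below 1400 avoids it
```

The practical mitigation is usually TCP MSS clamping at the edge, which sidesteps the dependency on ICMP "fragmentation needed" messages surviving the path.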
The solution is not complicated, but it is time-consuming: build a comprehensive application inventory before the SD-WAN deployment, classify each application by protocol, port, latency sensitivity, and session behavior, and test every single one during the pilot phase. The applications that break are never the ones you expected.
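The inventory doesn't need a tool to get started; it needs the right fields. A sketch of one inventory record and a pilot-scoping filter, with illustrative application names and attributes:

```python
from dataclasses import dataclass

@dataclass
class AppProfile:
    """One row of the pre-migration application inventory. The point is
    capturing session behavior, not just ports."""
    name: str
    protocol: str
    ports: tuple
    max_latency_ms: int
    survives_path_change: bool  # long-lived TCP that resets on reroute?
    uses_dns: bool              # or hardcoded IPs?

inventory = [
    AppProfile("scada-poller", "custom/udp", (20000,), 10, True, False),
    AppProfile("tn3270-gateway", "tcp", (23,), 200, False, False),
]

# Pilot scope: anything latency-sensitive, path-fragile, or DNS-free
pilot = [a.name for a in inventory
         if a.max_latency_ms < 50 or not a.survives_path_change or not a.uses_dns]
print(pilot)  # ['scada-poller', 'tn3270-gateway']
```

The filter encodes the failure categories above: these are the applications that must go through the pilot, because they're exactly the ones the vendor never tested.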
When SD-WAN is the wrong starting point entirely
There are scenarios where SD-WAN isn't just insufficient. It's the wrong starting point for the design.
Sites with a single WAN link. SD-WAN's core value proposition is intelligent path selection across multiple links. If a site has a single internet circuit and no prospect of getting a second (rural locations, some international sites, certain building types), SD-WAN provides application visibility and centralized management but limited operational benefit. You're paying for overlay intelligence that has nothing to optimize across.
Environments dominated by real-time traffic. If the majority of your WAN traffic is voice, video, or industrial control, the SD-WAN's ability to steer between paths is less valuable than having a single, high-quality, SLA-backed path. Path selection helps when you have a mix of tolerant and intolerant traffic. When all your traffic is intolerant, what you need is a guaranteed-quality transport, not a choice between two best-effort options.
Small deployments where the management overhead exceeds the benefit. An SD-WAN deployment requires a controller (usually cloud-hosted), edge appliances at each site, and someone who knows how to configure and maintain the overlay. For an organization with five sites, the cost and complexity of an SD-WAN platform may exceed the benefit compared to a well-configured set of routers with a VPN overlay. SD-WAN's economics and operational benefits scale with the number of sites. Below a certain threshold, they don't justify the investment.
Organizations that haven't fixed their application architecture first. If your branch offices are still backhauling all traffic to a central data center for processing, SD-WAN optimizes a broken architecture. The answer isn't to make the backhaul more intelligent. It's to move to a cloud-first application model where applications are consumed from the nearest point of presence, not routed across the WAN to an on-premises server. SD-WAN can facilitate this transition, but deploying it without first addressing the application architecture means you're optimizing the wrong thing.
The vendor lock-in nobody talks about
One of SD-WAN's selling points was escaping MPLS vendor lock-in. The irony is that most organizations simply traded one form of lock-in for another.
Your SD-WAN overlay, your security policies, your traffic engineering rules, your application classifications, your site templates, your failover logic. All of these live in your SD-WAN vendor's proprietary control plane. Moving to a different SD-WAN vendor means rebuilding everything from scratch. There is no migration path from Fortinet SD-WAN to Cisco Catalyst SD-WAN. There is no export format for VMware VeloCloud policies that Palo Alto Prisma SD-WAN can import. Each vendor's configuration model is proprietary, and your investment in configuring, tuning, and optimizing the overlay is non-transferable.
This lock-in became uncomfortably real for VMware VeloCloud customers when Broadcom acquired VMware and began restructuring the product portfolio. Organizations that had deployed VeloCloud at hundreds of sites suddenly faced uncertainty about the platform's future, with no practical way to migrate quickly. The switching cost for SD-WAN isn't just the new hardware and licensing. It's the engineering effort to redesign, reconfigure, and revalidate the entire overlay. A project that typically takes six to twelve months for a large deployment.
This isn't a reason not to use SD-WAN. It's a reason to choose your vendor carefully, negotiate your contracts wisely (including exit terms and data portability), and maintain documentation that exists independently of the platform. We always recommend keeping a vendor-neutral network design document that describes the intent (what each site needs in terms of connectivity, redundancy, and performance) not the implementation. When you need to move (and eventually you will) the design document is what you rebuild from.
What "enough" actually looks like
SD-WAN is a component of a network strategy, not the strategy itself. A complete enterprise network architecture includes:
- Underlay diversity: physically diverse WAN paths from different providers using different last-mile infrastructure. Verify the physical diversity; don't assume it based on having different ISP names on the invoices.
- Underlay visibility: monitoring that sees below the overlay, showing you ISP performance, routing changes, and last-mile health independently of the SD-WAN's tunnel metrics.
- Integrated security: whether through a SASE platform, an SSE layer on top of SD-WAN, or local security appliances. Not bolted on, not an afterthought, not "we'll add SASE later," and not a ZTNA checkbox masquerading as Zero Trust.
- DNS and DHCP architecture: redesigned for the new traffic flow, not carried over from the MPLS topology. Local DNS resolution at branches, conditional forwarding, and DHCP scopes that reflect the SD-WAN's traffic patterns.
- Application-layer optimization: WAN optimization where the application mix demands it, retired where SaaS has made it irrelevant. Audit your traffic before you decommission the SteelHeads.
- Performance monitoring: end-to-end, not just edge-to-edge. Synthetic monitoring that measures actual application performance from the user's perspective, including the internet transit path and the cloud provider's network.
- Local survivability: sites that continue to function when the WAN or the cloud goes down. Local authentication caching, local DNS resolution, and local breakout for critical cloud applications.
- Integration testing: every application in your inventory validated against the new transport, not just the five the vendor tested in the lab.
- Cloud connectivity strategy: direct peering or private connectivity to major cloud providers for performance-sensitive workloads. AWS Direct Connect, Azure ExpressRoute, Google Cloud Interconnect.
- Vendor independence: design documentation, configuration backups, and contractual protections that let you move when you need to.
SD-WAN solved the branch office connectivity problem. It deserves credit for that. It made multi-site networking more flexible, more visible, and in many cases less expensive. But if you're treating it as your complete network strategy, you're building on an incomplete foundation. The network didn't stop at the branch. Your architecture shouldn't either.
The organizations that get enterprise networking right treat SD-WAN as one layer in a designed architecture. Not as the architecture itself. They pair it with genuine underlay diversity, integrated security, end-to-end performance monitoring, and a clear understanding of where SD-WAN adds value and where it doesn't. That's a network strategy. Everything else is a vendor deployment.