The Network 2025 article made the following assertion, which this article expands on:
All of IT has been disrupted by hyperscalers and SaaS. Together, they have radically changed both customer experience (compared to on-premise software models) and operations excellence (compared to traditional IT models). They have also changed business models. It is in the area of operations excellence that the future of networking must first be examined.
Customer Experience and Business Model
Prior to the cloud disruption, on-premise software models, especially those of the large enterprise vendors, had shifted over a number of decades from one where the cost lay primarily in the perpetual license (the right to use a given version of software forever) to one where most of the total lifetime cost of the software was in maintenance fees covering minor updates, version upgrades, and support. The Microsoft consumer model remained one where consumers paid for major version upgrades and premium levels of support were optional. In the mix were also enterprise license agreements, volume license agreements, and similar arrangements. Putting those special agreements aside, the basic enterprise model was one where vendors charged a significant amount for support and maintenance, where the maintenance and upgrades may or may not have been of interest to the customer, and where the customer also had to pay the on-premise costs: capital equipment, IT staff…
Along came software as a service: no more on-premise capital costs, reduced IT costs, and one charge covering the most current software version, frequent incremental updates, and support. Software shifted from being a product to being a service. To be clear, there are nuances here for IT managers who wonder whether the never-ending cost of a subscription is better than a perpetual license, but the bottom line is that the software subscription model has been disruptive.
In addition to changing business models for enterprises, cloud-based services of all kinds created new customer-experience expectations for consumers, and consumers took those expectations to the workplace. Cloud services can collect extensive usage data in a non-invasive way (putting aside privacy issues for now) and turn that data into even better service and pricing experiences. Cloud services took information management and other software-management hassles away from consumers, while also making the service available anywhere there was a web browser or a smartphone application. Cloud-based services also provided new forms of application value by leveraging large amounts of centralized, frequently updated data.
Any networking company that is not focused on offering cloud-based services, or on providing the equivalent of the cloud-based customer experience in its on-premise offerings, has its head buried in the sand. This is most true for non-embedded software, such as management and operations software, but over time it is likely to hold for all software, embedded or not. Yes, telcos resist having their data leave their networks, but…
At the same time as SaaS was disrupting software customer experiences and business models, hyperscalers (Amazon, Facebook, Google, and others) were revolutionizing IT operations.
This revolution has a number of tentacles:
Microservices-based software architectures
Scale-out hardware architectures
Continuous change
Site Reliability Engineering (SRE)
Monolithic software to microservices
While microservices are not central to my point about the revolution in IT operations excellence, they are part of the market's broader shift in software architectures, and an example of the kind of change driven in the cloud era, even if some on-premise software offerings also claim a microservices architecture (as a growing number of network operating systems, or NOSs, do today). Netflix is a good example of this shift.
Custom systems to scale-out hardware
Google was at the forefront, around the turn of the century (20 years ago!), of realizing that a better operations reality could be created by treating hardware as a throw-away commodity. This has been the overall thrust of all the large cloud platforms over the last two decades. Networking has resisted this change, but the change has been arriving in increments.
Within hyperscalers, equipment models range from top-of-rack switches, where some have successfully moved to ODM hardware and open-source or independently supplied software, to data center interconnect, customer edge, and WAN, which have remained primarily the domain of the major router vendors. Part of the dynamic is a continuing difference of opinion on the value of large, dense, well-integrated chassis-based systems versus leaf/spine architectures. Part of the problem is that networking remains a hard problem, especially if you don't just throw infinite bandwidth at it (an approach that has limits in spine/leaf and is easier in some parts of the network than in others). Networking also remains complex.
Networking nuances aside, the basic operational model that has disrupted IT is simple hardware elements and/or hardware clusters, highly automated by operations software, leading to high reliability and agility.
Waterfall to Continuous Change
In the cloud era, change has been taken to the next level: change is continuous, through continuous development, continuous integration, and continuous delivery. This is hard to replicate with on-premise models, though there are exceptions, such as security threat updates. No service provider or network solution provider can afford to ignore this fundamental shift in IT. It is true that the DevOps community has observed over the last couple of years that there may have been some over-rotation toward rate of change at the expense of quality, but that aside, the overall trend is clear.
Operations as a people problem to operations as a software problem
To realize silicon economics in operations, operations needs a silicon-based solution: in other words, software running on silicon rather than people doing the work. This is a fundamental economic truth for all businesses, and hyperscalers and SaaS giants have made it clear for everyone in IT to see.
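To make "operations as a software problem" concrete, here is a minimal sketch of a declarative reconcile loop, a pattern many automation systems use: operators state intent, and software computes and applies the drift between intent and actual state. The Device class, the interface-state model, and the function names are hypothetical illustrations, not any vendor's API; a real system would push changes through a management interface rather than update a dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical device model: interface name -> admin state ("up" or "down")."""
    name: str
    interfaces: dict[str, str] = field(default_factory=dict)


def drift(desired: dict[str, str], actual: dict[str, str]) -> dict[str, str]:
    """Changes needed to bring actual state in line with desired state."""
    return {intf: state for intf, state in desired.items() if actual.get(intf) != state}


def reconcile(device: Device, desired: dict[str, str]) -> dict[str, str]:
    """One pass of a declarative control loop: compute drift, apply it, report it.

    Only keys named in the intent are converged; unmanaged interfaces are left alone.
    """
    changes = drift(desired, device.interfaces)
    device.interfaces.update(changes)  # stand-in for a real management-API push
    return changes


leaf1 = Device("leaf1", {"eth1": "up", "eth2": "down"})
intent = {"eth1": "up", "eth2": "up", "eth3": "up"}
print(reconcile(leaf1, intent))  # {'eth2': 'up', 'eth3': 'up'}
print(reconcile(leaf1, intent))  # {} (already converged)
```

Because a second pass makes no changes, the loop is idempotent and can run continuously, which is what lets software, rather than people, absorb routine configuration drift.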
A related, derivative discipline that Google has focused on, and that many in networking are now taking notice of, combines software and systems engineering to balance operations excellence with ongoing innovation. As Google describes it:
SRE is what you get when you treat operations as if it’s a software problem. Our mission is to protect, provide for, and progress the software and systems behind all of Google’s public services — Google Search, Ads, Gmail, Android, YouTube, and App Engine, to name just a few — with an ever-watchful eye on their availability, latency, performance, and capacity.
Site Reliability Engineering (SRE) builds on DevOps with prescriptions for achieving important operations outcomes. For an introduction to SRE, see this Google video, as well as the Google SRE landing page and a video on the difference between DevOps and SRE.
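One of the best-known SRE prescriptions is the error budget: a reliability target (SLO) implies a tolerated amount of failure, and while budget remains, teams keep shipping changes; once it is spent, the focus shifts to reliability. This is one way SRE balances rate of change against quality. The sketch below assumes a request-based SLO; the function names, the release-gate rule, and the numbers are illustrative only.

```python
def error_budget(slo: float, total_requests: int) -> float:
    """Failed requests the SLO tolerates over the measurement window."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return (1.0 - slo) * total_requests


def budget_remaining(slo: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget still unspent (negative once overspent)."""
    budget = error_budget(slo, total_requests)
    return (budget - failed_requests) / budget


def may_ship_changes(slo: float, total_requests: int, failed_requests: int) -> bool:
    """Hypothetical release gate: allow new rollouts only while budget remains."""
    return budget_remaining(slo, total_requests, failed_requests) > 0.0


# A 99.9% SLO over one million requests tolerates about 1,000 failed requests.
print(round(budget_remaining(0.999, 1_000_000, 250), 2))  # 0.75
print(may_ship_changes(0.999, 1_000_000, 250))            # True
print(may_ship_changes(0.999, 1_000_000, 1_500))          # False
```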
A few of the big implications for networking solutions have been:
APIs for programmability
API-accessible pub/sub state
Streaming telemetry for constant monitoring and analysis
Security: Hackers → Criminals → Nation States
Internet security threats have migrated over time from hobbyists and hackers, to organized crime, to nation-state attacks (destroying an economy rather than dropping bombs, industrial espionage, and so on). Attacks sponsored by nation states and terrorist groups have raised both the stakes and the awareness of security issues.
For network solution providers and service providers, this problem has many tentacles:
Denial of service attacks on an Internet asset and/or the network control plane
Software architecture / design / implementation vulnerabilities
Malware: software that should not be running, whether on network equipment or in network operations software.
Hardware supply chain concerns: authentic parts, unauthorized changes during manufacturing, etc.
Not all of these problems are solved. For example, how do you ensure that only the software that is supposed to be executing on a CPU, or the firmware on an ASIC, is actually executing? There are some chip-level solutions aimed at protecting one program's memory from another's, but overall, chip designers are still thinking through this challenge. Other security challenges, and perhaps all of security, are inherently a moving target.
One move in this general problem space has been the various actions taken against Huawei. That is a simplifying move: eliminate a supplier suspected of having close ties to the Chinese government. However, most network equipment suppliers manufacture in multiple locations, and the connectivity between network equipment and various outside entities leaves all network equipment potentially vulnerable.
Whether this will become a significant decision criterion in selecting network solutions is not clear. It is, however, likely to get more attention over time.
As one solution vendor example, Cisco emphasized its Trustworthy Technologies in its 2019 announcements, including:
Chain of trust
Trust Anchor Module
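The chain-of-trust idea can be illustrated with the hash chaining used in measured boot: each boot stage is hashed and folded into a register with a TPM-style extend operation, so a verifier can compare the final value against a known-good reference. This is a conceptual sketch only, assuming SHA-256 and a register held in ordinary software; it is not how Cisco's Trust Anchor Module or any other vendor's product works internally, and it omits the hardware protection and signature checks a real root of trust depends on. It also speaks only to what was loaded at boot, not to what executes afterward, which is part of why the runtime question raised above remains open.

```python
import hashlib
import hmac


def measure(image: bytes) -> bytes:
    """SHA-256 digest of one boot-stage image (bootloader, kernel, NOS, ...)."""
    return hashlib.sha256(image).digest()


def extend(register: bytes, measurement: bytes) -> bytes:
    """TPM-style extend: new = SHA-256(old || measurement). One-way and order-sensitive."""
    return hashlib.sha256(register + measurement).digest()


def measured_boot(stages: list[bytes]) -> bytes:
    """Fold each stage's measurement into a register that starts as 32 zero bytes."""
    register = bytes(32)
    for image in stages:
        register = extend(register, measure(image))
    return register


def attest(stages: list[bytes], golden: bytes) -> bool:
    """Verifier side: constant-time comparison against a known-good reference value."""
    return hmac.compare_digest(measured_boot(stages), golden)


# Hypothetical stage images; a real device would measure actual binaries.
good = [b"bootloader v1", b"kernel v5", b"nos v7"]
golden = measured_boot(good)  # recorded from a known-good build
tampered = [b"bootloader v1", b"kernel v5 + implant", b"nos v7"]
print(attest(good, golden))      # True
print(attest(tampered, golden))  # False
```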
Other network solution providers also have various Secure Boot, root-of-trust, and related approaches.
This article should not be read as suggesting that any one network solution provider has a better solution than the others. It only seeks to show that these are already concerns for IP router suppliers.
The Day Job
None of this changes the day job of an IP architect or engineer. Many of the same issues and tunable knobs exist with IP routing: hierarchical design, addressing plans, summarization, protocol overhead, convergence, redundancy, quality-of-service routing, quality-of-service per-hop behaviors, and all the state, interdependencies, and tradeoffs against business-model-aligned objectives that go with network design. More on that in a future article.
The takeaways for network solution providers and network operators are:
Business model change: on-premise software to software as a service (network as a service is still nascent).
Customer experience: continuous incremental change based on instrumenting the customer experience (the industry has more to do here).
Microservices-based architectures (a current trend in NOS and automation software)
Scale-out hardware (spine/leaf)
Continuous change (pub/sub state, programmability, but the networking industry has more to do here)
Operations is a software problem (automation and autonomy)
Network programmability (NETCONF, REST, P4, …)
Measure everything, all the time: streaming telemetry (gRPC/gNMI)
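"Measure everything, all the time" means turning raw streamed counters into usable signals. As a minimal sketch, assuming 64-bit octet counters delivered as (timestamp, value) samples, the function below converts consecutive samples into bits-per-second rates and tolerates a single counter wrap between samples. The function name and sample format are hypothetical and not tied to any particular telemetry collector.

```python
from typing import Iterable, Iterator, Optional

COUNTER_MODULUS = 2**64  # assumption: 64-bit, monotonically increasing octet counter


def rates_bps(samples: Iterable[tuple[float, int]]) -> Iterator[float]:
    """Yield bits per second between consecutive (timestamp_s, octets) samples."""
    prev: Optional[tuple[float, int]] = None
    for ts, octets in samples:
        if prev is not None:
            prev_ts, prev_octets = prev
            # Modular subtraction absorbs one wrap; it cannot tell a wrap from a
            # counter reset (e.g. after a reboot), which a real system must detect.
            delta = (octets - prev_octets) % COUNTER_MODULUS
            interval = ts - prev_ts
            if interval > 0:
                yield delta * 8 / interval
        prev = (ts, octets)


steady = [(0.0, 1_000), (10.0, 126_000), (20.0, 251_000)]
print(list(rates_bps(steady)))   # [100000.0, 100000.0]

wrapped = [(0.0, COUNTER_MODULUS - 500), (1.0, 500)]
print(list(rates_bps(wrapped)))  # [8000.0]
```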