Any New Internet Would Require Purpose and Vision
There are voices calling for a new Internet, a new IP. Neither incrementalism nor a not-invented-here mindset will lead to anything compelling. If the problem is imperfection, then perhaps examining which imperfections are resolvable and which are not is a good place to start. If the motivation is to be creative and simply see what comes from a clean slate, great, but that is not necessarily a path to attracting investment or followers.
Great change agents, great leaders, have compelling visions, and then they go about removing the hurdles. Musk didn’t sit in his garage for years wondering how he could improve on today’s cars; he set about realizing a vision of a totally different type of car, one with relevance and meaning for the age we live in. Musk had to solve the problem of a nationwide charging network. That had nothing to do with building a great car, and everything to do with seeing the whole problem. “Seeing the whole problem” is a phrase that was also associated with Steve Jobs.
Figure 1. Components of Network Outcomes
When the whole problem is examined, the big issue that emerges is the number of dynamic components that shape real-world networking outcomes. Figure 1 lists some of them, labeled A to H. Even if a great job is done specifying and implementing the control plane and physical network, a great deal more is going on, and it comes down to the capacity, capabilities, and choices made with respect to the other elements. A well-implemented control plane can collapse if the network design is poor. An operations function can be overwhelmed, depending on protocol and design choices. As the industry studies the strategic options in networking and digs deeper into the dynamics of complexity, a picture emerges of an interdependent ecosystem that extends well beyond the network itself.
Applications
In networking, we often take the mindset that it is our job to serve the application layer, and that the results of the application layer depend on the network. I’m not sure the application layer always sees it that way.
For sure, if the network is dropping packets all over the place, or not delivering packets to the right destination, there are going to be consequences for applications.
However unlikely it seems today, there was a time when people thought a Netflix-like service was not possible. The Internet could not guarantee performance on a stream-by-stream basis, so, the reasoning went, it could not happen. Luckily for all of us, Netflix did not see it that way, and showed everyone that a little bit of buffering goes a long way. Not all problems can be solved with a little bit of buffering, but that’s not the point.
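To make the buffering point concrete, here is a toy Python model, not a description of how Netflix or any real player works. It assumes playback proceeds in fixed one-second ticks, that the throughput in each tick is known, and that the player starts with some seconds of video already buffered; `count_stalls` and its parameters are hypothetical names.

```python
def count_stalls(throughput_mbps, bitrate_mbps, startup_buffer_s, tick_s=1.0):
    """Count the ticks in which playback stalls for lack of buffered video.

    Toy model: each tick the network delivers throughput * tick_s megabits,
    i.e. (throughput / bitrate) * tick_s seconds of video, and playback
    consumes tick_s seconds of video if that much is buffered.
    """
    buffer_s = startup_buffer_s
    stalls = 0
    for tput in throughput_mbps:
        buffer_s += (tput / bitrate_mbps) * tick_s  # seconds of video downloaded
        if buffer_s >= tick_s:
            buffer_s -= tick_s  # enough buffered: play one tick
        else:
            stalls += 1  # not enough buffered: playback pauses this tick
    return stalls


# A 4 Mb/s stream over a link that dips to 2 Mb/s for four seconds.
link = [2, 2, 2, 2, 8, 8, 8, 8]
print(count_stalls(link, bitrate_mbps=4, startup_buffer_s=0))  # 2
print(count_stalls(link, bitrate_mbps=4, startup_buffer_s=2))  # 0
```

In this model, two seconds of buffer absorbs a dip that would otherwise pause playback twice, with no per-stream guarantee required from the network.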
You never know what cool thing will be created at the application layer, and, as the COVID experience demonstrated, that might even be the best way to address an issue. When I started playing around with YouTube recently, I discovered that every YouTube link is associated with multiple video/audio streams of different resolutions and bandwidths. I assume YouTube also has some algorithm for choosing among them.
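YouTube’s actual selection logic is not described here, so the following is only a generic throughput-based adaptive bitrate (ABR) sketch: pick the highest-bitrate rendition that fits within a safety margin of recently measured throughput. The rendition ladder, the 0.8 safety factor, and the names `Rendition` and `choose_rendition` are illustrative assumptions. Real players often also take buffer level into account and smooth their throughput estimates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    label: str
    bitrate_kbps: int


def choose_rendition(renditions, measured_kbps, safety=0.8):
    """Return the highest-bitrate rendition that fits within safety * measured
    throughput, falling back to the lowest-bitrate rendition if none fit."""
    budget_kbps = measured_kbps * safety
    ordered = sorted(renditions, key=lambda r: r.bitrate_kbps)
    choice = ordered[0]  # fallback: lowest bitrate
    for r in ordered:
        if r.bitrate_kbps <= budget_kbps:
            choice = r  # keep upgrading while the rendition still fits
    return choice


# Hypothetical ladder for a single video.
ladder = [
    Rendition("240p", 400),
    Rendition("480p", 1200),
    Rendition("720p", 2500),
    Rendition("1080p", 5000),
]
print(choose_rendition(ladder, measured_kbps=4000).label)  # 720p
print(choose_rendition(ladder, measured_kbps=300).label)   # 240p
```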
The application layer is going to solve some problems better than the network layer can, especially a highly regulated network layer.
Devices
Ever since Apple put a computer operating system on a phone, our expectations of what devices can do have skyrocketed. Devices are often the other end of the application session, and not much needs to be said about them that has not already been said in the Applications section above, other than to state the obvious: devices get more powerful every year.
Overlay Networks
Putting the datacenter aside for now, we have to stop and consider the fundamental implication of SD-WAN: a single SP no longer can, or needs to, supply all of your connectivity needs, and therefore a networking approach limited to one SP is becoming less and less relevant. Would a greenfield approach to networking recreate some great way of providing excellent service within the constraints of one SP, when that might not even be a relevant requirement 5, 10, or 15 years from now? There is also the related question of whether a lot of QoS in the transport network still matters much for application performance when an SD-WAN can simply choose another path if performance metrics are not being met. Ditto for SASE.
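As an illustration of the overlay doing the work, here is a minimal sketch of SLA-based path selection in the spirit of SD-WAN. The metric names, thresholds, path names, and fallback rule are assumptions for illustration; commercial SD-WAN products each define their own probing, thresholds, and policies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathMetrics:
    name: str
    latency_ms: float
    loss_pct: float
    jitter_ms: float


@dataclass(frozen=True)
class SlaPolicy:
    max_latency_ms: float
    max_loss_pct: float
    max_jitter_ms: float


def meets_sla(path, sla):
    """True if every measured metric is within the policy's limits."""
    return (
        path.latency_ms <= sla.max_latency_ms
        and path.loss_pct <= sla.max_loss_pct
        and path.jitter_ms <= sla.max_jitter_ms
    )


def select_path(paths, sla):
    """Prefer the lowest-latency SLA-compliant path; if none comply,
    fall back to the path with the least loss (hypothetical rule)."""
    compliant = [p for p in paths if meets_sla(p, sla)]
    if compliant:
        return min(compliant, key=lambda p: p.latency_ms)
    return min(paths, key=lambda p: (p.loss_pct, p.latency_ms))


# Hypothetical probe results for three underlays from different SPs.
paths = [
    PathMetrics("mpls-sp-a", latency_ms=30, loss_pct=2.0, jitter_ms=5),
    PathMetrics("broadband-sp-b", latency_ms=45, loss_pct=0.1, jitter_ms=8),
    PathMetrics("lte-sp-c", latency_ms=70, loss_pct=0.5, jitter_ms=20),
]
voice_sla = SlaPolicy(max_latency_ms=150, max_loss_pct=1.0, max_jitter_ms=30)
print(select_path(paths, voice_sla).name)  # broadband-sp-b
```

The lowest-latency path is skipped because its loss breaks the policy, and the decision is made at the edge, across SPs, rather than through QoS inside any one transport network.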
Detection, reaction, convergence, path selection, redundant paths, and perhaps call admission control: these are becoming the mainstream ways good network outcomes are achieved. What would any new network have to offer in these areas?
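Call admission control, the last item in that list, is the one that says no. At its simplest, it refuses a new flow rather than letting it degrade the flows already admitted. The sketch below assumes a single link, per-flow bandwidth requests, and a fixed admissible share of capacity (0.75 here); the class and parameter names are hypothetical.

```python
class AdmissionController:
    """Toy per-link call admission control: admit a flow only if the total
    reserved bandwidth stays within a fixed share of link capacity."""

    def __init__(self, link_capacity_mbps, max_share=0.75):
        self.limit_mbps = link_capacity_mbps * max_share
        self.reserved_mbps = 0.0

    def request(self, flow_mbps):
        """Reserve bandwidth and return True, or return False if it would not fit."""
        if self.reserved_mbps + flow_mbps <= self.limit_mbps:
            self.reserved_mbps += flow_mbps
            return True
        return False

    def release(self, flow_mbps):
        """Return a finished flow's bandwidth to the pool."""
        self.reserved_mbps = max(0.0, self.reserved_mbps - flow_mbps)


cac = AdmissionController(link_capacity_mbps=100)  # 75 Mb/s admissible
print([cac.request(25) for _ in range(4)])  # [True, True, True, False]
cac.release(25)
print(cac.request(25))  # True
```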
Objectives and Outcomes
Ideally, what happens in the network is driven by the mission of the entity, and sometimes by its business model. That is a dynamic thing, and it translates into a set of choices that vary from entity to entity. Is there a one-size-fits-all approach to networking that matches every set of business objectives and outcomes? Which objectives should networks be developed for: the most exacting, the least exacting, or a mix? If some parts of the networking family insist that an enormous level of complexity is necessary to meet a service requirement, can the rest of the network be protected from that complexity in some way?
Operations
Standards and network solution decisions have an impact on operations. One choice here or there can lead to more or less operational complexity and work. Should networks be developed on the assumption that operations units are masters of managing complexity, have incredible capacity, and could, if needed, automate the counting of hanging chads in a Florida election? Or should networks be developed on the premise that it may be better to put some autonomy/automation within the network itself? Is there one answer to this question? There may be in the future, but there is not today. Operations functions vary greatly in capacity and capability, from SMEs through to hyperscalers.
Network Design
Network design has a significant impact on how a control plane operates.
Where is aggregation/summarization done, and is it done at all? How many areas are there? How are the timers set? How much policy is defined, and how is it distributed? What capacity and capability exist for planning and modeling before a change (big or small) is made, or are rule-of-thumb heuristics used instead? For operators that do not have huge design/planning capacity and capabilities, tools in this area may have a much bigger impact on networking outcomes than anything in the network itself. Perhaps there is a set of telemetry metrics, already on the drawing board or collectable today, that could feed a network design / planning / simulation / machine learning pipeline.
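Summarization is one of the design levers above that directly changes how much state the control plane carries. Python’s standard `ipaddress` module can show the basic idea: `collapse_addresses` merges contiguous prefixes into the smallest equivalent set. This is only exact aggregation of the prefixes given, a sketch for intuition rather than how any routing protocol configures summaries at area boundaries; the prefixes are hypothetical.

```python
import ipaddress


def summarize(prefixes):
    """Collapse a list of prefix strings (all one address family) into the
    minimal set of covering prefixes, without over-summarizing."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(networks)]


# Hypothetical prefixes behind one area border.
area_prefixes = [
    "10.1.0.0/24",
    "10.1.1.0/24",
    "10.1.2.0/24",
    "10.1.3.0/24",
    "192.168.5.0/24",
]
print(summarize(area_prefixes))  # ['10.1.0.0/22', '192.168.5.0/24']
```

Five routes become two. The tradeoff in a real network is that summaries reduce state and churn outside the area, but they also hide detail, which can lead to suboptimal paths or blackholed traffic if a component prefix fails behind the summary.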
Capacity
Capacity in any of the elements represented in Figure 1 can have a positive or negative impact on overall network outcomes. Generally, more capacity decreases the need for complex alternatives, but capacity at the application and device layers can also drive load.
There was a time when the growth rate of traffic load was determined by the number of new houses. Not anymore. Now it is determined by the growth of applications and devices, with some constraint provided by how many waking hours there are in a day; with IoT, perhaps not even that is a constraint.
If routers have really big CPUs and oodles of memory, more design options may open up, because the routers can process more state per second. There are tradeoffs with state volume when things go wrong, though: the control plane can be knocked out by positive feedback loops if too many packets are being dropped. If some routers have big CPUs and lots of memory but others do not, that can cause problems too, with the smaller routers overwhelmed by the state they need to process (there are sometimes ways of dealing with this).
As mentioned, operations capacity has a huge impact on how networks are approached, and sometimes even which protocols are used.
Capacity has a cost, of course. It is nonetheless a big lever, though one that may be off the table as far as some operators are concerned.
Environmental Change & Fate Sharing
A decade or so ago, one Latin American country was experiencing such a high rate of fiber cuts that the only way to cope was to build a highly meshed network at the optical layer. It was one of the few places where there was demand, and a legitimate need, for that type of network. Environmental conditions in many domains impact network outcomes, not the least of which is regulation.
An Enterprise can buy one virtual service from one SP and another virtual service from another SP, happily assuming there is redundancy in the network, only to find out that the virtual circuits traverse the same physical path. Is there anything about how networks are developed that can solve this problem?
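The check itself is trivial once the data exists; the hard part is that the enterprise rarely sees the physical path. Assuming each SP could disclose the physical segments (fiber spans, conduits, sites) that a circuit rides, in the spirit of shared risk link groups (SRLGs), a pairwise diversity check might look like this. All circuit and segment identifiers are hypothetical.

```python
from itertools import combinations


def find_shared_risk(circuits):
    """Given {circuit_id: iterable of physical segment ids}, return
    {(circuit_a, circuit_b): sorted shared segments} for every pair of
    circuits that share at least one segment, i.e. are not truly diverse."""
    conflicts = {}
    for (a, segs_a), (b, segs_b) in combinations(sorted(circuits.items()), 2):
        shared = set(segs_a) & set(segs_b)
        if shared:
            conflicts[(a, b)] = sorted(shared)
    return conflicts


circuits = {
    "sp-a-vc1": ["pop-central", "span-a3", "conduit-17"],
    "sp-b-vc7": ["pop-east", "conduit-17", "span-b9"],
    "sp-c-vc2": ["pop-west", "span-c1"],
}
print(find_shared_risk(circuits))  # {('sp-a-vc1', 'sp-b-vc7'): ['conduit-17']}
```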
Control Plane and Network Equipment
I deliberately did not list these as dynamic, choice-driven domains, to make a point: a great deal outside of how we develop networks impacts network outcomes. That said, everyone knows that these vary from vendor to vendor, both in feature/design optimality and in the “quality” of the implementation (broadly speaking), and neither is static over time.
Things That Cannot Be Improved On
No approach to networking that we know of today can transmit information faster than the speed of light. There will always be a delay in detecting that an error has occurred.
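A back-of-the-envelope calculation shows the floor. Light in optical fiber travels at roughly two-thirds of its speed in a vacuum, about 200 km per millisecond. This approximation ignores equipment delay and the fact that fiber routes are longer than straight-line distance, so real delays are higher. A failure notice from 1,000 km of fiber away therefore cannot arrive in less than about 5 ms, whatever the link speed.

```python
# Approximate propagation speed of light in fiber: ~2/3 of c (~300 km/ms in vacuum).
FIBER_KM_PER_MS = 200.0


def one_way_delay_ms(fiber_km):
    """Lower bound on one-way propagation delay over a fiber path of this length."""
    return fiber_km / FIBER_KM_PER_MS


def round_trip_ms(fiber_km):
    """Lower bound on round-trip propagation delay (there and back)."""
    return 2 * one_way_delay_ms(fiber_km)


for km in (10, 1000, 4000):
    print(f"{km:>5} km: one-way >= {one_way_delay_ms(km):.2f} ms, "
          f"round trip >= {round_trip_ms(km):.2f} ms")
```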
Networks can detect errors faster by tuning timers and how often error-detection packets such as BFD are sent, but the faster problems are detected, the more exposed a network is to getting into a funky state. The industry has learned a little about managing that, using exponential backoff and similar techniques, but it is still not a simple matter.
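Both knobs in that tradeoff can be sketched numerically. For BFD in asynchronous mode, RFC 5880 sets the detection time as the remote system’s detect multiplier times the agreed transmit interval, so shrinking the interval shrinks detection time proportionally, at the cost of more sensitivity to brief hiccups. A multiplier of 3 is assumed as the default below. The backoff function is a generic exponential throttle of the kind used to delay repeated reactions such as SPF runs; its name and parameters are illustrative, and real implementations also reset the delay after a quiet period, which is omitted here.

```python
def bfd_detection_time_ms(tx_interval_ms, detect_mult=3):
    """Asynchronous-mode BFD detection time: detect multiplier x agreed interval."""
    return detect_mult * tx_interval_ms


def backoff_delays(events, initial_ms, max_ms):
    """Wait applied before reacting to each of `events` back-to-back triggers:
    start at initial_ms, double after each trigger, never exceed max_ms.
    (Simplified: no reset after a quiet period.)"""
    delays = []
    delay = initial_ms
    for _ in range(events):
        delays.append(delay)
        delay = min(delay * 2, max_ms)
    return delays


print(bfd_detection_time_ms(300))  # 900
print(bfd_detection_time_ms(50))   # 150
print(backoff_delays(7, initial_ms=50, max_ms=1000))
# [50, 100, 200, 400, 800, 1000, 1000]
```

Under this throttle a flapping link quickly stops triggering immediate recomputation, which is the stability-versus-speed tradeoff described above.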
In the next ten years, networks will be proactively moving traffic when an AI/ML algorithm predicts a failure is about to occur, and that will help, but it may still leave some topology changes that result in lost and/or looping packets. Because the speed of light has a limit, network designers cannot guarantee that all routers will become aware of a failure at the same time, or will reroute at the same time.
Network convergence/protection times within a single network have come down, though expectations have moved too, to well below the old 50 ms standard of TDM networks. Networks can react to error detection faster with pre-calculated loop-free alternates (LFAs), assuming the post-failure topology is what the LFA computation assumed it would be. There are no doubt improvements to be found in detection and reaction, but it is unlikely the speed of light will be exceeded anytime soon, at any bitrate, let alone a meaningful one.
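The basic loop-free condition for an LFA is defined in RFC 5286: a neighbor N of the computing router S is a loop-free alternate toward destination D if Dist(N, D) < Dist(N, S) + Dist(S, D), meaning N’s own shortest path to D does not go back through S. A minimal Python check follows, assuming symmetric link costs and a small hypothetical topology. It says nothing about whether the post-failure topology actually matches the one assumed, which is exactly the caveat above.

```python
import heapq


def shortest_dists(graph, src):
    """Dijkstra over {node: {neighbor: cost}}; returns {node: distance from src}."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


def is_loop_free_alternate(graph, s, n, d):
    """RFC 5286 basic loop-free condition: Dist(N,D) < Dist(N,S) + Dist(S,D).
    Assumes symmetric costs, so Dist(N,S) is read from N's shortest-path tree."""
    from_n = shortest_dists(graph, n)
    from_s = shortest_dists(graph, s)
    return from_n[d] < from_n[s] + from_s[d]


# Hypothetical topology: S normally reaches D via E.
graph = {
    "S": {"E": 1, "N1": 1, "N2": 1},
    "E": {"S": 1, "D": 1},
    "N1": {"S": 1, "D": 2, "N2": 5},
    "N2": {"S": 1, "N1": 5},
    "D": {"E": 1, "N1": 2},
}
print(is_loop_free_alternate(graph, "S", "N1", "D"))  # True: 2 < 1 + 2
print(is_loop_free_alternate(graph, "S", "N2", "D"))  # False: 3 is not < 1 + 2
```

If the S to E link fails, S can hand traffic to N1 immediately, without waiting for convergence; handing it to N2 would risk a loop, because N2’s best path to D runs back through S.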
Things That Can Be Improved On
If there is one open and compelling question in networking today, it is the extent to which networks are designed for automation. This question goes beyond whether they have gRPC/gNMI and NETCONF interfaces. Look at what is happening in Enterprise networks, especially wireless LANs, and the comparison is stark, even allowing that these environments face a much more constrained problem. That’s part of the problem, though. If you look at what it takes to automate a 5G network with an optical layer, a packet layer, a session layer, a radio layer, hard slicing, soft slicing, multiple vendors, and on it goes… could we have made 5G more complex if we had tried? It will be interesting to see what networks look like by the time 6G rolls around, and what lessons have been learned from 5G.
The tradeoffs between operations, design, and the network itself are fair game for innovation, as touched on earlier in this article. In one sense, the hyperscalers have already shown innovation here, and there may be more to come.
Conclusion
By all means, let a thousand flowers bloom. Liberty and creativity have to be among our highest values. However, a journey without purpose, direction, and vision is probably not going to end well, and it may not attract broad investment or a following. It has to be more than some hand-waving about new services coming down the pipe. It has to be about more than geopolitics. It starts and ends with an open dialog about which issues in networking are (assumed to be) intractable, what the big pain points are, and what the vision is for how the network of tomorrow will be leaps and bounds ahead of the network of today. It has to be something that makes the journey worthwhile, and it has to be pragmatic about all the things in Figure 1 that impact network outcomes but have nothing to do with the network itself.