Spoiler alert: both.
The Important Points
The location of compute, storage, and network resources is not determined by what people might want the answer to be. Over time, it is determined by economics, performance, security, ease of use, and access to relevant information.
Intelligence will often be both centralized and distributed, because different use cases and value propositions drive different needs.
Introduction
When OpenFlow first gained prominence, it set the stage for an industry conversation about network equipment becoming highly programmable, doing little beyond what a centralized controller told it to do. This was in large part predicated on the idea that a switch or router was not much more than a forwarding table and a few other necessary evils. While the use of controllers is growing, that initial OpenFlow vision of networking is not. If anything, the pendulum is swinging back a little in the other direction.
Because information is relative (delayed in time as it crosses space), processing will migrate over time to wherever it is optimal from a number of perspectives, including cost, performance, ease of use, and access to the necessary information.
Distributed Intelligence
IP routing has always had a level of intelligence / self-automation, without the need for any external automation system: capabilities such as topology discovery, best-path advertisement, and peer discovery. At the same time, there has always been a necessary level of manual, often CLI/script-based, configuration just to get a box and its control plane up and running. Automating that manual configuration while leaving the self-automation where it is might be a good start, and ultimately the right balance.
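To make that division of labor concrete, here is a minimal sketch, assuming a generic IOS-like configuration syntax and made-up device parameters: a script templates the one-time bootstrap configuration, and the routing protocol’s own self-automation (OSPF here) takes over topology and path discovery once the box is up.

```python
# A minimal sketch of automating day-0 configuration while leaving the
# distributed control plane (OSPF here) to discover topology and best
# paths on its own. The syntax and device parameters are illustrative
# assumptions, not any vendor's exact CLI.
from jinja2 import Template

BASE_CONFIG = Template("""\
hostname {{ hostname }}
interface {{ uplink }}
 ip address {{ ip }} {{ mask }}
router ospf 1
 network {{ ip }} 0.0.0.0 area 0
""")

devices = [
    {"hostname": "pe1", "uplink": "GigabitEthernet0/0",
     "ip": "10.0.0.1", "mask": "255.255.255.252"},
    {"hostname": "pe2", "uplink": "GigabitEthernet0/0",
     "ip": "10.0.0.2", "mask": "255.255.255.252"},
]

for dev in devices:
    # Render per-device bootstrap config; pushing it (NETCONF, SSH,
    # ZTP) is left to whatever transport the operator already uses.
    print(BASE_CONFIG.render(**dev))
```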
Can routing protocols self-configure, without the need for an external automation system? IETF work such as RIFT is asking this question. As I have discussed previously, one of the learnings of the last decade is that a router is not like a server. A router participates in a distributed control plane in a way that a server mostly does not, and so whether automation belongs at the individual-router level or at the network level has become an interesting industry discussion.
As I write this article, I am experiencing intermittent Internet issues. Pings and traceroutes indicate that, from the perspective of my laptop, the problems are a combination of DNS failures and looping packets (many hops into the WAN). Clearly, whatever mechanisms my ISP has in place to be alerted to these problems, diagnose them, and resolve them are lacking. The problems have been going on for a couple of days.
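For what it’s worth, even the client side can mechanize part of that diagnosis. Below is a minimal sketch of spotting a routing loop from a laptop, assuming a Unix-like system with the traceroute binary on the PATH: a hop address that reappears after intervening hops suggests packets are circling.

```python
# A minimal sketch of client-side loop detection: run a numeric
# traceroute, then flag any hop IP that repeats non-consecutively.
# Assumes a Unix-like system with `traceroute` installed.
import subprocess

def hops(target: str) -> list[str]:
    """Run a numeric traceroute and return the hop IPs in order."""
    out = subprocess.run(
        ["traceroute", "-n", "-q", "1", target],
        capture_output=True, text=True, timeout=120,
    ).stdout
    result = []
    for line in out.splitlines()[1:]:          # skip the header line
        fields = line.split()
        if len(fields) >= 2 and fields[1] != "*":
            result.append(fields[1])
    return result

def has_loop(hop_list: list[str]) -> bool:
    """A repeated hop separated by other hops indicates a likely loop."""
    last_seen = {}
    for i, hop in enumerate(hop_list):
        if hop in last_seen and i - last_seen[hop] > 1:
            return True
        last_seen[hop] = i
    return False

if __name__ == "__main__":
    path = hops("example.com")
    print("loop suspected" if has_loop(path) else "no loop detected", path)
```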
New Wi-Fi products, like Juniper’s Mist, point networking in new directions. Have a problem that occurs randomly every 30 days or so, but can’t afford to have a tech sitting onsite with Wireshark until it happens again? No problem: Mist detects anomalies, automatically takes packet captures (storing metadata only, not personal data), and diagnoses problems (in cooperation with the cloud engine). This is not taking anyone’s job; it is doing something that would not have been done anyway. It is a qualitative improvement in customer experience and a great example of the kind of intelligence that SHOULD leverage distributed instrumentation, policy enforcement, and intelligence where needed. It is also an example of network equipment being more than just a forwarding table. Distributed radio optimization using AI/ML may be another example, as is Juniper’s Mist Edge, a campus solution that extends SSIDs through tunnels to remote locations, great for work-from-home use cases.
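To be clear, I have no visibility into Mist’s internals, but the general anomaly-triggered-capture pattern is easy to sketch. Below, a rolling z-score flags an unusual metric, and a hypothetical start_capture() hook stands in for whatever a real platform uses to snapshot packets around the event.

```python
# A rough sketch of the anomaly-triggered-capture pattern, NOT Mist's
# actual implementation. A rolling z-score flags an unusual metric
# (e.g., Wi-Fi retry rate); start_capture() is a hypothetical hook.
from collections import deque
from statistics import mean, stdev

WINDOW = 60       # samples of history to keep
THRESHOLD = 3.0   # z-score above which we call it an anomaly

history: deque[float] = deque(maxlen=WINDOW)

def start_capture(reason: str) -> None:
    # Hypothetical hook: a real system would trigger a scoped,
    # metadata-only packet capture and ship it to the analysis engine.
    print(f"capture started: {reason}")

def observe(sample: float) -> None:
    """Feed one metric sample; trigger a capture if it is anomalous."""
    if len(history) >= 10:  # need some history before judging
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and (sample - mu) / sigma > THRESHOLD:
            start_capture(f"retry-rate spike: {sample:.1f}")
    history.append(sample)
```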
In any number of distributed apps and artificial intelligence / machine learning use cases, entrepreneurs and enterprises are discovering that sometimes there is just too much data to transport back to a central site, especially if that central site is a public cloud charging you for wide area network costs. So both inference engines and training engines are finding their way toward the network edge, as are software applications.
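A back-of-envelope calculation shows why. All numbers below (camera count, bitrate, WAN transport price, post-inference keep ratio) are illustrative assumptions, not anyone’s actual pricing.

```python
# Back-of-envelope comparison: ship raw sensor data across the WAN, or
# run inference at the edge and ship only what matters. All numbers
# are illustrative assumptions, not any provider's pricing.
CAMERAS = 100
MBPS_PER_CAMERA = 4          # raw video feed per camera
WAN_USD_PER_GB = 0.09        # assumed blended WAN transport cost
EDGE_KEEP_RATIO = 0.02       # fraction worth uploading after inference

gb_per_day = CAMERAS * MBPS_PER_CAMERA / 8 * 86_400 / 1_000  # MB -> GB
raw_cost = gb_per_day * WAN_USD_PER_GB
edge_cost = raw_cost * EDGE_KEEP_RATIO

print(f"{gb_per_day:,.0f} GB/day raw   -> ${raw_cost:,.0f}/day to ship")
print(f"edge-filtered upload -> ${edge_cost:,.2f}/day")
```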
Centralized Intelligence
While approaches like Segment Routing Flex-Algo keep pushing the boundaries of what may be possible from a distributed routing perspective, there are scenarios today where centralized intelligence clearly makes a difference.
One example is two different Provider Edge routers (PEs) to which a customer is multi-homed. Ideally, the paths from each of those PEs for that customer would not intersect, so that when one path goes down, the backup path can be used. Putting aside for now fate sharing at the optical/duct layers, a centralized controller can arguably better understand and orchestrate the needed disjoint paths. I can imagine ways in which PEs could communicate with each other to achieve the same outcome, but that does not exist today, AFAIK. As for optical-layer fate sharing, vendors are working on shared risk link group (SRLG) solutions that integrate optical-layer information (again, information that a distributed router *may* not have).
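As a concrete illustration, here is a minimal sketch of the controller-side computation on a toy topology of my own invention: model the multi-homed customer site as a virtual source attached to both PEs, then ask a standard graph library for edge-disjoint paths to the remote endpoint.

```python
# A minimal sketch of the centralized disjoint-path computation: with
# a full topology view, attach a virtual source to both PEs so that
# standard edge-disjoint path algorithms apply. The topology is a toy
# assumption; a real controller would also fold in SRLG/optical data.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("PE1", "P1"), ("PE2", "P2"),
    ("P1", "P3"), ("P2", "P3"),              # shared core node
    ("P1", "P4"), ("P2", "P5"),
    ("P4", "remote-PE"), ("P5", "remote-PE"), ("P3", "remote-PE"),
])

# Virtual node standing in for the multi-homed customer site.
G.add_edges_from([("SRC", "PE1"), ("SRC", "PE2")])

# Edge-disjoint paths from the virtual source necessarily start at
# different PEs, since SRC has exactly two attachment links.
for path in nx.edge_disjoint_paths(G, "SRC", "remote-PE"):
    print(" -> ".join(path[1:]))             # drop the virtual SRC hop
```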
Another example is end-to-end service provisioning, where a workflow activates a service from the business-systems layer all the way down to the network equipment. A distributed control plane may be invoked during the workflow, but the workflow as a whole is clearly managed by a more centralized system. Customers signaling new service requests to the network over a UNI don’t seem to have taken off as a norm, even though the industry has from time to time worked on standards in this area. I don’t know why, but I would speculate there are many legal and commercial reasons.
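To make the shape of that workflow concrete, here is a skeletal sketch; the step names and context dictionary are illustrative assumptions, and real orchestrators add rollback, retries, and auditing.

```python
# A skeletal sketch of an end-to-end provisioning workflow: an
# orchestrator drives ordered steps from business systems down to the
# equipment; the distributed control plane converges on its own once
# configuration lands. All step names are illustrative assumptions.
from typing import Callable

def validate_order(ctx: dict) -> None:
    ctx["validated"] = True            # business-systems layer

def allocate_resources(ctx: dict) -> None:
    ctx["vlan"] = 1001                 # e.g., IPs, VLANs, MPLS labels

def push_device_config(ctx: dict) -> None:
    ctx["configured"] = True           # NETCONF/CLI down to the boxes

def await_control_plane(ctx: dict) -> None:
    # The distributed control plane (IGP/BGP) is invoked here; the
    # orchestrator merely waits for it to report convergence.
    ctx["converged"] = True

def activate_billing(ctx: dict) -> None:
    ctx["active"] = True               # back up to business systems

WORKFLOW: list[Callable[[dict], None]] = [
    validate_order, allocate_resources, push_device_config,
    await_control_plane, activate_billing,
]

ctx: dict = {"service": "l3vpn-customer-42"}
for step in WORKFLOW:
    step(ctx)                          # real systems add rollback/retry
print("provisioned:", ctx)
```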
Lastly, the BIG opportunity for any centralized system is to develop new areas of value by leveraging elastic compute and storage resources to do things that a distributed network/security element lacks the horsepower and/or the access to information to do. This can work in conjunction with a controller, or be integrated into new cloud-managed service offerings.
Conclusion
We have lived through a period when there was an assumption that all network intelligence would shift to a centralized controller / cloud. To some extent this has played out, including increased use of controllers and cloud-managed offerings. Network and business managers care about ease of use, great experiences, and cost. The rest is up to the industry: discovering the options that best achieve those. Anyone who assumes, as a de facto position, that the answer is distributed or centralized is limiting their options and ability to innovate.
I think it’s quite clear that extremes in either direction of the pendulum are harmful. As obvious as it sounds, “centralize what you must, distribute what you can” has been quite a good guiding rule for building sane, working architectures.