Segment Routing Solution Criteria
Introduction
In assessing segment routing solutions, the intent is to focus on what is important: the things that will move the needle. Below are some issues that are already bubbling to the surface and are under development for an upcoming report.
The issues in this article are divided into two categories: routers and controllers. Network operators may wish to deploy segment routing without a controller. For those network operators that see the benefit of a controller, there are additional considerations.
Controller
Segment Routing Controller
The benefit of using a controller may be marginal if the segment routing controller and the routers do not support high-speed telemetry for collecting and integrating metrics beyond what is already in the routing protocols, such as CPU, memory, and queue usage. This should be rated as a key capability. Without it, a controller has no more network state information than the routers.
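To make the telemetry point concrete, here is a minimal sketch of a controller computing a path twice: once with only the link costs a routing protocol would advertise, and once after discarding links that telemetry reports as congested. The topology, the utilization samples, the 0.8 threshold, and every function name are assumptions made for illustration; this is not any vendor's controller logic.

```python
import heapq

# Hypothetical topology: (node_a, node_b) -> IGP cost, as advertised by the routing protocol.
IGP_COST = {("A", "B"): 10, ("B", "D"): 10, ("A", "C"): 10, ("C", "D"): 15}

# Hypothetical telemetry samples: queue utilization per link, from 0.0 to 1.0.
QUEUE_UTIL = {("A", "B"): 0.20, ("B", "D"): 0.95, ("A", "C"): 0.30, ("C", "D"): 0.25}


def build_graph(costs, queue_util=None, max_util=0.8):
    """Build an adjacency map, skipping links whose queue utilization exceeds max_util."""
    graph = {}
    for (a, b), cost in costs.items():
        if queue_util is not None and queue_util.get((a, b), 0.0) > max_util:
            continue  # telemetry reports this link as congested, so exclude it
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    return graph


def shortest_path(graph, src, dst):
    """Plain Dijkstra. Returns (total_cost, [nodes]), or (None, []) if dst is unreachable."""
    heap = [(0, src, [src])]
    visited = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, link_cost in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(heap, (cost + link_cost, neighbor, path + [neighbor]))
    return None, []


igp_only = shortest_path(build_graph(IGP_COST), "A", "D")
with_telemetry = shortest_path(build_graph(IGP_COST, QUEUE_UTIL), "A", "D")
print("IGP metrics only:", igp_only)
print("With telemetry:  ", with_telemetry)
```

With routing protocol metrics alone, the controller picks the same path a router would. Only the extra telemetry input steers it around the congested link, at the price of a higher IGP cost.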
Global optimizations, disjoint paths, call admission control-like capabilities, analytics, and advanced/autonomous troubleshooting are also important considerations.
Control Delegation
While there is already some use of controllers, even in IP/MPLS networks, it is a stretch to say that every network manager is comfortable with “relying” on a controller for the control plane. If a controller is important to the value proposition of a solution, then the solution probably needs to include approaches that reassure network managers that the network will continue to operate, and react reasonably to change, if the controller fails or communication with the controller is broken.
One interesting approach is the ability to delegate path computation element (PCE) responsibilities back and forth between the controller and the router(s). For example, if the controller cannot be contacted, the router PCE takes over; when the controller comes back, it takes over again. As the industry has learned with all neighbor/peer relationships, flapping such as this should probably be dampened with mechanisms such as exponential backoff.
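The delegation and dampening idea above can be sketched as a small state machine. This is an illustration under stated assumptions, not a model of any specific protocol's delegation procedure: the class name DelegationDamper, its parameters, and the rule "a failure within reset_after seconds of the previous failure doubles the hold-down" are all hypothetical. Timestamps are passed in explicitly so the behavior is easy to trace.

```python
class DelegationDamper:
    """Tracks who owns path computation: 'controller' or 'router' (hypothetical sketch)."""

    def __init__(self, base_hold=5.0, max_hold=320.0, reset_after=600.0):
        self.base_hold = base_hold      # initial hold-down before handing control back, seconds
        self.max_hold = max_hold        # upper bound on the hold-down, seconds
        self.reset_after = reset_after  # quiet time after which earlier flaps are forgiven
        self.hold = base_hold
        self.owner = "controller"
        self.reachable = True
        self.up_since = None            # when the controller last became reachable again
        self.last_down = None           # when the controller was last lost

    def controller_down(self, now):
        """Controller lost: the router PCE takes over immediately."""
        if not self.reachable:
            return
        if self.last_down is not None and now - self.last_down < self.reset_after:
            # Another failure soon after the previous one: back off exponentially.
            self.hold = min(self.hold * 2, self.max_hold)
        else:
            self.hold = self.base_hold
        self.last_down = now
        self.reachable = False
        self.owner = "router"

    def controller_up(self, now):
        """Controller reachable again: start the hold-down timer, do not hand back yet."""
        if not self.reachable:
            self.reachable = True
            self.up_since = now

    def owner_at(self, now):
        """Hand control back only once the controller has stayed up for the hold-down."""
        if self.owner == "router" and self.reachable and now - self.up_since >= self.hold:
            self.owner = "controller"
        return self.owner


damper = DelegationDamper(base_hold=5, max_hold=40, reset_after=60)
damper.controller_down(10)
damper.controller_up(12)
print(damper.owner_at(15), damper.owner_at(17))   # still in hold-down, then handed back
damper.controller_down(20)                        # second failure shortly after the first
damper.controller_up(21)
print(damper.hold, damper.owner_at(27), damper.owner_at(31))
```

The design choice here is asymmetry: failover to the router is immediate, while handing control back to the controller is delayed, and the delay grows while the controller keeps flapping.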
Routers
Distributed Control Plane
The distributed control plane protocols, OSPF/IS-IS and BGP4, are the pulse of a segment routed network. Quality, stability, and support are critical. For IPv4-based networks, standard considerations apply, in addition to the industry trend towards BGP4 for L2/L3 services. With SRv6 (Segment Routing over IPv6) networks, there are some specific considerations.
SRv6 IPv6
The journey to SRv6 generally entails the journey to IPv6, which is a transition in and of itself. Network managers who have not decided there is a need to move to IPv6 will likely stick with IP/MPLS or SR MPLS over the next few years. Network managers who have decided to move to IPv6 will be interested in deploying SRv6.
A mix of SR MPLS and SRv6 is likely over the next few years, with SR MPLS dominating. SRv6 is viewed today as less standardized and less mature. However, network managers who are committed to moving to IPv6 are diving into SRv6. Deploying SRv6 today requires some additional considerations.
SRv6 OSPF
OSPFv3 is required for IPv6. OSPFv3 has some significant differences from OSPFv2, and is viewed as less mature and less tested in production networks than OSPFv2. IS-IS evolved more easily to support IPv6 and did not require a new version of the protocol. Vendor discussions of SRv6 typically include IS-IS, and often do not include OSPF. Network managers who prefer OSPF over IS-IS will need to give specific consideration to the maturity of any OSPFv3 implementation in a solution. OSPFv3 also supports IPv4, so it can be used for both.
SRv6 Header Compression
There is general agreement that it is advantageous to deploy SRv6 with header compression, to reduce protocol overhead. Currently there are some well-known proposals, but no alignment across all suppliers. Depending on their preference for single-vendor or multivendor networks, network managers will want to pay close attention to this issue.
To be clear, header compression is not required for SRv6 to work; it is merely considered preferable. SRv6 packets carrying multiple 16-byte segment identifiers can add significant capacity usage compared to minimal packet sizes, and even substantial overhead compared to “average” sized packets. This is also an issue for router architectures that copy protocol headers into on-chip memory. There is concern that routers already deployed will be challenged if there are too many segment identifiers in a segment list. The specs of some new routers announced this year look more capable on the SR segment list depth issue, though it is probably worthwhile to check whether the quoted depths apply to SRv6 as well, and not just to SR MPLS.
SR MPLS has significantly less protocol overhead than SRv6, and is currently more widely deployed, for numerous reasons.
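The overhead comparison can be worked through with simple arithmetic. The sketch below assumes SRv6 encapsulation with an outer 40-byte IPv6 header and a Segment Routing Header made up of an 8-byte fixed part plus one full 16-byte segment identifier per segment (no compression, no reduced encapsulation, no optional TLVs), and SR MPLS with one 4-byte label per segment. The packet sizes of 64 and 500 bytes and the function names are illustrative assumptions.

```python
IPV6_HEADER = 40   # fixed IPv6 header, bytes
SRH_FIXED = 8      # fixed part of the Segment Routing Header, bytes
SID_SIZE = 16      # one uncompressed SRv6 segment identifier (128 bits), bytes
MPLS_LABEL = 4     # one MPLS label stack entry, bytes


def srv6_overhead(num_sids):
    """Bytes added by an outer IPv6 header plus an SRH carrying num_sids full SIDs."""
    return IPV6_HEADER + SRH_FIXED + SID_SIZE * num_sids


def sr_mpls_overhead(num_labels):
    """Bytes added by a label stack with one label per segment."""
    return MPLS_LABEL * num_labels


def overhead_percent(overhead, packet_size):
    """Overhead as a percentage of the total bytes sent (overhead + original packet)."""
    return 100.0 * overhead / (overhead + packet_size)


for packet_size in (64, 500):      # illustrative "minimal" and "average" packet sizes
    for segments in (1, 5, 10):
        v6 = srv6_overhead(segments)
        mpls = sr_mpls_overhead(segments)
        print(
            f"packet={packet_size:>3}B segments={segments:>2} "
            f"SRv6={v6:>3}B ({overhead_percent(v6, packet_size):.1f}%) "
            f"SR MPLS={mpls:>2}B ({overhead_percent(mpls, packet_size):.1f}%)"
        )
```

Under these assumptions, a five-segment list costs 128 bytes with SRv6 and 20 bytes with SR MPLS, which is why the segment list depth and compression questions matter most for small packets.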
Autonomy
Workflows and other automations are important elements of efficiently and cost-effectively processing repeatable, predictable, and intended actions. However, the other half of the productivity pie is learning from the network what is happening in the network, and taking both proactive and reactive actions. Networks will soon be rerouting traffic simply because a route looks like it is going to fail, even if it has not yet failed. In enterprise networks, we are seeing cloud-based services learn and respond without the need for network managers to solve the issue themselves. Autonomy may well be one of the key areas of differentiation in networks over the coming years.
Command Line Interface (CLI)
Python-based CLIs are becoming more common, with at least one being open-sourced (Note 1). While programmatic interface usage is expected to become dominant in the coming decade, a CLI is still requested by network managers. Python-based CLIs, and especially any YANG-model-based open source projects, could provide network managers with interesting options going forward, not to mention the longer-term potential of creating new de facto industry standards.
Pluggable Optics
Where appropriate for the place in the network, pluggable 400ZR/ZR+ optics are an important consideration, as they may transform access/metro networks, eliminating additional optical systems and connecting to passive WDM where desired. Evaluation of pluggable optics options should be part of segment router evaluation. Cisco/Acacia and Nokia develop their own pluggable optics. Huawei may as well. Arista and Juniper are likely to use suppliers like Inphi. How effectively, and how broadly, Cisco/Acacia and Ciena market their pluggable optics for use in other routers is yet to be determined. While Cisco has made some forward-looking statements about its intent for Acacia optics, many in the industry are looking for confirmation after the acquisition fully closes and Acacia is integrated.
Network Planning
Simulation/emulation has emerged as an important aspect of total life cycle management. The reality is that distributed control planes, as valuable as they are, can be complicated. There are anecdotal public statements attesting to the necessity of simulation when deciding to go in one direction or another for new, dense, unfamiliar topologies. For small and midsized networks, this activity may ultimately be consumed by autonomy, but probably not in the short term. The best solutions will go beyond route planning and configuration generation to include control plane planning and design.
Licensing
Approaches to licensing, both technology and commercials, are a friction point. Some network managers will opt for Enterprise License Agreements (ELAs), due to their relative simplicity and determinism. Where the ELA construct does not work, usage measurement and reporting to all stakeholders is the optimal scenario for network managers (it probably should be done in all scenarios anyway). This approach, intelligently implemented and potentially leveraging telemetry interfaces, can provide other benefits, especially if network operating systems become more mix-and-match, which is the future pointed to if SONiC truly emerges broadly. Even without SONiC, suppliers and customers benefit from knowing what is actually being used. Forward-looking vendors and customers view licensing as more than a necessary evil. They look to create an infrastructure that licensing just happens to leverage, but that is primarily there to support analytics that flow into customer experience, pricing, product management, and other functions.
Conclusion
This article briefly touches on some issues important to segment routing solution evaluation. A more detailed report will be published in a few months. Networks are going through significant change. Solution fit is going to be significantly impacted by decisions to develop capabilities in one area or another, as opposed to the minutiae of every little dial.
Note 1: November 9th, 2020. The term “open sourced” here refers to a NOS API being used by a Python library that is freely available on an open source platform like Git. This is not the same as a community-based effort, covering multiple platforms, based on a model-based paradigm like YANG, which can be speculated as a future state for the industry.
Note 2: December 6th, 2020. See also BGP-LS.