Scalpels, hammers, TE, and QoS
Simplicity and complexity in Network Architecture and Equipment Design
Thanks to fellow antipodean Bruce Davie, some of us who have been in networking longer than Madonna's been having plastic surgery were able to wax sentimental about the good old days and the drama surrounding the development of MPLS traffic engineering.
The conversation got me thinking about scalpels and hammers. A scalpel is a small, sharp surgical instrument, often used in analogies to suggest a limited but precise action. Then there is the famous saying, "If all you have is a hammer, everything looks like a nail." If you have one tool or only know how to use one, you will use it everywhere, including where it is far from optimal.
Because both manage resources, Traffic Engineering (TE) and Quality of Service (QoS) are easily confused, both in what they do and in the arguments for and against them. So first, a brief introduction to each.
Traffic Engineering (TE)
In simplest terms, TE places traffic on a path that optimizes for a constraint. Within an administrative/commercial boundary, that constraint is often bandwidth capacity. For example, when every router concludes that the best path for all traffic is the same path, that can lead to capacity hotspots. For the same reason, cost hotspots can occur when a network operator pays to send traffic across the Internet through other network operators.
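To make "a path that optimizes for a constraint" concrete, here is a minimal sketch of the usual constrained-shortest-path idea: prune the links that cannot carry the demand, then run an ordinary shortest-path calculation on what is left. The topology, metrics, capacities, and the function name cspf are all made up for illustration; this is not any vendor's implementation.

```python
import heapq

def cspf(links, src, dst, demand):
    """Return the lowest-metric path that uses only links with enough
    free capacity for `demand`, or None if no such path exists."""
    # Step 1: prune links that cannot carry the demand.
    adj = {}
    for (a, b), (metric, free) in links.items():
        if free >= demand:
            adj.setdefault(a, []).append((b, metric))
    # Step 2: plain Dijkstra on the pruned graph.
    best = {src: 0}
    prev = {}
    heap = [(0, src)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == dst:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, metric in adj.get(node, []):
            new_cost = cost + metric
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                prev[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    if dst not in best:
        return None
    # Walk the predecessor chain back from dst to src.
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

# Hypothetical topology: (from, to) -> (IGP metric, free capacity in Gb/s)
LINKS = {
    ("A", "B"): (10, 2),
    ("B", "D"): (10, 2),
    ("A", "C"): (15, 10),
    ("C", "D"): (15, 10),
}

small = cspf(LINKS, "A", "D", demand=1)   # fits on the metric-preferred path
large = cspf(LINKS, "A", "D", demand=5)   # must avoid the 2 Gb/s links
```

With these made-up numbers, the small demand follows the path every router would have picked anyway (via B), while the large one is steered via C because the preferred links lack the capacity. That second case is the hotspot avoidance described above.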
Quality of Service (QoS)
While TE tries to make the best of the available link capacity, QoS is an admission that TE failed. With or without TE, hotspots arise, resulting in packet discards. The question now becomes, which packets? The list of ways the networking community has dealt with this choice is long: RED, WRED, WFQ, Diffserv, HQoS, signaled bandwidth reservations, leaky buckets, token buckets, and I suspect more than I can remember. Some approaches focus on influencing the TCP backoff algorithm, while others focus on a network administrator's judgment about which traffic is most important. To say that QoS has been a controversial subject over many decades would be an understatement. For QoS, networking has had more first dates than the average person using a dating app in 2023.
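As one example from that list, the core of RED can be sketched in a few lines: track a smoothed average queue depth, drop nothing below a minimum threshold, drop everything above a maximum threshold, and drop with a linearly rising probability in between. The function names and threshold values below are illustrative assumptions, and the sketch leaves out refinements in the original algorithm such as spacing drops by the count of packets since the last one.

```python
import random

def update_average(avg, queue_len, weight):
    """Exponentially weighted moving average of the queue depth."""
    return (1 - weight) * avg + weight * queue_len

def red_drop_probability(avg, min_th, max_th, max_p):
    """Drop probability for a given average queue depth (simplified RED)."""
    if avg < min_th:
        return 0.0          # queue is short: never drop
    if avg >= max_th:
        return 1.0          # queue is too long: always drop
    # Linear ramp from 0 at min_th up to max_p at max_th.
    return max_p * (avg - min_th) / (max_th - min_th)

def should_drop(avg, min_th, max_th, max_p, rng=random):
    """Randomly decide whether to drop the arriving packet."""
    return rng.random() < red_drop_probability(avg, min_th, max_th, max_p)

# Illustrative values only: thresholds in packets, max_p as a fraction.
MIN_TH, MAX_TH, MAX_P = 20, 40, 0.1
```

The drop is random with respect to which flow the packet belongs to, which is the point: it nudges TCP senders to back off before the queue overflows. WRED applies the same curve with different thresholds per traffic class, which is where the administrator's judgment about importance comes in.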
TE as a Scalpel
The arguments in favor of TE that came up this week all focused on using TE as a scalpel. The transmission group is a little slow getting off the dime provisioning more capacity, so reroute some traffic until they do. No one jumped in and said the driver for creating MPLS-TE was so that an operator could implement an edge-to-edge mesh - the use case that some operators and vendors choked on. Does that mean using MPLS-TE edge-to-edge is a bad idea? Is a full edge-to-edge mesh using TE as a hammer, and is it a bad idea? I will leave that to people who run networks for a living to decide.
QoS gets discarded
In 2020, I wrote "Segment Routing and the Death of QoS". I wrote that article because SR felt to me like a statement that only source-based TE was needed, and QoS was not, or at least it was too difficult to string a series of routers together with a cohesive approach to QoS.
QoS, an approach to deciding how packets get queued/dropped, felt like it was itself in the process of being discarded. Oh, the irony, or perhaps, just revenge of the end-to-end nerds. With RED, QoS was kind of like a scalpel, albeit in the hands of a mindless madman, making random decisions (from the point of view of an operator committing to an SLA). Even Diffserv could be argued to be a scalpel, compared to HQoS and control plane signaled bandwidth reservations.
Oh, but wait, what if you are a hyperscaler with dense data center topologies and oodles of capacity in every direction you look? Maybe at least for now, QoS is not a high priority.
Conclusion
I am not a network operator, and I am not a vendor engineer. My knowledge of these subjects is probably limited, especially with respect to the nuances and corner cases. However, it does seem that with network architecture and equipment design, we are always dealing with the simplicity of scalpels vs the complexity of hammers, and exactly where to look first when we are running out of resource capacity.
As always, your mileage may vary.