It has been a while since I have penned a new post here. I’ve been going through significant changes in my life, but most of all, I have spent the last year of my professional life doing a deep dive on AIOps, concluding that it is one of the most significant changes in networking. I now work for an AIOps company, and you can read some of the reasons why here. Despite where I work, I will continue to bring a thoughtful voice to posts in this blog.
The last year has tied together many themes from my previous writings:
Network Operations is one of the key budget areas, as discussed in Network Architecture: A Three Olive Martini. Its capabilities and capacity have significant impact on service / experience outcomes and on what needs to be designed into network protocols and architectures.
Though I started my IT career in network operations, the last year has re-emphasized for me how much impact operations has on network outcomes. As an industry, we rightfully spend significant time on protocols and architecture, but operations is incredibly important as well. So much so that I would argue it is time to start talking about the “Operations Plane”.
From my perspective, the significant aspect of the Operations Plane undergoing transformation in this data-rich era is how the inevitable hard and soft errors are detected, mitigated, and remediated. Various suboptimizations are included as well. Increasingly, high-performance data collection pipelines, AI/ML, and more generally data science are playing a critical role. There are of course many other significant aspects of operations, including intent-based configuration, but they are not discussed in this post.
A large-scale network, especially a dense-topology data center, can throw off billions of data points per day. No human, and no small team of humans, can process that much data, let alone draw systematic and timely insights from it. No humans are doing that today, so machines doing this are not replacing humans. Doing it in a way that reduces the number of alarms and incidents that overloaded and fatigued teams have to explore is not eliminating operations, because experience and skill are still required. This is an example of Augmented Operations: allowing operations teams to see the forest for the trees and apply Human Operations Intelligence in multiple ways.
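To make the alarm-reduction idea a little more concrete, here is a minimal sketch of collapsing a stream of raw alarms into a smaller set of incidents. It is deliberately not AI/ML: it uses plain time-window grouping as a stand-in for whatever a real pipeline would do. The `Alarm` type, the `group_alarms` function, the device names, and the 300-second window are all assumptions made up for illustration, not anything from a particular product.

```python
from collections import defaultdict
from typing import NamedTuple


class Alarm(NamedTuple):
    """A hypothetical raw alarm record (fields are illustrative)."""
    timestamp: float  # seconds since some reference point
    device: str
    kind: str


def group_alarms(alarms, window=300.0):
    """Collapse alarms that share (device, kind) into incidents.

    Consecutive alarms for the same key that arrive within `window`
    seconds of each other are treated as one incident.
    """
    by_key = defaultdict(list)
    for alarm in sorted(alarms, key=lambda a: a.timestamp):
        by_key[(alarm.device, alarm.kind)].append(alarm)

    incidents = []
    for items in by_key.values():
        current = [items[0]]
        for alarm in items[1:]:
            if alarm.timestamp - current[-1].timestamp <= window:
                current.append(alarm)       # same burst, same incident
            else:
                incidents.append(current)   # gap too large, close incident
                current = [alarm]
        incidents.append(current)
    return incidents


raw = [
    Alarm(0, "leaf1", "crc_errors"),
    Alarm(60, "leaf1", "crc_errors"),
    Alarm(120, "leaf1", "crc_errors"),
    Alarm(1000, "leaf1", "crc_errors"),
    Alarm(30, "spine2", "link_flap"),
]
incidents = group_alarms(raw)
print(f"{len(raw)} raw alarms -> {len(incidents)} incidents")
```

The operator still decides what each incident means and what to do about it; the machine only trims the pile that has to be looked at.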
As we all know, You cannot eliminate complexity for the SAME capability & certainty, so to get the systematic and timely insights operations teams need, a little complexity has to be added to the operations plane. AI/ML is that complexity, and we are all, as an industry, going through the learning curve of understanding that complexity. Sometimes it is actually much simpler than we imagine, and sometimes it is the result of understanding which algorithms work best for which types of networking issues.
It is an exciting time in the evolution of the Operations Plane, with many interesting developments to come.