I had my fingers in a few different pies during 2023, which led to insights across several areas.
Premise-based vs Cloud-based security tradeoffs.
A continued feeling that AI/AIOPs will be a capability of Monitoring/Observability platforms.
There may be a place for designing telemetry for AI/AIOPs.
Continued feeling that transformational automation is delayed by the need for top-down sponsorship.
Are datagram networks coming back?
Network and Security
The tension and tradeoffs between premise-based security and cloud-based security got me thinking about Networking and Security architectures. How can operations teams implement a network architecture without impacting security, and how can security teams implement a security architecture without impacting network performance and economics? The reverberations of cloud are still playing out in these questions.
A premise-based security architecture clearly delivers network benefits in terms of latency, capacity needs, and application-based routing for performance, economics, and load-balancing. Perhaps even in areas such as jitter, packet loss, and attack surfaces.
Cloud-based security architectures clearly deliver security benefits in terms of agility and consistent coverage of all networking modes (fixed, nomadic, mobile, … ).
How to get the best of both? That is a very interesting question. It is why the debate still has legs, and why there still need to be approaches to combining the best of both.
AIOPs as a feature
Earlier in the year I asked the question whether AI is a distinct platform or a feature/capability of a platform.
The rest of 2023 reinforced my view that in the world of IT Monitoring and Observability, it will in the main be a capability within a Monitoring and Observability platform.
Why? Adoption is going to be slow enough for incumbents to catch up. Too narrow a focus does not solve the entire DevOps/SRE/Cloud Ops/IT Ops/NetOps problem. Current economic pressures impact all of IT, but especially new approaches that require multi-stakeholder buy-in. Moogsoft was purchased by Dell this year, and the mainstream IT Monitoring and Observability platforms got the memo this year about AI and AIOPs. The bar for a stand-alone platform is going to be high. Not impossible, but high. Few will get over that bar.
Designing Data for AI and AIOPs
Adoption of new approaches to collecting telemetry data has been slower than many expected/hoped. Various explanations are offered: the breadth and depth of models, installed base, etc. All these are abstractly true of most new ways of doing things.
While these approaches will likely become mainstream over time, I wonder if the adoption rate is related to the value proposition: do they offer enough change to outcomes for enough operations/engineering teams?
As I ponder that, another intersecting thought is that most telemetry was designed before the age of AI and AIOPs (arguably, that age has yet to even start in earnest).
I have heard senior engineering types I respect say that if the networking industry, for example, were serious about automation, many standard protocols would be redesigned. A future reality that seems unlikely.
I have also had enough experience, with enough different monitoring and observability platforms, to know that creating enough tagging to make visual UI filtering and AI-based correlation attractive often requires significant setup and perhaps maintenance. Some automated tagging can be done for isolated environments, for example, a single public cloud or even multiple public clouds for a subset of tags and constructs. However, enough standard tagging across configuration, alerts, metrics, events, logs, flows, and more is significantly more challenging.
Several technological approaches are jumping into the fray to address this. Natural language processing of logs is one example. Another is ML clustering to discover which events may be associated. These approaches may end up competing in the market against approaches that constrain environmental choices to achieve more consistency, for example single-vendor vs multi-vendor. Ingesting metadata also plays a role in this space, if enough of the needed type of metadata is available and digestible. In a world of on-prem, cloud, multi-cloud, and multiple vendors, how this plays out is not clear. NLP and ML will be pushed on because that is easier than changing the way everything works, but this is an interesting area to watch.
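To make the clustering idea a little more concrete, here is a minimal sketch, not how any particular platform does it. It groups event messages whose wording overlaps once numbers are masked out, using Jaccard similarity and a greedy single pass. The function names, the 0.5 threshold, and the sample messages are all assumptions made up for illustration.

```python
import re


def tokens(message):
    """Return lowercase word tokens, with digit runs masked as '#'.

    Masking lets 'eth0' and 'eth1' compare as the same token ('eth#').
    """
    masked = re.sub(r"\d+", "#", message.lower())
    return set(re.findall(r"[a-z#]+", masked))


def jaccard(a, b):
    """Jaccard similarity of two token sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_events(messages, threshold=0.5):
    """Greedy clustering: join the first cluster whose representative
    (its first message) is similar enough, else start a new cluster.

    The threshold is a hypothetical tuning knob, not a recommended value.
    """
    clusters = []  # list of (representative token set, member messages)
    for msg in messages:
        t = tokens(msg)
        for rep, members in clusters:
            if jaccard(rep, t) >= threshold:
                members.append(msg)
                break
        else:
            clusters.append((t, [msg]))
    return [members for _, members in clusters]


# Made-up sample events, purely for illustration.
events = [
    "Interface eth0 down on router1",
    "Interface eth1 down on router2",
    "BGP neighbor 10.0.0.1 state Idle",
    "BGP neighbor 10.0.0.2 state Idle",
    "Disk usage 91% on server7",
]

for group in cluster_events(events):
    print(group)
```

Even this toy version hints at the tradeoff: it needs no tagging up front, but it only knows that messages look alike, not that they share a cause.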
Automation
It was fun to see the Network Automation Forum kicked off this year. Hopefully, it will keep growing and evolving.
No doubt, automation has technical challenges, even if only socializing best practices. OTOH, transformational automation does not usually occur unless pushed down firmly from the top. The need for committed top-down sponsorship may always be the case when technology changes intersect with process/workflows and organizational dynamics.
I am sure there are long-term economic benefits from automation, having seen them play out. I am uncertain about what proportion of IT/business leaders have this at the top of their to-do list and are willing to stay with it long enough to get the results, not to mention take on the internal organizational challenges. I want to think 2024 will bring more clarity here, but I am not optimistic. Organizations may decide to wait until AI/ML/AIOPs matures before taking this on.
SNMP vs Cloud Managed APIs
Speaking of telemetry, with the growing importance of cloud-managed devices to monitoring and observability, it is worth taking a look at the support these devices have for SNMP, or lack of support as the case may be. Certainly there are multiple reports of patchy support, a focus on vendor APIs, and perhaps even some preferential treatment of API usage by vendor-supplied management approaches. There are no doubt well-intentioned reasons for differences in API call rates, as an example.
Nonetheless, the question may well be asked in 2024 and beyond: what does cloud-managed device monitoring mean for the direction of telemetry, monitoring, and observability across the entire network and IT landscape?
The Rebirth of Datagrams
There was once imagined a network where packets could be randomly sprayed in all directions to achieve ultimate resiliency and capacity utilization. It did not play out that way.
It turned out that networks evolved so that packets were sent and received in sequence and often along selected preferred paths, regardless of how much unused capacity there was. Various approaches to breaking from this have been developed, such as ECMP, LAG, and path/forwarding equivalence class coloring. None of these meet the needs of those building I/O-bound AI/ML compute/storage clusters. So, back to the drawing board.
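A small sketch may help show why per-flow approaches such as ECMP fall short for a few very large flows. It assumes a simplified model in which a path is picked by hashing the flow's 5-tuple. Real switches use their own hash functions and fields; SHA-256, the function names, and the sample flow below are stand-ins chosen only for illustration.

```python
import hashlib


def ecmp_path(flow, num_paths):
    """Pick a path index from a hash of the flow tuple.

    Every packet of the same flow hashes to the same path, which keeps
    packets in order but pins the whole flow to one link.
    """
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_paths


def link_load(flows, num_paths):
    """Sum the traffic volume that lands on each path."""
    load = [0] * num_paths
    for flow, volume in flows:
        load[ecmp_path(flow, num_paths)] += volume
    return load


# One hypothetical large flow: (src IP, dst IP, protocol, src port, dst port).
big_flow = ("10.0.0.1", "10.0.0.2", 17, 49152, 4791)

# All 100 units of traffic land on a single path; the other three stay idle.
print(link_load([(big_flow, 100)], num_paths=4))
```

With many small flows the hash spreads load reasonably well. With a handful of large, long-lived flows, which is closer to what these clusters generate, most of the available capacity can sit unused.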
2023 was the year that most networking professionals became aware of industry designs for randomly spraying packets across an AI/ML fabric and then ordering them somewhere close to the server, for example, an access switch or a smart NIC.
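The spray-then-reorder idea can be sketched in a few lines. This is a toy model under loud assumptions: packets carry a sequence number, every packet arrives exactly once (no loss, no duplicates), and per-packet delay is random. The names `spray` and `ReorderBuffer` are made up for illustration and do not describe any vendor's design.

```python
import heapq
import random


def spray(num_packets, num_paths, seed=0):
    """Send each packet down a randomly chosen path with a random delay.

    Returns (sequence number, path) pairs in arrival order, which will
    generally differ from send order.
    """
    rng = random.Random(seed)
    arrivals = []
    for seq in range(num_packets):
        path = rng.randrange(num_paths)
        delay = rng.uniform(1.0, 5.0)  # hypothetical per-packet delay
        arrivals.append((seq + delay, seq, path))  # packet seq is sent at time seq
    arrivals.sort()
    return [(seq, path) for _, seq, path in arrivals]


class ReorderBuffer:
    """Holds early packets and releases them strictly in sequence.

    Assumes no loss or duplication; a real design would need timeouts
    and bounded memory.
    """

    def __init__(self):
        self.next_seq = 0
        self.pending = []  # min-heap of sequence numbers waiting for release

    def receive(self, seq):
        heapq.heappush(self.pending, seq)
        released = []
        while self.pending and self.pending[0] == self.next_seq:
            released.append(heapq.heappop(self.pending))
            self.next_seq += 1
        return released


arrivals = spray(num_packets=20, num_paths=4)
buffer = ReorderBuffer()
delivered = []
for seq, _path in arrivals:
    delivered.extend(buffer.receive(seq))

print("arrival order: ", [seq for seq, _ in arrivals])
print("delivery order:", delivered)
```

The cost moves from the fabric to the edge: the reordering point needs buffer memory and a policy for what to do when a packet never shows up.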
In the big scheme of things, this represents a big buzz about a niche networking use case, even if it is an important use case. The fascinating question is whether this approach to networking will expand beyond this use case. I can't wait to see what 2024, 2025, and beyond might bring in this respect. I have seen similar approaches also considered for internal equipment fabric designs and between Ethernet switches. In both cases, it was a couple of decades ago. Has the time come for a style of datagram network to go mainstream?
Conclusion
In 2023, if you say to people the cloud is disrupting IT, most will roll their eyes. We’ve been talking about cloud for so long, it is an accepted part of the landscape, and most have made some adjustments to it. In addition, there are a substantial number of born-in-the-cloud companies. Yet, all around IT, optimizing for the disruption of cloud is ongoing. What is the best way to combine Networking and Security? Which workloads should be on the cloud and which should be repatriated? What’s happening with cloud-managed devices? How to best manage across on-prem and cloud? Best approaches to cloud migration? And the list of questions goes on.
And then there is AI. Not much needs to be said about that. 2023 was unquestionably the year when AI/ML in many forms hit mainstream consciousness, across the entire culture, not just within IT. We will be discussing the ins and outs of it for many years to come.
I hope everyone has a great holiday period. I am deeply grateful I do not live in a part of the world that is torn by war, and I wish everyone, on all sides of conflict, a joyful, productive, and peaceful new year.