NetOps Requires AI/ML & Rules

Precision, learning, and maintainability

Dec 04, 2022

Introduction

Rules have hit a brick wall in many contexts; modern conventional wisdom holds that learning-based AI will make faster progress than rules-based AI. The current learning-from-data revolution, which is reshaping every industry including Network Operations, should not be underestimated. Even so, rules still play a critical role where a compelling signature is well-defined. Importantly, though, how rules are managed is key.

AI/ML vs Rules

There are many types of scenarios for rules and many different areas of AI/ML. Examining just a few illuminates the strengths and weaknesses of each.

Scenario 1. A Network Operations team knows from experience that a specific log message always indicates a specific problem, and that the same response always mitigates or remediates it. This is where rules-based automation works well: well-known signature, well-known problem, well-known response.
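A minimal sketch of such a rule, assuming the signature can be expressed as a regular expression over a single log line. The log format, the rule name and the bounce_interface remediation are hypothetical; a real tool would act on the device or call an orchestration system rather than return a string.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Rule:
    """Well-known signature -> well-known problem -> well-known response."""
    name: str
    signature: re.Pattern                  # compiled regex identifying the known problem
    problem: str                           # human-readable diagnosis
    response: Callable[[re.Match], str]    # remediation; returns a description of the action


def bounce_interface(match: re.Match) -> str:
    # Hypothetical remediation: a real tool would act on the device or call an orchestrator.
    return f"bounce {match.group('ifname')}"


# Hypothetical log format, for illustration only.
LINK_DOWN = Rule(
    name="link-down",
    signature=re.compile(r"IFMGR: interface (?P<ifname>\S+) link down"),
    problem="Interface link down",
    response=bounce_interface,
)


def apply_rule(rule: Rule, line: str) -> Optional[str]:
    """Return the remediation taken if the line matches the rule's signature, else None."""
    m = rule.signature.search(line)
    if m is None:
        return None
    return f"{rule.problem}: {rule.response(m)}"


print(apply_rule(LINK_DOWN, "Dec 04 10:01:02 edge1 IFMGR: interface Gi0/1 link down"))
# Interface link down: bounce Gi0/1
print(apply_rule(LINK_DOWN, "Dec 04 10:01:03 edge1 IFMGR: interface Gi0/1 link up"))
# None
```

Because signature, problem and response are all fixed in advance, the rule is precise and cheap to evaluate, which is exactly why it suits this scenario.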

Scenario 2. The optimal value of a metric such as link latency differs for each of thousands to millions of objects. Threshold rules provide precision; however, they also lead to a high rate of false positives/negatives, a heavy maintenance load, or both. What is needed here is not the ability to recognize a precise value, or even dynamically changing thresholds, but the ability to recognize normal and changing patterns.
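One simple way to recognize what is normal for each object is to keep a separate learned baseline per object. The sketch below keeps an exponentially weighted mean and variance per link and flags values far from that link's own normal. It is a deliberately simple stand-in for the ML techniques discussed here, not a description of any particular product; the smoothing factor, warm-up length and 4-sigma cut-off are assumed values, and the link names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class Baseline:
    """Exponentially weighted mean and variance describing what is normal for one object."""
    alpha: float = 0.1   # assumed smoothing factor; larger adapts faster to changing patterns
    mean: float = 0.0
    var: float = 0.0
    n: int = 0

    def update(self, x: float) -> float:
        """Return the z-score of x against the baseline so far, then fold x into the baseline."""
        if self.n == 0:
            self.mean, self.n = x, 1
            return 0.0
        diff = x - self.mean
        std = math.sqrt(self.var)
        z = diff / std if std > 0 else 0.0
        incr = self.alpha * diff
        self.mean += incr
        self.var = (1 - self.alpha) * (self.var + diff * incr)
        self.n += 1
        return z


class LatencyMonitor:
    """One learned baseline per object (e.g. per link) instead of one hand-set threshold."""

    def __init__(self, k: float = 4.0, warmup: int = 20, alpha: float = 0.1) -> None:
        self.k = k            # assumed: flag values more than k standard deviations from normal
        self.warmup = warmup  # assumed: samples needed before a baseline is trusted
        self.alpha = alpha
        self.baselines: Dict[str, Baseline] = {}

    def observe(self, obj: str, value: float) -> bool:
        """Update obj's baseline with value; return True if value was anomalous for obj."""
        b = self.baselines.setdefault(obj, Baseline(alpha=self.alpha))
        z = b.update(value)
        return b.n > self.warmup and abs(z) > self.k


monitor = LatencyMonitor()
flags = []
for i in range(50):
    jitter = 0.1 if i % 2 else -0.1
    flags.append(monitor.observe("link-a", 5.0 + jitter))         # normal is about 5 ms
    flags.append(monitor.observe("link-b", 50.0 + 10 * jitter))   # normal is about 50 ms
print(any(flags))                          # False
print(monitor.observe("link-a", 20.0))     # True: far outside link-a's normal range
print(monitor.observe("link-b", 50.5))     # False: well within link-b's normal range
```

With a single static threshold of, say, 30 ms, every normal link-b sample would alert while the 20 ms spike on link-a would be missed. The per-object baseline avoids both without anyone maintaining thousands of individual thresholds.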

Scenario 3. Rare or new log messages often precede future outages. Because these messages are rarely seen and mostly unknown, a rule cannot, by definition, be created for them in advance. Natural Language Processing can be used to flag such messages for examination and preventative action. If that examination shows a message to be a good anomaly signature, a rule can then be created to catch future instances.
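As a rough illustration of flagging rare messages and then promoting an examined one to a rule, the sketch below masks variable fields (IP addresses, hex values, numbers) to reduce each message to a template, counts how often each template has been seen, and flags templates seen fewer than min_count times before. This is a simple frequency heuristic, not Natural Language Processing proper; the masks, the threshold, the example messages and the helper names (template, RarityDetector, promote_to_rule) are all hypothetical.

```python
import re
from collections import Counter

# Masks for variable fields, so messages differing only in addresses or numbers share a template.
# Illustrative assumptions only, not a complete log-parsing scheme. Order matters: IPs and hex
# values are masked before bare numbers. Each entry: (matcher, placeholder, generic regex).
_MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>", r"\d{1,3}(?:\.\d{1,3}){3}"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>", r"0x[0-9a-fA-F]+"),
    (re.compile(r"\d+"), "<NUM>", r"\d+"),
]


def template(msg: str) -> str:
    """Reduce a log message to its template by masking variable fields."""
    for pattern, token, _ in _MASKS:
        msg = pattern.sub(token, msg)
    return msg


class RarityDetector:
    """Flags messages whose template has rarely (or never) been seen before."""

    def __init__(self, min_count: int = 3) -> None:
        self.min_count = min_count        # assumed cut-off for "rare"
        self.counts: Counter = Counter()

    def observe(self, msg: str) -> bool:
        t = template(msg)
        seen_before = self.counts[t]
        self.counts[t] += 1
        return seen_before < self.min_count


def promote_to_rule(msg: str) -> re.Pattern:
    """After examination, turn a rare message into a rule that also matches future variants."""
    regex = re.escape(template(msg))
    for _, token, generic in _MASKS:
        regex = regex.replace(re.escape(token), generic)
    return re.compile(regex)


detector = RarityDetector(min_count=3)
for i in range(10):
    detector.observe(f"interface Gi0/{i} up")                  # a common, well-understood message
print(detector.observe("interface Gi0/42 up"))                  # False: template seen 10 times
print(detector.observe("ECC error at 0x1f3a on linecard 2"))    # True: never seen before
rule = promote_to_rule("ECC error at 0x1f3a on linecard 2")
print(bool(rule.search("ECC error at 0x2b00 on linecard 7")))   # True
```

The last step mirrors the workflow described above: the learning side surfaces the unknown message, a person judges it a good signature, and it becomes a rule that catches later variants.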

As these scenarios show, AI/ML and rules each shine in different situations. They can also be complementary and used together in a single workflow or process, as in Scenario 3, where a message flagged by ML becomes a rule.

Maintaining Rules

One of the upsides of machine learning is that Network Operations teams do not have to maintain rules: the story is in the data. However, because rules also have value, the question becomes how to maintain them.

Some Network Operations teams are burdened by having to get rules implemented or changed by a development team in a different function, one that is balancing competing priorities. This is perhaps the worst situation, as it can lead to significant delays.

Network Operations teams need the ability to install, modify, and delete rules without restarting software or interrupting operations. As a library of signatures expands, tools must be able to execute a multitude of rules at the rate at which operations data streams in.
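A minimal sketch of a rule engine whose rules can be installed, modified and deleted while data is streaming through it. It assumes rules are plain regular expressions keyed by a hypothetical rule name. Updates build a new rule table and swap it in under a lock (copy-on-write), so the evaluation path never pauses or restarts. Scanning every rule for every line costs time proportional to the number of rules, so a large signature library would likely need a dedicated multi-pattern matcher; this sketch only illustrates the maintenance side.

```python
import re
import threading
from types import MappingProxyType
from typing import Iterable, Iterator, List, Mapping, Tuple


class RuleEngine:
    """Rules can be upserted or deleted at runtime without stopping evaluation."""

    def __init__(self) -> None:
        self._lock = threading.Lock()   # serializes writers only; readers never take it
        self._rules: Mapping[str, re.Pattern] = MappingProxyType({})

    def upsert(self, name: str, pattern: str) -> None:
        """Install a new rule or modify an existing one."""
        compiled = re.compile(pattern)  # compile outside the lock; a bad pattern fails here
        with self._lock:
            table = dict(self._rules)
            table[name] = compiled
            self._rules = MappingProxyType(table)  # single reference swap

    def delete(self, name: str) -> None:
        with self._lock:
            table = dict(self._rules)
            table.pop(name, None)
            self._rules = MappingProxyType(table)

    def evaluate(self, line: str) -> List[str]:
        rules = self._rules             # snapshot: one consistent rule set per line
        return [name for name, rx in rules.items() if rx.search(line)]

    def run(self, stream: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
        """Yield (line, matching rule names) for each line that matches at least one rule."""
        for line in stream:
            hits = self.evaluate(line)
            if hits:
                yield line, hits


engine = RuleEngine()
engine.upsert("link-down", r"link down")
print(engine.evaluate("Gi0/1 link down"))          # ['link-down']
engine.upsert("link-down", r"link (down|flap)")    # modified in place, no restart
engine.upsert("bgp-reset", r"BGP .* reset")
print(engine.evaluate("Gi0/2 link flap"))          # ['link-down']
engine.delete("link-down")
print(engine.evaluate("Gi0/2 link flap"))          # []
```

Because a reader only ever holds a reference to a complete, immutable table, a rule change takes effect on the next line processed, and a line already being evaluated finishes against the rule set it started with.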

Conclusion

Learning from data is a revolution, and Network Operations teams can now achieve outcomes they could not have dreamt of only a few short years ago. At the same time, rules still have a place in scenarios where they add value. However, rules must be high-performance and easy to maintain without interrupting operations.

Internet Dynamics
