Evolution of Useful Results from Anomaly Detection Systems

Hello Azure security community! Today’s blog post on anomaly detection systems is brought to you by Ram Shankar Siva Kumar from Azure Security Data Science, in collaboration with Andrew Wicker, Dan Mace and Kaidi Zhao.

The CISO of one of the premier national labs in the country said he was going to level with us: the lab had invested millions of dollars in a bespoke security anomaly detection system, built from the ground up by their cream-of-the-crop data scientists, and sadly, the system did not yield any useful alerts. The false positive rates were simply too high, and for all intents and purposes, the security analyst team was being sent on a wild goose chase. The CISO and his team wanted to know how Azure Security Data Science, my team, would handle the alert deluge, and what we could tell them from our experience to help them whittle down the false positive rate so that they could focus on catching attacks.

As a Data Cowboy on Microsoft’s cloud security data science team, I hear stories about anomaly detection systems from customers that follow a particular pattern: an organization invests in a SIEM, then hires data scientists to build advanced detections from the gathered data, only to find that the team of security analysts is unhappy with the results. There is more at stake than disgruntled analysts: a recent study by the Ponemon Institute[1] showed that organizations spend, on average, nearly 21,000 hours each year analyzing security false positives, wasting roughly $1.3 million per year on inaccurate or erroneous intelligence. The typical reaction is to invest in a newer, more complex algorithm in the hope of reducing the false positive rate and surfacing better anomalies, even though threat analysts may still not agree with the results.

This blog post has three takeaways:

  1. The end goal of a security anomaly detection system should not be to produce anomalies but to produce actionable security alerts that are useful, i.e., interpretable, credible, and likely to elicit downstream action.
  2. Increasing the complexity of the algorithm without actually instilling security domain knowledge has little effect on false positive rates.
  3. A framework showing how to go from noisy outliers to the end goal of actionable security alerts, so that you can make an honest assessment of your anomaly detection system and see where you fit in.

Before we build an anomaly detection system, we enforce the following constraints for a result to be considered “useful”:

  1. Interpretable: Every security risk we alert or report on must be explainable. For instance, it is not sufficient to say that a process running on the host is anomalous; instead, the detection system should alert that it has detected PowerShell.exe running in the App folder.
  2. Credible: The analyst must be able to trust the result, which happens when the alert arrives with additional contextual information. Continuing the anomalous process creation example, the IT admin might receive the command-line arguments that were passed when PowerShell was invoked.
  3. Actionable: When results are interpretable and credible, they are more likely to lead to downstream action. Depending on the severity of the incident, this could be anything from rolling the credentials to collecting forensic artifacts from the host machine.
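To make the three constraints concrete, here is a minimal sketch of an alert record that carries an interpretation, supporting context, and a recommended action. The field names are hypothetical, chosen for illustration; this is not the schema of any Azure product.

```python
from dataclasses import dataclass, field

@dataclass
class SecurityAlert:
    # Interpretable: name the specific behavior, not just "anomalous".
    finding: str
    # Credible: contextual evidence the analyst can verify.
    context: dict = field(default_factory=dict)
    # Actionable: a concrete next step for the responder.
    recommended_action: str = ""

# Continuing the anomalous process creation example above:
alert = SecurityAlert(
    finding="PowerShell.exe observed running in the App folder",
    context={"command_line": "powershell.exe -NoProfile -EncodedCommand <redacted>"},
    recommended_action="Collect forensic artifacts from the host",
)
```

The point is not the data structure itself but the contract: an alert that cannot fill all three fields is unlikely to clear the “useful” bar.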


Now, using “usefulness” as a dimension of comparison, our experience with anomaly detection systems can be summarized as follows:

  • Blindly increasing the complexity of the system, without much security domain knowledge, does not increase the utility of the end results.
  • The biggest game changer is constant feedback from security experts, which in turn helps the data scientists refine their behavioral detections at each step.

The following table shows how, based on the complexity of the behavioral anomaly detection system and the amount of security knowledge instilled in it, we end up with three different types of security alerts: outliers, anomalies, and “security interesting” events. We illustrate the table with the case of detecting suspicious activity from login failure records alone; in practice, this could be done by monitoring event ID 4625 in the Windows Security event log.
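For readers who want to follow along, the hourly failed-logon counts that feed such a detector could be derived from exported 4625 events. Here is a sketch; the input format, a list of (event_id, ISO timestamp) pairs, is an assumption for illustration, as real exported logs carry a much richer schema.

```python
from collections import Counter
from datetime import datetime

def failed_logins_per_hour(events):
    """Count failed-logon events (Windows Security event ID 4625) per hour.

    `events` is assumed to be an iterable of (event_id, iso_timestamp)
    pairs already parsed from an exported log.
    """
    counts = Counter()
    for event_id, ts in events:
        if event_id == 4625:
            # Truncate the timestamp to the hour to build an hourly series.
            hour = datetime.fromisoformat(ts).replace(minute=0, second=0, microsecond=0)
            counts[hour] += 1
    return counts

events = [
    (4625, "2017-06-01T02:15:00"),   # failed logon
    (4625, "2017-06-01T02:45:00"),   # failed logon, same hour
    (4624, "2017-06-01T02:50:00"),   # successful logon, ignored
]
```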

I: Outliers

  • Description: Basic anomaly detection methods, like standard deviation, with no domain information.
  • When to use: Suitable for areas where domain knowledge is sparse and security risks are uncertain.
  • Persona involved: No security person is involved at this point in time; ML knowledge is informal at best.
  • Example: Alert when the number of failed logins deviates from three times the baseline.
  • Drawback: Not all deviations from normal behavior are anomalies! Because of limited security domain knowledge, the system has a high false positive rate.

II: Anomalies

  • Description: Increased complexity of the anomaly detection, along with limited domain information.
  • When to use: Domain information is limited to rules or filtering of the initial data.
  • Persona involved: Minimal interaction with the security analyst; ML knowledge is more holistic and complete.
  • Example: Domain expert: “If there is an unusually high number of failed logins during night time, that activity is more anomalous.” Anomaly detection system: use Holt-Winters to model the seasonality of logins, and alert when patterns don’t conform to the trend.
  • Drawback: Not all anomalies are security interesting! For example, there may be a high number of failed logins owing to an expired credential; while this is a legitimate finding, it is a “hygiene” issue as opposed to an indication of attack.

III: Security interesting events

  • Description: Focused anomalies: anomalies plus intrinsic and extrinsic signals.
  • When to use: Requires user feedback and supervised signals to provide more focused analysis; may be based on user preferences and differ for each user. Additional domain information may be domain expert annotations, feedback, or prioritized heuristics.
  • Persona involved: A strong partnership between machine learning and security experts.
  • Example: Domain expert: provides feedback on the quality of the results. Anomaly detection system: uses active learning on top of traditional anomaly detection to capture the domain expert’s feedback and improve the system.
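To ground the three types, here is a small self-contained sketch contrasting a Type I global threshold, a Type II seasonal baseline (a simple stand-in for a Holt-Winters model), and a crude Type III filter that applies analyst feedback. The data, thresholds, and function names are invented for illustration.

```python
import statistics

def outlier_alerts(series, threshold=3.0):
    """Type I: flag points more than `threshold` standard deviations above
    the global mean. No domain knowledge; every large deviation fires."""
    mean = statistics.mean(series)
    stdev = statistics.pstdev(series) or 1.0
    return [i for i, x in enumerate(series) if x > mean + threshold * stdev]

def seasonal_alerts(series, period=24, threshold=3.0):
    """Type II: compare each hour against the same hour on other days,
    a simple stand-in for a seasonal model such as Holt-Winters.
    Assumes at least two full periods of data."""
    alerts = []
    for i, x in enumerate(series):
        baseline = [v for j, v in enumerate(series)
                    if j % period == i % period and j != i]
        mean = statistics.mean(baseline)
        stdev = statistics.pstdev(baseline) or 1.0
        if x > mean + threshold * stdev:
            alerts.append(i)
    return alerts

def security_interesting(alerts, analyst_labels):
    """Type III (a crude stand-in for active learning): suppress alerts the
    analyst has already labeled benign, e.g. an expired credential."""
    return [i for i in alerts if analyst_labels.get(i) != "benign"]

# Hourly failed-login counts for a week: busy 09:00-17:00, quiet otherwise,
# with one suspicious night-time spike on the last day.
series = [40 if 9 <= h <= 17 else 2 for _ in range(7) for h in range(24)]
spike = 6 * 24 + 3          # day 7, 03:00
series[spike] = 50
```

Run on this series, `outlier_alerts` returns nothing: the busy daytime hours inflate the global mean and standard deviation, so a 50-failure night-time spike never clears the global bar. `seasonal_alerts` flags exactly the spike, because 50 failures at 03:00 is wildly out of line with other nights, and `security_interesting` would then suppress it only if an analyst had labeled that pattern benign.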

What next?

  • Use the framework above to identify whether your alerts are outliers, anomalies, or security interesting. You can then use the grid to evolve your results toward the end goal of actionable, credible, and interpretable alerts.
  • So what does an algorithm that generates useful security alerts look like, and how does one go about building such an algorithm? Stay tuned for additional blog posts delving into these topics.


[1] http://www.ponemon.org/local/upload/file/Damballa%20Malware%20Containment%20FINAL%203.pdf
