Inherently Interpretable and Uncertainty-Aware Models for Online Learning in Cyber-Security Problems

Benjamin Kolicic, Alberto Caron, Chris Hicks, Vasilios Mavroudis

TL;DR

This work proposes a novel pipeline for online supervised learning problems in cyber-security that harnesses the inherent interpretability and uncertainty awareness of Additive Gaussian Process (AGP) models.

Abstract

In this paper, we address the critical need for interpretable and uncertainty-aware machine learning models in the context of online learning for high-risk industries, particularly cyber-security. While deep learning and other complex models have demonstrated impressive predictive capabilities, their opacity and lack of uncertainty quantification raise significant questions about their trustworthiness. We propose a novel pipeline for online supervised learning problems in cyber-security that harnesses the inherent interpretability and uncertainty awareness of Additive Gaussian Process (AGP) models. Our approach aims to balance predictive performance with transparency while improving the scalability of AGPs, which is their main drawback. This could enable security analysts to better validate threat detection, troubleshoot and reduce false positives, and generally make trustworthy, informed decisions. This work contributes to the growing field of interpretable AI by proposing a class of models that can be significantly beneficial for high-stakes decision problems such as those typical of the cyber-security domain. The source code is available.

Paper Structure

This paper contains 14 sections, 11 equations, 9 figures.

Figures (9)

  • Figure 1: Simple example of a Gaussian Process fit on a one-dimensional input space. The red line denotes the mean fit for the function $f(x)$, while the grey bars denote the 95% confidence interval around $f(x)$. Blue dots depict training data points. Notice how confidence is very high in regions dense with data points and very low in regions with no data points, demonstrating the desirable uncertainty-quantification properties of GPs.
  • Figure 2: Example of a Gaussian Process fit on a two-dimensional input space. The 3D plot on the left depicts the mean fit for $f(x_1, x_2)$, while the contour plot on the right depicts the variance around $f(x_1, x_2)$. Variance is higher (brighter colour) in the corner regions, which contain fewer data points.
  • Figure 3: Architecture of a NAM: each input is modelled by its own fully connected MLP (sub-network), which guarantees the interpretability of its contribution $f_j(\cdot)$ (Shapley value [shapley:book1952]) to the final output $y$. The sub-network outputs are then summed and normalised to construct the final predictor.
  • Figure 4: Architecture of an Additive GP: each data feature is fed into a Multidimensional GP, then the individual contributions are extracted, summed and normalised.
  • Figure 5: ROC curves of the four models considered: Neural Nets (NN), Gaussian Processes (GP), Neural Additive Models (NAM) and Additive GP (AGP). The models were routinely re-trained using a window of 20% of the data.
  • ...and 4 more figures
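
The two properties described in the captions of Figures 1 and 4 (predictive variance that grows away from the training data, and a prediction that decomposes into per-feature contributions) can be illustrated with a small NumPy sketch. This is not the authors' pipeline: the function names (`rbf`, `fit`, `predict`), the unit-variance squared-exponential kernel, the noise level and the synthetic data are all assumptions made for illustration, and the normalisation step mentioned in Figure 4 is omitted. The sketch assumes an additive kernel $k(x, x') = \sum_j k_j(x_j, x'_j)$, under which the posterior mean splits into one term per feature.

```python
import numpy as np

def rbf(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between 1-D arrays a (n,) and b (m,)."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / lengthscale) ** 2)

def fit(X, y, noise=1e-2):
    """Factorise the additive kernel matrix K = sum_j K_j plus noise."""
    n, d = X.shape
    K = sum(rbf(X[:, j], X[:, j]) for j in range(d))
    L = np.linalg.cholesky(K + noise * np.eye(n))
    # alpha = (K + noise * I)^{-1} y, computed via two triangular systems
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return L, alpha

def predict(X, L, alpha, X_new):
    """Return posterior mean, latent variance and per-feature contributions."""
    d = X.shape[1]
    # One cross-covariance matrix per feature, each of shape (m, n)
    K_cross = [rbf(X_new[:, j], X[:, j]) for j in range(d)]
    # Contribution of feature j to the mean: K_j(x_new, X) @ alpha
    contributions = np.stack([K_j @ alpha for K_j in K_cross], axis=1)
    mean = contributions.sum(axis=1)
    v = np.linalg.solve(L, sum(K_cross).T)
    # Prior variance is d because each of the d unit-variance kernels adds 1
    var = d * 1.0 - np.sum(v ** 2, axis=0)
    return mean, var, contributions

# Synthetic data (hypothetical): an additive target on [-2, 2]^2
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(40, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.05 * rng.normal(size=40)

# One query inside the training region, one far outside it
X_new = np.array([[0.0, 0.0], [6.0, 6.0]])
L, alpha = fit(X, y)
mean, var, contributions = predict(X, L, alpha, X_new)

print("mean:", mean)
print("variance:", var)
print("per-feature contributions:", contributions)
```

In this sketch the variance at the distant query point stays close to the prior variance, while the query inside the training region has lower variance. Each row of `contributions` sums to the corresponding entry of `mean`, which is what allows each feature's effect to be inspected separately.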