DTOR: Decision Tree Outlier Regressor to explain anomalies

Riccardo Crupi, Daniele Regoli, Alessandro Damiano Sabatino, Immacolata Marano, Massimiliano Brinis, Luca Albertazzi, Andrea Cirillo, Andrea Claudio Cosentini

TL;DR

DTOR addresses the need for interpretable explanations of anomaly scores in banking by learning to approximate an anomaly detector's output with a weighted Decision Tree Regressor and then extracting a concise explanatory rule path for the target datapoint. It introduces formal notions of $\text{precision}(A_x)$, $\text{coverage}(A_x)$, and $\text{validity}(A_x)$ and a neighborhood-sampling method to preserve correlations, enabling locally faithful explanations. In extensive experiments across multiple datasets and detectors, DTOR delivers competitive or superior explanations compared to Anchors, with faster rule discovery and better applicability to regression-style anomaly scores. The approach provides practical, human-interpretable insights for internal banking audits and fraud countermeasures, with potential for broader adoption in anomaly explainability tasks.
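The summary above names $\text{precision}(A_x)$, $\text{coverage}(A_x)$, and $\text{validity}(A_x)$ without defining them. The sketch below is a rough illustration that assumes Anchors-style definitions: coverage is the fraction of reference points satisfying the rule, precision is the fraction of those points that the detector labels the same way as $x$, and validity is whether $x$ itself satisfies the rule. These definitions, the rule encoding, and the helper names `satisfies` and `rule_metrics` are assumptions for illustration and may differ from the paper's formal statements.

```python
import numpy as np

def satisfies(rule, Z):
    """Boolean mask of the rows of Z that satisfy every predicate in the rule.

    rule is a list of (feature_index, op, threshold) with op in {"<=", ">"}.
    This encoding is hypothetical, chosen only for this sketch.
    """
    Z = np.atleast_2d(Z)
    mask = np.ones(len(Z), dtype=bool)
    for j, op, thr in rule:
        mask &= (Z[:, j] <= thr) if op == "<=" else (Z[:, j] > thr)
    return mask

def rule_metrics(rule, x, Z, is_anomaly):
    """Return (precision, coverage, validity) under the assumed definitions."""
    mask = satisfies(rule, Z)
    coverage = float(mask.mean())
    target = bool(is_anomaly(np.atleast_2d(x))[0])
    if mask.any():
        precision = float((is_anomaly(Z[mask]) == target).mean())
    else:
        precision = float("nan")  # undefined when no reference point is covered
    validity = bool(satisfies(rule, x)[0])
    return precision, coverage, validity

# Toy reference set and a toy threshold detector (both illustrative).
Z = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [4.0, 1.0]])
x = np.array([5.0, 0.0])
is_anomaly = lambda A: A[:, 0] > 2.5
rule = [(0, ">", 2.0)]

precision, coverage, validity = rule_metrics(rule, x, Z, is_anomaly)
print(precision, coverage, validity)
```

In this toy setup the rule covers half of the reference points, all covered points share the label of $x$, and $x$ satisfies the rule.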

Abstract

Explaining why outliers occur and the mechanism behind them can be extremely important in a variety of domains. Malfunctions, frauds, and threats, in addition to being correctly identified, often need a valid explanation so that effective countermeasures can be taken. The increasingly widespread use of sophisticated Machine Learning approaches to identify anomalies makes such explanations more challenging. We present the Decision Tree Outlier Regressor (DTOR), a technique for producing rule-based explanations for individual data points by estimating the anomaly scores generated by an anomaly detection model. This is accomplished by first fitting a Decision Tree Regressor, which computes the estimated score, and then extracting the path associated with the data point's score. Our results demonstrate the robustness of DTOR even on datasets with a large number of features. Additionally, in contrast to other rule-based approaches, the generated rules are consistently satisfied by the points to be explained. Furthermore, our evaluation metrics indicate performance comparable to Anchors in outlier explanation tasks, with reduced execution time.
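The two steps described in the abstract (fit a tree regressor to the detector's scores, then read off the path of the point to explain) can be sketched with scikit-learn. This is a minimal sketch under stated assumptions, not the authors' implementation: the synthetic data, the `IsolationForest` detector, the tree depth, the weighting scheme that up-weights the target point, and the helper `extract_rule` are all illustrative choices.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
X[0] = [6.0, 0.0, 0.0]  # an obvious outlier that we want to explain

# Any detector producing a continuous score would do; this one is an assumption.
detector = IsolationForest(random_state=0).fit(X)
scores = detector.score_samples(X)  # lower score = more anomalous

# Assumed weighting: emphasise the target point so the surrogate fits it well.
weights = np.ones(len(X))
weights[0] = 0.1 * len(X)

surrogate = DecisionTreeRegressor(max_depth=3, random_state=0)
surrogate.fit(X, scores, sample_weight=weights)

def extract_rule(tree, x, feature_names):
    """Collect the split conditions on the root-to-leaf path followed by x."""
    t = tree.tree_
    node_ids = tree.decision_path(x.reshape(1, -1)).indices
    rule = []
    for node in node_ids:
        if t.children_left[node] == -1:  # leaf node: no split condition
            continue
        f, thr = t.feature[node], t.threshold[node]
        # scikit-learn compares float32 inputs against the stored threshold
        op = "<=" if np.float32(x[f]) <= thr else ">"
        rule.append((feature_names[f], op, float(thr)))
    return rule

rule = extract_rule(surrogate, X[0], ["x1", "x2", "x3"])
print(" AND ".join(f"{name} {op} {thr:.2f}" for name, op, thr in rule))
print("surrogate score:", surrogate.predict(X[:1])[0], "detector score:", scores[0])
```

Because the rule is read from the path that the point itself follows through the tree, the point satisfies it by construction, which matches the abstract's remark that the generated rules are consistently satisfied by the points to be explained.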

Paper Structure (8 sections, 4 equations, 2 figures, 3 tables, 2 algorithms)

Figures (2)

  • Figure 1: A simplified illustration of synthetic data generation is presented. Initially, samples from the original dataset are selected based on sub-rules (e.g., $x_1>2$ or $x_2<0$ in the given example). Subsequently, $N_{\text{gen}}$ samples are drawn for each variable to satisfy the overarching rule $A$. Notably, the image does not depict the discretization of continuous variables or the preservation of inter-variable correlations. However, for illustrative purposes, it is evident that negative values of $x_3$ do not occur under rule $A$, as observed in the synthetic dataset.
  • Figure 2: Illustration of a machine learning application where the XAI method can provide explanations either in the original input space or the pre-processed one. If the latter option is chosen, the explanation must be converted back into the original feature space, particularly when a rule-based explanation is expected.
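The simplified generation step in the Figure 1 caption can be sketched as follows. This is one possible reading of the caption, not the paper's algorithm: like the figure, it omits the discretization of continuous variables and the preservation of inter-variable correlations. The function `sample_under_rule`, the rule encoding, and the choice to draw unconstrained variables from rows satisfying at least one sub-rule are assumptions.

```python
import numpy as np

def sample_under_rule(X, rule, n_gen, rng):
    """Draw n_gen synthetic rows whose constrained variables satisfy the rule.

    rule maps a feature index to a predicate on a 1-D column, e.g.
    {0: lambda c: c > 2, 1: lambda c: c < 0} for "x1 > 2 AND x2 < 0".
    Each variable is drawn independently (correlations are NOT preserved).
    """
    sub_masks = {j: pred(X[:, j]) for j, pred in rule.items()}
    any_mask = np.zeros(len(X), dtype=bool)
    for m in sub_masks.values():
        any_mask |= m  # rows satisfying at least one sub-rule

    synth = np.empty((n_gen, X.shape[1]))
    for j in range(X.shape[1]):
        # Constrained variable: pool of values that satisfy its own sub-rule.
        # Unconstrained variable: pool from rows selected by any sub-rule (assumed).
        pool = X[sub_masks[j], j] if j in sub_masks else X[any_mask, j]
        if pool.size == 0:
            raise ValueError(f"no original sample satisfies the sub-rule on feature {j}")
        synth[:, j] = rng.choice(pool, size=n_gen, replace=True)
    return synth

rng = np.random.default_rng(0)
X = rng.normal(loc=1.0, scale=2.0, size=(2000, 3))  # toy original dataset
rule_A = {0: lambda c: c > 2, 1: lambda c: c < 0}    # x1 > 2 AND x2 < 0
synth = sample_under_rule(X, rule_A, n_gen=100, rng=rng)
print(synth[:3])
```

Every synthetic row satisfies rule $A$ by construction, since each constrained variable is drawn only from values that meet its own sub-rule.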