DTOR: Decision Tree Outlier Regressor to explain anomalies

Riccardo Crupi; Daniele Regoli; Alessandro Damiano Sabatino; Immacolata Marano; Massimiliano Brinis; Luca Albertazzi; Andrea Cirillo; Andrea Claudio Cosentini

TL;DR

DTOR addresses the need for interpretable explanations of anomaly scores in banking by learning to approximate an anomaly detector's output with a weighted Decision Tree Regressor and then extracting a concise explanatory rule path for the target datapoint. It introduces formal notions of $\text{precision}(A_x)$, $\text{coverage}(A_x)$, and $\text{validity}(A_x)$ and a neighborhood-sampling method to preserve correlations, enabling locally faithful explanations. In extensive experiments across multiple datasets and detectors, DTOR delivers competitive or superior explanations compared to Anchors, with faster rule discovery and better applicability to regression-style anomaly scores. The approach provides practical, human-interpretable insights for internal banking audits and fraud countermeasures, with potential for broader adoption in anomaly explainability tasks.

Abstract

Explaining why outliers occur, and the mechanism behind their occurrence, can be extremely important in a variety of domains. Malfunctions, frauds, and threats not only need to be correctly identified; they often also need a valid explanation so that effective countermeasures can be taken. The increasingly widespread use of sophisticated Machine Learning approaches to identify anomalies makes such explanations more challenging. We present the Decision Tree Outlier Regressor (DTOR), a technique for producing rule-based explanations for individual data points by estimating the anomaly scores generated by an anomaly detection model. This is accomplished by first fitting a Decision Tree Regressor, which estimates the anomaly score, and then extracting the path through the tree associated with the data point's score. Our results demonstrate the robustness of DTOR even on datasets with a large number of features. Additionally, in contrast to other rule-based approaches, the generated rules are consistently satisfied by the points to be explained. Furthermore, our evaluation metrics indicate performance comparable to Anchors in outlier explanation tasks, with reduced execution time.
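
The two steps named in the abstract (fit a Decision Tree Regressor on the detector's scores, then read off the path followed by the point to be explained) can be illustrated with a minimal scikit-learn sketch. This is not the authors' implementation: the helper name `explain_with_tree`, the choice of `IsolationForest` as detector, the tree depth, and the weighting scheme that up-weights the target point are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.tree import DecisionTreeRegressor


def explain_with_tree(score_fn, X, x, max_depth=3, seed=0):
    """Fit a surrogate tree on anomaly scores and return the rule path of x.

    Hypothetical helper: the weighting below is an assumed scheme, not
    necessarily the one used by DTOR.
    """
    X_fit = np.vstack([X, x])
    y_fit = np.append(score_fn(X), score_fn(x.reshape(1, -1)))
    # Assumption: give the target point as much weight as all other points
    # together, so the tree is pushed to fit its score well.
    weights = np.ones(len(X_fit))
    weights[-1] = float(len(X))
    tree = DecisionTreeRegressor(max_depth=max_depth, random_state=seed)
    tree.fit(X_fit, y_fit, sample_weight=weights)

    # Node ids visited by x, from the root down to its leaf.
    path = tree.decision_path(x.reshape(1, -1)).indices
    t = tree.tree_
    rule = []
    for node, nxt in zip(path[:-1], path[1:]):
        # Going to the left child means "feature <= threshold".
        op = "<=" if t.children_left[node] == nxt else ">"
        rule.append((int(t.feature[node]), op, float(t.threshold[node])))
    return tree, rule


# Toy data: a Gaussian cloud plus one obvious outlier to explain.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
x = np.array([6.0, -5.0, 0.0])

forest = IsolationForest(n_estimators=50, random_state=0).fit(X)


def anomaly_score(Z):
    # score_samples is lower for anomalies; negate so higher = more anomalous.
    return -forest.score_samples(Z)


tree, rule = explain_with_tree(anomaly_score, X, x)
for feat, op, thr in rule:
    print(f"x{feat} {op} {thr:.3f}")
```

Because the rule is read from the path that the point itself follows, the point satisfies every predicate by construction, which is consistent with the abstract's remark that the generated rules are always satisfied by the points they explain.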

Paper Structure (8 sections, 4 equations, 2 figures, 3 tables, 2 algorithms)

Introduction
Anomaly detection in the Banking sector - Internal Audit activity
eXplainable AI for anomaly detection
Method
Experiments
Datasets and AD models
Rule-based XAI
Discussion and conclusion

Figures (2)

Figure 1: A simplified illustration of synthetic data generation is presented. Initially, samples from the original dataset are selected based on sub-rules (e.g., $x_1>2$ or $x_2<0$ in the given example). Subsequently, $N_{\text{gen}}$ samples are drawn for each variable to satisfy the overarching rule $A$. Notably, the image does not depict the discretization of continuous variables or the preservation of inter-variable correlations. However, for illustrative purposes, it is evident that negative values of $x_3$ do not occur under rule $A$, as observed in the synthetic dataset.
Figure 2: Illustration of a machine learning application where the XAI method can provide explanations either in the original input space or the pre-processed one. If the latter option is chosen, the explanation must be converted back into the original feature space, particularly when a rule-based explanation is expected.
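
The generation step described in the Figure 1 caption (select original samples that satisfy individual sub-rules, then draw $N_{\text{gen}}$ values per variable so that the full rule $A$ holds) might look roughly like the sketch below. It is a simplified illustration under stated assumptions, not the paper's algorithm: the function name `sample_under_rule` and the `(feature, op, threshold)` rule encoding are hypothetical, the discretization of continuous variables is omitted, and correlations are only partially retained by reusing whole donor rows for the unconstrained variables.

```python
import numpy as np


def sample_under_rule(X, rule, n_gen, seed=0):
    """Draw n_gen synthetic rows that satisfy every predicate in `rule`.

    `rule` is a list of (feature_index, op, threshold) with op in {"<=", ">"}.
    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    # Merge predicates into one (low, high] interval per constrained feature.
    bounds = {}
    for feat, op, thr in rule:
        low, high = bounds.get(feat, (-np.inf, np.inf))
        if op == "<=":
            bounds[feat] = (low, min(high, thr))
        else:
            bounds[feat] = (max(low, thr), high)

    # Original rows satisfying each sub-rule taken on its own.
    masks = {f: (X[:, f] > lo) & (X[:, f] <= hi) for f, (lo, hi) in bounds.items()}
    for f, m in masks.items():
        if not m.any():
            raise ValueError(f"no original sample satisfies the sub-rule on feature {f}")

    # Donor rows: any original row satisfying at least one sub-rule.
    if masks:
        any_mask = np.logical_or.reduce(list(masks.values()))
    else:
        any_mask = np.ones(len(X), dtype=bool)
    synth = X[rng.choice(np.flatnonzero(any_mask), size=n_gen)]

    # Where a donor violates a sub-rule, redraw that variable from the
    # original values that do satisfy it.
    for f, (lo, hi) in bounds.items():
        bad = ~((synth[:, f] > lo) & (synth[:, f] <= hi))
        synth[bad, f] = rng.choice(X[masks[f], f], size=int(bad.sum()))
    return synth


rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
rule_A = [(0, ">", 1.0), (1, "<=", 0.0)]
synth = sample_under_rule(X, rule_A, n_gen=200)
print(synth.shape)
```

Redrawing only the violated variables keeps each synthetic row as close as possible to a real observation, but it can still break dependencies between the redrawn variable and the others, which is the gap that a correlation-preserving scheme would need to close.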
