Black-Box Anomaly Attribution

Tsuyoshi Idé; Naoki Abe

TL;DR

The paper tackles anomaly attribution for doubly black-box regression models, where neither the model internals nor the training data are available, by explaining the deviation of the prediction from the observation rather than the local behavior of the model. It shows that mainstream function-based attribution methods (integrated gradient (IG), Shapley value (SV), and LIME) are deviation-agnostic, and it derives their mutual connections, including the equivalence of SV to EIG, within the integrated gradient framework. The authors propose likelihood compensation (LC), a local perturbation approach that maximizes the likelihood $p(y\mid\bm{x})$ to find a counterfactual input $\bm{x}+\bm{\delta}$ under which the observation would be less anomalous; the perturbation $\bm{\delta}$ serves as an interpretable, sparse, deviation-aware attribution. They validate LC on multiple datasets and on a real building energy management use case, reporting better interpretability, stability, and actionable insights than baseline methods. The result is a model-agnostic anomaly attribution framework with explicit theoretical links to existing attribution tools and a route to deployment in industrial settings.

Abstract

When the prediction of a black-box machine learning model deviates from the true observation, what can be said about the reason behind that deviation? This is a fundamental and ubiquitous question that the end user in a business or industrial AI application often asks. The deviation may be due to a sub-optimal black-box model, or it may be simply because the sample in question is an outlier. In either case, one would ideally wish to obtain some form of attribution score, that is, a value indicative of the extent to which an input variable is responsible for the anomaly. In the present paper we address this task of "anomaly attribution," particularly in the setting in which the model is black-box and the training data are not available. Specifically, we propose a novel likelihood-based attribution framework we call "likelihood compensation (LC)," in which the responsibility score is equated with the correction on each input variable needed to attain the highest possible likelihood. We begin by showing formally why mainstream model-agnostic explanation methods, such as local linear surrogate modeling and Shapley values, are not designed to explain anomalies. In particular, we show that they are "deviation-agnostic," namely, that their explanations are blind to the fact that there is a deviation in the model prediction for the sample of interest. We do this by positioning these existing methods under the unified umbrella of a function family we call the "integrated gradient family." We validate the effectiveness of the proposed LC approach using publicly available data sets. We also conduct a case study with a real-world building energy prediction task and confirm its usefulness in practice based on expert feedback.
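The idea of equating the responsibility score with the correction on each input can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a Gaussian observation model with variance `sigma2`, a plain L2 penalty with weight `lam`, central finite-difference gradients, and fixed-step gradient ascent, all of which are illustrative choices (the function names and default values are hypothetical as well). The paper's actual objective, regularizer, and update rules are not reproduced in this summary.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-4):
    """Central finite-difference gradient of a black-box scalar function f at x."""
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        step = np.zeros_like(x, dtype=float)
        step[i] = h
        grad[i] = (f(x + step) - f(x - step)) / (2.0 * h)
    return grad

def likelihood_compensation(f, x_t, y_t, sigma2=1.0, lam=0.1, lr=0.05, n_iter=500):
    """Illustrative LC-style correction (assumed objective, not the paper's).

    Runs gradient ascent on
        -(y_t - f(x_t + delta))^2 / (2 * sigma2) - (lam / 2) * ||delta||^2
    using only evaluations of the black-box f.
    """
    x_t = np.asarray(x_t, dtype=float)
    delta = np.zeros_like(x_t)
    for _ in range(n_iter):
        residual = y_t - f(x_t + delta)            # current deviation
        grad_f = numerical_gradient(f, x_t + delta)
        ascent = residual * grad_f / sigma2 - lam * delta
        delta = delta + lr * ascent
    return delta

# Hypothetical black-box model: only the first input matters.
f = lambda x: 2.0 * x[0]
x_t = np.array([1.0, 1.0])

# Same input, two different observations (prediction is f(x_t) = 2).
delta_high = likelihood_compensation(f, x_t, y_t=4.0)  # observation above prediction
delta_low = likelihood_compensation(f, x_t, y_t=0.0)   # observation below prediction
print(delta_high, delta_low)
```

In this toy setting the correction on the first input changes sign with the sign of the deviation, while the irrelevant second input receives no correction. That is the sense in which a likelihood-based score depends on the observed $y^t$ and not only on the model's local behavior.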

Paper Structure (51 sections, 5 theorems, 49 equations, 15 figures, 4 tables, 2 algorithms)

Introduction
Related Work
General background: Anomaly attribution in doubly black-box setting
Local linear modeling, Shapley value (SV), and integrated gradient (IG)
Unified attribution framework
Problem Setting
Notation
Limitations of Function-Based Anomaly Attribution
Deviation-agnostic property of integrated gradient
Definitions
IG and EIG for deviation
Lower-order approximations
Sum rules
Deviation-agnostic property of Shapley value
Definition
...and 36 more sections

Key Result

Theorem 1

IG and EIG are deviation-agnostic.
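One informal way to see what deviation-agnostic means is to compute IG numerically for the deviation $f(\bm{x}) - y^t$ at a fixed input with two different observations. The sketch below is only an illustration under stated assumptions: `toy_model` is a hypothetical stand-in rather than the paper's test function, the reference point `x_0` is chosen arbitrarily, and IG is approximated with a midpoint rule and finite-difference gradients. Because $y^t$ enters only as a constant offset, it drops out of the gradient and the scores coincide.

```python
import numpy as np

def integrated_gradient(f, x, x0, n_steps=200, h=1e-5):
    """Approximate IG_i = (x_i - x0_i) * integral_0^1 df/dx_i(x0 + a (x - x0)) da."""
    x = np.asarray(x, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    alphas = (np.arange(n_steps) + 0.5) / n_steps  # midpoint rule on [0, 1]
    avg_grad = np.zeros_like(x)
    for a in alphas:
        point = x0 + a * (x - x0)
        for i in range(x.size):
            step = np.zeros_like(x)
            step[i] = h
            # Central finite difference: only black-box evaluations of f are used.
            avg_grad[i] += (f(point + step) - f(point - step)) / (2.0 * h)
    avg_grad /= n_steps
    return (x - x0) * avg_grad

def toy_model(x):
    """Hypothetical black-box regression function (not taken from the paper)."""
    return np.sin(2.0 * np.pi * x[0]) * np.cos(np.pi * x[1])

x_t = np.array([0.5, 0.0])   # shared test input
x_0 = np.array([0.25, 0.0])  # assumed reference point

# IG of the deviation f(x) - y_t for two observations at the same input.
ig_A = integrated_gradient(lambda x: toy_model(x) - 1.0, x_t, x_0)     # y_t = 1
ig_B = integrated_gradient(lambda x: toy_model(x) - (-1.0), x_t, x_0)  # y_t = -1
print(ig_A, ig_B)
```

The two score vectors agree up to numerical error even though the deviations have opposite signs, and their sum matches $f(\bm{x}^t) - f(\bm{x}^0)$, which does not involve $y^t$ either.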

Figures (15)

Figure 1: Problem setting and motivation. (a) Given a black-box regression model and anomalous sample(s), our goal is to quantify input variables' responsibility without using training data. (b) Existing attribution methods attempt to explain either the local gradient or the increment from a reference point $\bm{x}^0$, rather than the deviation.
Figure 2: Illustration of likelihood compensation (LC). (a) For a given test sample $(y^t,\bm{x}^t)$, LC seeks a perturbation $\bm{\delta}$ that achieves the best possible fit with the black-box regression model $f(\bm{x})$. (b) The iterative updates converge when the deviation or the (smoothed) gradient vanishes. See the optimization section of the paper for more details.
Figure 3: 2D Sinusoidal Curve, with the $x_2=0$ slice shown on top. Points A and B share the same input $\bm{x}^t=(0.5,0)$ but have $y^t=1$ and $y^t=-1$, respectively.
Figure 4: Comparison of attribution scores on the 2D Sinusoidal Curve at two test points (A and B in Fig. 3). The scores were evaluated 10 times over randomly generated datasets. Only the LC scores differ between A and B.
Figure 5: Boston Housing: Pairwise scatter plot between $y$ (MEDV) and selected input variables. The square and triangle show the detected first and second outliers in the test data.
...and 10 more figures

Theorems & Definitions (12)

Definition 1: anomaly attribution
Theorem 1
proof
proof
Theorem 2
proof
proof
Theorem 3
proof
Theorem 4: Equivalence of SV to EIG
...and 2 more
