Provably Explaining Neural Additive Models

Shahaf Bassan; Yizhak Yisrael Elboher; Tobias Ladner; Volkan Şahin; Jan Kretinsky; Matthias Althoff; Guy Katz

TL;DR

This work presents a new model-specific algorithm for NAMs that generates provably cardinally-minimal explanations using a number of verification queries that is only logarithmic in the number of input features, after a parallelized preprocessing step, whose runtime is logarithmic in the required precision, is applied to each small univariate NAM component.

Abstract

Despite significant progress in post-hoc explanation methods for neural networks, many remain heuristic and lack provable guarantees. A key approach to obtaining explanations with provable guarantees is to identify a cardinally-minimal subset of input features that is, by itself, provably sufficient to determine the prediction. However, for standard neural networks, this task is often computationally infeasible, as it demands, in the worst case, a number of verification queries exponential in the number of input features, each of which is NP-hard. In this work, we show that for Neural Additive Models (NAMs), a recent and more interpretable neural network family, we can efficiently generate explanations with such guarantees. We present a new model-specific algorithm for NAMs that generates provably cardinally-minimal explanations using a number of verification queries that is only logarithmic in the number of input features, after a parallelized preprocessing step, whose runtime is logarithmic in the required precision, is applied to each small univariate NAM component. Our algorithm not only makes the task of obtaining cardinally-minimal explanations feasible, but even outperforms existing algorithms designed to find the relaxed variant of subset-minimal explanations (which may be larger and less informative but are easier to compute), despite solving a much more difficult task. Our experiments demonstrate that, compared to previous algorithms, our approach provides provably smaller explanations and substantially reduces the computation time. Moreover, we show that our generated provable explanations offer benefits that are unattainable by standard sampling-based techniques typically used to interpret NAMs.
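The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the toy NAM with linear components, the names `bisect_lower_bound`, `toy_query`, and `minimal_explanation`, the global lower bound of `-10.0`, and the closed-form sufficiency check are all assumptions made for illustration. In particular, `toy_query` stands in for a real neural network verification query, and it is exact only because each toy component is linear.

```python
from typing import Callable, List

def bisect_lower_bound(holds: Callable[[float], bool],
                       lo: float, hi: float, delta: float) -> float:
    """Bisect for a value within delta of the largest c with holds(c) True.

    holds(c) plays the role of one verification query. Assumes holds(lo)
    is True and that holds is monotone (True for every c below the minimum).
    Uses about log2((hi - lo) / delta) queries.
    """
    while hi - lo > delta:
        mid = (lo + hi) / 2
        if holds(mid):
            lo = mid
        else:
            hi = mid
    return lo

# Hypothetical toy NAM: f(x) = bias + sum_i weights[i] * x[i], positive class if f(x) > 0.
weights = [2.0, -1.0, 0.5, 0.1]
bias = -1.5
x = [1.0, 0.5, 1.0, 1.0]
eps = 0.5  # perturbation radius around each feature value

def component(i: int, z: float) -> float:
    return weights[i] * z

def toy_query(i: int, c: float) -> bool:
    # Toy stand-in for a verifier: is f_i(z) >= c on [x_i - eps, x_i + eps]?
    # Checking the endpoints is exact only because f_i is linear here.
    return min(component(i, x[i] - eps), component(i, x[i] + eps)) >= c

contribs = [component(i, x[i]) for i in range(len(x))]
assert bias + sum(contribs) > 0  # the sketch assumes a positive prediction

# Stage 1 (independent per feature, so it could run in parallel):
# a lower bound on each component over its perturbation interval.
mins = [bisect_lower_bound(lambda c, i=i: toy_query(i, c), -10.0, contribs[i], 1e-6)
        for i in range(len(x))]

def minimal_explanation(contribs: List[float], mins: List[float], bias: float) -> List[int]:
    n = len(contribs)
    # Importance of a feature: how far the output can drop if it is left free.
    order = sorted(range(n), key=lambda i: contribs[i] - mins[i], reverse=True)

    def sufficient(k: int) -> bool:
        # Fix the k most important features; let the rest take their worst case.
        worst = bias + sum(contribs[i] for i in order[:k]) \
                     + sum(mins[i] for i in order[k:])
        return worst > 0

    lo, hi = 0, n  # fixing all n features is sufficient by assumption
    while lo < hi:  # binary search: about log2(n) sufficiency checks
        mid = (lo + hi) // 2
        if sufficient(mid):
            hi = mid
        else:
            lo = mid + 1
    return sorted(order[:lo])

explanation = minimal_explanation(contribs, mins, bias)
print(explanation)
```

Because sufficiency is monotone in the number of fixed features once they are sorted by importance, the binary search is enough to locate the smallest sufficient prefix. In this sketch the lower bounds are only accurate to `delta`, so the check is conservative; how the paper handles overlapping importance intervals is not reproduced here.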

Paper Structure

This paper contains 40 sections, 23 theorems, 51 equations, 8 figures, 4 tables, 8 algorithms.

Introduction
Preliminaries
Notation
Neural Network Verification
Neural Additive Models (NAMs)
Provably Sufficient Explanations for Neural Networks
Provably Cardinally-Minimal Sufficient Explanations for NAMs
Stage 1: Parallel Interval Importance Sorting
Stage 2: Feature Selection Based on the Derived Feature Intervals
Evaluation
Our Algorithm vs. Previous Algorithms
Explanation Progression in Time
Comparison to Purely Sampling-Based Methods
Related Work
Limitations
...and 25 more sections

Key Result

Proposition 1

Given a NAM $f$, an input $\textbf{x}\in\mathbb{R}^n$, and a perturbation radius $\epsilon_p\in\mathbb{R}_+$, let the parallel sorting algorithm (alg:parallel_sorting_part_1) return a total order over the input features according to their importance. Then the following holds: for any sufficient explanation $\mathcal{S}$ that i

Figures (8)

Figure 1: Comparison of a subset-minimal explanation and a cardinally-minimal explanation of a NAM. Both explanations (in yellow) in (b) and (c) are minimal, as perturbing any additional feature can lead the overall output to become negative.
Figure 2: Sufficient explanations in NAMs: (a) Users must examine the neighborhood of an input for proper interpretation; e.g., users might wrongly conclude that feature $1$ from the FICO HELOC dataset alone determines a positive output, but small changes in features $15$ or $18$ can flip the classification. (b) Outputs of continuous neighborhoods can be misleading if not verified, since sampling may miss extrema. For example, users might wrongly believe, based on sampling, that all features other than feature $1$ have only positive outputs, and thus that fixing feature $1$ will always lead to a positive output of the entire NAM; however, the output of feature $2$ can become negative enough to flip the classification.
Figure 3: Explanation size over time for all datasets.
Figure 4: The computation of the binary decomposition defined in Equation \ref{eq:vars-qbro}, with input $\textbf{x}$ and the output vector $\mathbf{o}_0,\ldots,\mathbf{o}_{n-1}$ representing the binary decomposition.
Figure 5: Number of processed features over time.
...and 3 more figures
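
The point made in the caption of Figure 2(b), that sampling may miss extrema, can be illustrated with a small Python sketch. The function `component`, its narrow dip at `0.503`, and the 11-point grid are hypothetical choices made for this illustration and do not come from the paper. Enumerating the breakpoints of the piecewise-linear function stands in for a sound bound that a verifier would provide.

```python
def component(z: float) -> float:
    # Hypothetical piecewise-linear univariate component:
    # constant 1.0 except for a narrow "tent" dip of depth 5 centered at 0.503.
    tent = max(0.0, 1.0 - abs(z - 0.503) / 0.001)
    return 1.0 - 5.0 * tent

# Sampling-based view: evaluate on a coarse, evenly spaced grid over [0, 1].
grid = [i / 10 for i in range(11)]
sampled_min = min(component(z) for z in grid)

# Exact view for a piecewise-linear function: the minimum over an interval
# is attained at an endpoint or at a breakpoint, so checking those suffices.
breakpoints = [0.0, 0.502, 0.503, 0.504, 1.0]
exact_min = min(component(z) for z in breakpoints)

print(sampled_min, exact_min)
```

Here the grid never lands inside the dip, so the sampled minimum suggests the component is always positive, while the exact minimum is negative. A denser or random sample would reduce the risk but could not rule out such a dip in general.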

Theorems & Definitions (30)

Definition 1: Sufficient Explanation
Definition 2: Minimal Sufficient Explanations
Proposition 1
Proposition 2
Proposition 3
Proposition 4
Theorem 1
Theorem 2
Proposition 1
Proposition 2
...and 20 more
