Explaining k-Nearest Neighbors: Abductive and Counterfactual Explanations

Pablo Barceló; Alexander Kozachinskiy; Miguel Romero Orth; Bernardo Subercaseaux; José Verschae

TL;DR

The paper investigates how to explain $k$-NN classifications from a feature-centric perspective, using abductive explanations (minimum sufficient reasons) and counterfactual explanations. It gives a complexity analysis across the continuous setting $(\mathbb{R}^n,D_p)$ and the discrete setting $(\{0,1\}^n,D_H)$, showing NP-hardness for minimum sufficient reasons in all settings and revealing a separation: $\ell_2$-based tasks are tractable, whereas $\ell_1$-based tasks and discrete tasks with $k\ge3$ are harder. It provides polynomial-time algorithms for several $\ell_2$-distance problems and proves NP-hardness for $\ell_1$-based counterfactuals, yielding a precise map of when explanations are tractable. The authors also show that explanations can be computed in practice via Integer Quadratic Programming and SAT encodings, and they validate the approach on MNIST and synthetic data with instances of hundreds of features. Overall, the work clarifies when feature-based explanations for $k$-NN are tractable and offers concrete algorithms for computing them.

Abstract

Despite the wide use of $k$-Nearest Neighbors as classification models, their explainability properties remain poorly understood from a theoretical perspective. While nearest neighbors classifiers offer interpretability from a "data perspective", in which the classification of an input vector $\bar{x}$ is explained by identifying the vectors $\bar{v}_1, \ldots, \bar{v}_k$ in the training set that determine the classification of $\bar{x}$, we argue that such explanations can be impractical in high-dimensional applications, where each vector has hundreds or thousands of features and it is not clear what their relative importance is. Hence, we focus on understanding nearest neighbor classifications through a "feature perspective", in which the goal is to identify how the values of the features in $\bar{x}$ affect its classification. Concretely, we study abductive explanations such as "minimum sufficient reasons", which correspond to sets of features in $\bar{x}$ that are enough to guarantee its classification, and "counterfactual explanations" based on the minimum distance feature changes one would have to perform in $\bar{x}$ to change its classification. We present a detailed landscape of positive and negative complexity results for counterfactual and abductive explanations, distinguishing between discrete and continuous feature spaces, and considering the impact of the choice of distance function involved. Finally, we show that despite some negative complexity results, Integer Quadratic Programming and SAT solving allow for computing explanations in practice.
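To make the two notions in the abstract concrete, here is a minimal brute-force sketch over $\{0,1\}^n$ with the Hamming distance. It is not the authors' method (the paper relies on Integer Quadratic Programming and SAT encodings); it enumerates candidates exhaustively and only works for tiny $n$. The function names, the assumption that $k$ is odd, and the convention that ties favor the positive class are illustrative assumptions.

```python
from itertools import combinations, product


def hamming(u, v):
    """Number of coordinates where u and v differ."""
    return sum(a != b for a, b in zip(u, v))


def knn_label(x, pos, neg, k):
    """k-NN label of x (k odd). Assumption: ties favor the positive class."""
    m = (k + 1) // 2
    d_pos = sorted(hamming(x, p) for p in pos)
    if len(d_pos) < m:
        return 0
    radius = d_pos[m - 1]  # distance to the m-th nearest positive point
    closer_neg = sum(hamming(x, q) < radius for q in neg)
    return 1 if closer_neg <= m - 1 else 0


def min_counterfactual(x, pos, neg, k):
    """Closest vector (in Hamming distance) whose label differs from x's."""
    label = knn_label(x, pos, neg, k)
    for r in range(1, len(x) + 1):  # try increasing numbers of flips
        for flips in combinations(range(len(x)), r):
            y = tuple(1 - v if i in flips else v for i, v in enumerate(x))
            if knn_label(y, pos, neg, k) != label:
                return y, r
    return None


def is_sufficient_reason(x, features, pos, neg, k):
    """True if fixing `features` to their values in x forces x's label."""
    label = knn_label(x, pos, neg, k)
    free = [i for i in range(len(x)) if i not in features]
    for values in product((0, 1), repeat=len(free)):
        y = list(x)
        for i, v in zip(free, values):
            y[i] = v
        if knn_label(tuple(y), pos, neg, k) != label:
            return False
    return True


def min_sufficient_reason(x, pos, neg, k):
    """Smallest feature set that is a sufficient reason (exhaustive search)."""
    for r in range(len(x) + 1):
        for feats in combinations(range(len(x)), r):
            if is_sufficient_reason(x, set(feats), pos, neg, k):
                return set(feats)


# Toy instance (hypothetical data): one positive and one negative point.
pos, neg = [(1, 1, 1, 1)], [(0, 0, 0, 0)]
x = (1, 1, 1, 0)
print(knn_label(x, pos, neg, 1))
print(min_counterfactual(x, pos, neg, 1))
print(min_sufficient_reason(x, pos, neg, 1))
```

On this toy instance, two bit flips are needed to change the label, and two features of $\bar{x}$ already guarantee it. Both searches are exponential in $n$, which is consistent with the hardness results the abstract mentions for minimum sufficient reasons.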

Paper Structure (32 sections, 15 theorems, 33 equations, 4 figures, 1 table)

Introduction
Nearest Neighbor classification.
Formal explainability.
Why feature-based explanations for $k$-NNs?
Context.
Our contributions.
Organization of the paper.
Definitions
Basics.
Metric spaces studied in the paper.
Nearest neighbor classification.
Problems
Decision problems
Abductive explanations
Counterfactual explanations
...and 17 more sections

Key Result

Proposition 1

(a) We have $f^k_{S^+, S^-}(\bar{x}) = 1$ if and only if there exist $A\subseteq S^+$ of size $(k+1)/2$ and $B\subseteq S^-$ of size at most $(k-1)/2$ such that $d_n(\bar{x}, \bar{a}) \le d_n(\bar{x}, \bar{c})$ for every $\bar{a} \in A$ and $\bar{c}\in S^-\setminus B$. (b) We have $f^k_{S^+, S^-}(\b
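Condition (a) can be checked directly. The sketch below is an illustration under stated assumptions rather than code from the paper: it assumes $k$ is odd, uses the Euclidean distance as the default $d_n$, and introduces hypothetical function names. The first function searches for a witness set $A$ as in the statement; the second uses the observation that it suffices to take $A$ as the $(k+1)/2$ nearest positive points and count the negative points strictly closer than the farthest of them.

```python
from itertools import combinations
from math import dist  # Euclidean distance, available since Python 3.8


def positive_by_witness(x, pos, neg, k, d=dist):
    """Search for sets A and B satisfying condition (a)."""
    m = (k + 1) // 2
    for A in combinations(pos, m):
        radius = max(d(x, a) for a in A)
        # Every negative point strictly closer than some a in A must go into B.
        forced = sum(d(x, c) < radius for c in neg)
        if forced <= (k - 1) // 2:
            return True
    return False


def positive_by_counting(x, pos, neg, k, d=dist):
    """Same test, taking A as the (k+1)/2 nearest positive points."""
    m = (k + 1) // 2
    d_pos = sorted(d(x, p) for p in pos)
    if len(d_pos) < m:
        return False
    radius = d_pos[m - 1]
    return sum(d(x, c) < radius for c in neg) <= (k - 1) // 2


# Hypothetical training data in the plane.
pos = [(0, 0), (1, 0), (5, 5)]
neg = [(0, 1), (4, 4), (6, 6)]
print(positive_by_counting((0.5, 0), pos, neg, 3))
print(positive_by_counting((5, 4), pos, neg, 3))
```

The counting version runs in time $O(N \log N)$ for $N = |S^+| + |S^-|$, whereas the witness search enumerates all subsets of $S^+$ of size $(k+1)/2$. The two agree because the nearest positives minimize the radius, and the number of forced negatives can only grow with the radius.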

Figures (4)

Figure 1: Illustration of a counterfactual explanation for an image of digit $4$ in the binarized MNIST dataset, which after changing $13$ pixels is classified as a $9$.
Figure 2: Illustration of a minimum-distance counterfactual explanation over $\mathbb{R}^2$ in the $\ell_2$ metric. Blue (red) areas are classified negatively (positively).
Figure 3: Runtimes for counterfactual explanations over $\{0, 1\}^n$. The total training set has size $N := |S^+| + |S^-|$, consisting of independent uniformly random samples from $\{0, 1\}^n$. Confidence intervals of $95\%$ over $30$ independent runs are displayed.
Figure 4: Runtimes for explanations over the MNIST dataset. The training set used has size $N := |S^+| + |S^-|$. Confidence intervals of $95\%$ over $5$ independent runs are displayed.

Theorems & Definitions (28)

Example 1
Proposition 1
Example 2
Proposition 2
proof
Theorem 1
proof
Proposition 3
proof
Corollary 1
...and 18 more
