Local Feature Selection without Label or Feature Leakage for Interpretable Machine Learning Predictions

Harrie Oosterhuis; Lijun Lyu; Avishek Anand

TL;DR

The paper addresses the faithfulness of local, instance-wise feature explanations. It formalizes two kinds of leakage, label leakage and feature leakage, and derives necessary and sufficient conditions for a selector to have neither. It introduces two leakage-free approaches: a linear programming solution for toy, fully specified distributions, and SUWR (Sequential Unmasking without Reversion), a scalable feature selection method that yields narrative, step-by-step explanations. SUWR provably leaks no information from labels or non-selected features and is optimized with reinforcement learning; on synthetic and image benchmarks it achieves competitive predictive performance with high feature-selection sparsity. The experiments also show that existing local feature selectors exhibit leakage and can overfit. The code is publicly available.
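The idea of unmasking features step by step without ever re-masking them can be illustrated with a minimal sketch. This is not the authors' implementation: the names `sequential_unmask` and `toy_policy` are hypothetical, the policy is deterministic rather than learned with reinforcement learning, and the only property it is meant to show is that each decision sees just the feature values revealed so far.

```python
def sequential_unmask(x, policy, max_steps=None):
    """Reveal features of x step by step (illustrative sketch, not the paper's code).

    `policy` is a hypothetical callable: it receives a dict {index: value} of the
    features revealed so far and returns the indices to reveal next. An empty
    result means "stop". Revealed features are never hidden again.
    """
    revealed = {}   # index -> value; only ever grows (no reversion)
    steps = []      # indices revealed at each step, in order
    limit = len(x) if max_steps is None else max_steps
    for _ in range(limit):
        # The policy gets a copy, so it cannot read or alter anything else.
        new = set(policy(dict(revealed))) - set(revealed)
        if not new:
            break
        for i in sorted(new):
            revealed[i] = x[i]
        steps.append(sorted(new))
    return revealed, steps


def toy_policy(revealed):
    """Hypothetical rule: look at feature 0 first; if it equals 1, also look at feature 1."""
    if not revealed:
        return {0}
    if revealed.get(0) == 1 and 1 not in revealed:
        return {1}
    return set()


print(sequential_unmask((1, 0, 1), toy_policy))
print(sequential_unmask((0, 1, 1), toy_policy))
```

Because the policy never receives the label or any unrevealed value, its choices cannot encode them; the recorded `steps` list is what a step-by-step explanation would be read from.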

Abstract

Local feature selection in machine learning provides instance-specific explanations by focusing on the most relevant features for each prediction, enhancing the interpretability of complex models. However, such methods tend to produce misleading explanations by encoding additional information in their selections. In this work, we attribute the problem of misleading selections to leakage, and formalize the concepts of label and feature leakage. We rigorously derive the necessary and sufficient conditions under which we can guarantee no leakage, and show that existing methods do not meet these conditions. Furthermore, we propose SUWR, the first local feature selection method that is proven to have no leakage. Our experimental results indicate that SUWR is less prone to overfitting and combines state-of-the-art predictive performance with high feature-selection sparsity. Our generic and easily extendable formal approach provides a strong theoretical basis for future work on interpretability with reliable explanations.

Paper Structure (24 sections, 4 theorems, 58 equations, 8 figures, 3 tables, 2 algorithms)

Introduction
Brief related work
Leakage in Feature Selection
Formalization of label leakage in feature selection
Formalizing feature leakage in feature selection
The necessary and sufficient conditions for leakage
A Linear Programming Solution
Sequential Unmasking without Reversion
Feature selection inference with SUWR
Optimization of SUWR feature selection policies
Discussion
Experiment 1: Pareto Front Analysis
Experiment 2: Synthetic Benchmark
Experiment 3: MNIST Digits and Fashion
Conclusion
...and 9 more sections

Key Result

Corollary 2.5

A feature selector has no leakage if and only if, for every possible feature selection, the probability of that selection does not depend on any label value or on the value of any non-selected feature.
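In notation assumed here (not taken from the paper), with input $x$, label $y$ and binary mask $s$, the condition says that $P(s \mid x, y)$ may be a function of the selected values of $x$ only. For a tiny discrete example this can be checked by brute force. The sketch below assumes binary features, a finite label set, and that every $(x, y)$ combination is possible; `has_no_leakage` and the three toy selectors are hypothetical names introduced for illustration.

```python
from itertools import product


def has_no_leakage(select_prob, num_features, labels=(0, 1)):
    """Brute-force check of the condition for binary features.

    `select_prob(mask, x, y)` is a hypothetical callable returning the
    probability that the selector outputs `mask` for input x and label y.
    Returns True if, for every mask, that probability is determined by the
    selected feature values alone.
    """
    inputs = list(product((0, 1), repeat=num_features))
    for mask in product((0, 1), repeat=num_features):
        seen = {}  # selected feature values -> probability of this mask
        for x in inputs:
            visible = tuple(v for v, m in zip(x, mask) if m)
            for y in labels:
                p = select_prob(mask, x, y)
                if visible in seen and abs(seen[visible] - p) > 1e-12:
                    return False  # same visible values, different probability
                seen.setdefault(visible, p)
    return True


def sequential_selector(mask, x, y):
    # Always selects feature 0; adds feature 1 only when x[0] == 1.
    chosen = (1, 1, 0) if x[0] == 1 else (1, 0, 0)
    return 1.0 if mask == chosen else 0.0


def label_leaking_selector(mask, x, y):
    # Which feature is selected reveals the label.
    chosen = (1, 0, 0) if y == 1 else (0, 1, 0)
    return 1.0 if mask == chosen else 0.0


def feature_leaking_selector(mask, x, y):
    # Which feature is selected reveals x[2], which is never selected.
    chosen = (1, 0, 0) if x[2] == 1 else (0, 1, 0)
    return 1.0 if mask == chosen else 0.0


for selector in (sequential_selector, label_leaking_selector, feature_leaking_selector):
    print(selector.__name__, has_no_leakage(selector, num_features=3))
```

The first selector passes because its choice depends only on a value it also selects; the other two fail because a reader who sees only the mask can infer the label or a hidden feature.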

Figures (8)

Figure 1: Performance curves of the first experiment. Grey area indicates performance that is impossible without leakage.
Figure 2: Several selection masks produced by SUWR for different fashion items from fashion-MNIST. Red squares indicate selected patches; the number inside each square indicates the step at which that patch was selected. All items were correctly classified by SUWR.
Figure 3: Narrative explanations derived from the SUWR inference process for a sandal (top) and boot (bottom) from fashion-MNIST. Steps $t=2$ through $t=5$ are visualized; red squares indicate patches selected in that step, blue squares those selected in previous steps.
Figure 4: Results on MNIST: digits (left) and fashion (right).
Figure 5: Visualization of all possible steps and transitions for an RDHD SUWR policy when selecting from a set of three features.
...and 3 more figures

Theorems & Definitions (14)

Definition 2.1
Definition 2.2
Definition 2.3
Definition 2.4
Corollary 2.5
proof
Theorem 2.3
proof
Theorem 2.4
proof
...and 4 more
