
Sufficient and Necessary Explanations (and What Lies in Between)

Beepul Bharti, Paul Yi, Jeremias Sulam

TL;DR

This work formalizes and studies two precise notions of feature importance for general machine learning models, sufficiency and necessity, shows that each on its own can give an incomplete picture of which features a model finds important, and proposes a unified notion of importance that circumvents these limitations by exploring a continuum along a necessity-sufficiency axis.

Abstract

As complex machine learning models continue to find applications in high-stakes decision-making scenarios, it is crucial that we can explain and understand their predictions. Post-hoc explanation methods provide useful insights by identifying important features in an input $\mathbf{x}$ with respect to the model output $f(\mathbf{x})$. In this work, we formalize and study two precise notions of feature importance for general machine learning models: sufficiency and necessity. We demonstrate how these two types of explanations, albeit intuitive and simple, can fall short in providing a complete picture of which features a model finds important. To this end, we propose a unified notion of importance that circumvents these limitations by exploring a continuum along a necessity-sufficiency axis. Our unified notion, we show, has strong ties to other popular definitions of feature importance, like those based on conditional independence and game-theoretic quantities like Shapley values. Crucially, we demonstrate how a unified perspective allows us to detect important features that could be missed by either of the previous approaches alone.
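To make the two notions concrete, here is a minimal sketch, not the authors' implementation. This page does not show Definitions 2.1 and 2.2, so the sketch assumes one common formalization: a subset $S$ is sufficient when fixing $\mathbf{x}_S$ and resampling the remaining features leaves the output close to $f(\mathbf{x})$, and necessary when resampling $\mathbf{x}_S$ alone drives the output to the all-features-resampled baseline. The names `masked_mean` and `importance_gaps`, the marginal resampling from a `reference` array, and the convex combination weighted by `alpha` are all illustrative assumptions.

```python
import numpy as np

def masked_mean(f, x, keep, reference, n_samples=256, seed=0):
    """Average f over inputs that fix x[keep] and draw every other
    feature from rows of `reference` (an assumed marginal distribution)."""
    rng = np.random.default_rng(seed)
    rows = reference[rng.integers(0, len(reference), size=n_samples)].copy()
    rows[:, keep] = x[keep]
    return float(np.mean(f(rows)))

def importance_gaps(f, x, S, reference, alpha=0.5, **kw):
    """Return (sufficiency gap, necessity gap, unified gap) for subset S.
    Smaller gaps mean S is more sufficient / more necessary."""
    d = x.shape[0]
    S = np.asarray(S, dtype=int)
    S_c = np.setdiff1d(np.arange(d), S)      # complement of S
    none = np.array([], dtype=int)           # empty subset: resample everything
    f_x = float(f(x[None, :])[0])
    # Sufficiency: keeping only x_S should reproduce f(x).
    suf = abs(f_x - masked_mean(f, x, S, reference, **kw))
    # Necessity: dropping x_S should look like dropping everything.
    nec = abs(masked_mean(f, x, S_c, reference, **kw)
              - masked_mean(f, x, none, reference, **kw))
    # Assumed unified score: a point on the necessity-sufficiency axis.
    uni = alpha * suf + (1 - alpha) * nec
    return suf, nec, uni
```

For a toy model that reads only feature 0, the subset `[0]` gets both gaps equal to zero, while `[1]` gets large gaps on both counts. Sweeping `alpha` between 0 and 1 then moves the score between a purely necessity-based and a purely sufficiency-based one.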

Paper Structure (34 sections, 5 theorems, 44 equations, 5 figures)

This paper contains 34 sections, 5 theorems, 44 equations, 5 figures.

Key Result

Lemma 4.1

Let $\alpha \in (0,1)$. For $\tau > 0$, let $S^*$ denote a solution to problem (eq:uni_opt) for which $\Delta^{\text{uni}}_{\mathcal{V}}(S^*, f, \mathbf{x}, \alpha) = \epsilon$. Then $S^*$ is $\frac{\epsilon}{\alpha}$-sufficient and $\frac{\epsilon}{1-\alpha}$-necessary.
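
A short argument shows why bounds of this form are plausible. It is a sketch under an assumption not shown on this page: that the unified gap is a convex combination of two nonnegative gaps, written here with the hypothetical symbols $\Delta^{\text{suf}}_{\mathcal{V}}$ and $\Delta^{\text{nec}}_{\mathcal{V}}$, and that "$\epsilon$-sufficient" and "$\epsilon$-necessary" mean the corresponding gap is at most $\epsilon$.

$$
\begin{aligned}
\Delta^{\text{uni}}_{\mathcal{V}}(S^*, f, \mathbf{x}, \alpha) &= \alpha\,\Delta^{\text{suf}}_{\mathcal{V}}(S^*, f, \mathbf{x}) + (1-\alpha)\,\Delta^{\text{nec}}_{\mathcal{V}}(S^*, f, \mathbf{x}) = \epsilon, \\
\alpha\,\Delta^{\text{suf}}_{\mathcal{V}}(S^*, f, \mathbf{x}) &\le \epsilon \;\Longrightarrow\; \Delta^{\text{suf}}_{\mathcal{V}}(S^*, f, \mathbf{x}) \le \frac{\epsilon}{\alpha}, \\
(1-\alpha)\,\Delta^{\text{nec}}_{\mathcal{V}}(S^*, f, \mathbf{x}) &\le \epsilon \;\Longrightarrow\; \Delta^{\text{nec}}_{\mathcal{V}}(S^*, f, \mathbf{x}) \le \frac{\epsilon}{1-\alpha}.
\end{aligned}
$$

Each inequality drops one nonnegative term from the sum, which is why $\alpha$ must lie strictly between 0 and 1 for both bounds to be finite.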

Figures (5)

  • Figure 1: Stability of solutions to problem (eq:uni_opt) vs. $\alpha$ for $\tau \in \{3, 6, 9\}$
  • Figure 2: Experimental results on the RSNA dataset.
  • Figure 3: Comparison of different methods on the CelebAHQ dataset.
  • Figure 4: Images and model predictions by fixing and masking the sufficient subset $S^*_{\text{suf}}$
  • Figure 5: Images and model predictions by fixing and masking the necessary subset $S^*_{\text{nec}}$

Theorems & Definitions (12)

  • Definition 2.1: Sufficiency
  • Definition 2.2: Necessity
  • Lemma 4.1
  • Lemma 4.2
  • Theorem 4.1
  • Corollary 5.1
  • Theorem 5.1
  • proof
  • proof
  • proof
  • ...and 2 more