I Bet You Did Not Mean That: Testing Semantic Importance via Betting

Jacopo Teneggi, Jeremias Sulam

TL;DR

This paper formalizes the global and local statistical importance of semantic concepts for the predictions of opaque models by means of conditional independence, which allows for rigorous testing. It demonstrates the effectiveness and flexibility of the framework on synthetic datasets and on image classification tasks with several diverse vision-language models.

Abstract

Recent works have extended notions of feature importance to semantic concepts that are inherently interpretable to the users interacting with a black-box predictive model. Yet, precise statistical guarantees, such as false positive rate and false discovery rate control, are needed to communicate findings transparently and to avoid unintended consequences in real-world scenarios. In this paper, we formalize the global (i.e., over a population) and local (i.e., for a sample) statistical importance of semantic concepts for the predictions of opaque models by means of conditional independence, which allows for rigorous testing. We use recent ideas of sequential kernelized independence testing (SKIT) to induce a ranking of importance across concepts, and we showcase the effectiveness and flexibility of our framework on synthetic datasets as well as on image classification tasks using several diverse vision-language models.
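To make the idea of testing by betting concrete, here is a minimal sketch of a sequential independence test for two scalar variables. It is not the authors' SKIT implementation. It assumes a linear kernel, for which the witness function reduces to the running empirical covariance, and a fixed betting fraction `bet` instead of an adaptive strategy. The function name `betting_independence_test` and all parameter names are hypothetical. Under the null hypothesis of independence, swapping the two responses in a pair flips the sign of the payoff without changing its distribution, so the wealth is a nonnegative martingale, and by Ville's inequality it exceeds `1 / alpha` with probability at most `alpha`.

```python
import math
import random


def betting_independence_test(xs, ys, alpha=0.05, bet=0.5):
    """Simplified sequential test of H0: X independent of Y, by betting.

    Assumptions (not from the paper): linear-kernel payoff, fixed bet size.
    Returns (rejected, number_of_pairs_processed, final_wealth).
    """
    wealth = 1.0                      # the bettor starts with unit wealth
    n_seen = 0                        # number of past samples
    sum_x = sum_y = sum_xy = 0.0      # running sums over past samples only
    n_pairs = min(len(xs), len(ys)) // 2

    for t in range(n_pairs):
        x1, y1 = xs[2 * t], ys[2 * t]
        x2, y2 = xs[2 * t + 1], ys[2 * t + 1]

        # Covariance estimated from PAST data only, so the bet is predictable.
        if n_seen > 0:
            cov = sum_xy / n_seen - (sum_x / n_seen) * (sum_y / n_seen)
        else:
            cov = 0.0

        # Payoff in (-1, 1): witness on the observed pairs minus the witness
        # on the pairs with swapped responses, which for a linear kernel
        # equals cov * (x1 - x2) * (y1 - y2).
        payoff = math.tanh(cov * (x1 - x2) * (y1 - y2))

        # Wealth update; since 0 <= bet <= 1, wealth stays positive.
        wealth *= 1.0 + bet * payoff

        # Reject as soon as the wealth reaches 1 / alpha.
        if wealth >= 1.0 / alpha:
            return True, t + 1, wealth

        # Only now add the current pair to the running statistics.
        for x, y in ((x1, y1), (x2, y2)):
            sum_x += x
            sum_y += y
            sum_xy += x * y
        n_seen += 2

    return False, n_pairs, wealth


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.gauss(0.0, 1.0) for _ in range(400)]
    y_dependent = [xi + 0.5 * rng.gauss(0.0, 1.0) for xi in x]
    y_independent = [rng.gauss(0.0, 1.0) for _ in range(400)]
    print("dependent:  ", betting_independence_test(x, y_dependent))
    print("independent:", betting_independence_test(x, y_independent))
```

The number of pairs processed before rejection plays the role of a rejection time: a stronger dependence tends to grow the wealth faster, which is the intuition behind using rejection times to rank concepts by importance.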

Paper Structure (56 sections, 10 theorems, 69 equations, 33 figures, 9 tables, 5 algorithms)

Key Result

Lemma 1

Let $\hat{Y} = \langle w, H\rangle$ with $w \in \mathbb{R}^d$. If $d \geq 3$, then $H^{\text{G}}_{0,j}$ is true $\centernot\iff \langle w, c_j\rangle = 0$.

Figures (33)

  • Figure 1: Overview of the problem setup and our contribution.
  • Figure 2: Pictorial representation of the data-generating process for the synthetic dataset.
  • Figure 3: Global importance results for $H_0: Y \perp \!\!\! \perp Z_2$ with SKIT. (a) Marginal distributions of $Y$ and $Z_2$ for $\beta_2 = 1$ and $0$, respectively. The red dashed line is the linear regression between the two variables, and, as expected, the slope is $\approx 0$ for $\beta_2 = 0$. (b) Mean rejection rate and mean rejection time for SKIT with a linear and RBF kernel, as a function of $\beta_2$.
  • Figure 4: Global conditional importance results for $H_0: Y \perp \!\!\! \perp Z_1 \mid Z_{-1}$ with $\textsc{c-}\text{SKIT}$. (a) $\widetilde{Z}_1 \sim P_{Z_1 | Z_{-1}}$ is independent of $Y$ for $Z_{-1}=[-1,3]$. As expected, the slope of the linear regression between $Y$ and $\widetilde{Z}_1$ is $\approx 0$. (b) Mean rejection rate and mean rejection time for $\textsc{c-}\text{SKIT}$ with a linear and RBF kernel, as a function of $\beta_1$.
  • Figure 5: Local conditional importance results for $H_0: g(\widetilde{Z}_{\{2,3\}}) \overset{d}{=} g(\widetilde{Z}_3)$ with $\textsc{x-}\text{SKIT}$. Panel (a) shows that, as expected, the test and null distributions overlap when $z_3 = 0$.
  • ...and 28 more figures

Theorems & Definitions (24)

  • Definition 1: Global semantic importance
  • Lemma 1
  • Definition 2: Global conditional semantic importance
  • Definition 3: Local conditional semantic importance
  • Lemma 2: See [shaer2023model, shekhar2023nonparametric]
  • Lemma 3: See [podkopaev2023sequential, shaer2023model, shekhar2023nonparametric]
  • Proposition 1: Validity of ${\textsc{c-}\text{SKIT}}$
  • Proposition 2: Validity of ${\textsc{x-}\text{SKIT}}$
  • Lemma 4: See [wang2022false]
  • Definition A.1: Test martingale
  • ...and 14 more