Towards Formalizing Spuriousness of Biased Datasets Using Partial Information Decomposition

Barproda Halder, Faisal Hamman, Pasan Dissanayake, Qiuyi Zhang, Ilia Sucholutsky, Sanghamitra Dutta

TL;DR

The paper tackles spurious correlations in datasets by introducing a formal, information-theoretic framework based on Partial Information Decomposition (PID). It proposes Spuriousness Disentangler, an autoencoder-based pipeline that segments data into core and spurious features, reduces dimensionality, and estimates PID terms to derive a spuriousness measure $M_{sp}$ before any model is trained. Across six benchmark datasets, $M_{sp}$ agrees with post-training worst-group accuracy, supporting its use as a pre-training data-quality indicator and enabling dataset auditing without heavy adversarial training. The work provides a principled lens on feature informativeness and a practical tool for diagnosing spuriousness in high-dimensional settings.

Abstract

Spuriousness arises when there is an association between two or more variables in a dataset that are not causally related. In this work, we propose an explainability framework to preemptively disentangle the nature of such spurious associations in a dataset before model training. We leverage a body of work in information theory called Partial Information Decomposition (PID) to decompose the total information about the target into four non-negative quantities, namely unique information (in core and spurious features, respectively), redundant information, and synergistic information. Our framework helps anticipate when the core or spurious feature is indispensable, when either suffices, and when both are jointly needed for an optimal classifier trained on the dataset. Next, we leverage this decomposition to propose a novel measure of the spuriousness of a dataset. We arrive at this measure systematically by examining several candidate measures, and demonstrating what they capture and miss through intuitive canonical examples and counterexamples. Our framework, Spuriousness Disentangler, consists of segmentation, dimensionality reduction, and estimation modules, with capabilities to specifically handle high-dimensional image data efficiently. Finally, we also perform empirical evaluation to demonstrate the trends of unique, redundant, and synergistic information, as well as our proposed spuriousness measure across $6$ benchmark datasets under various experimental settings. We observe an agreement between our preemptive measure of dataset spuriousness and post-training model generalization metrics such as worst-group accuracy, further supporting our proposition. The code is available at https://github.com/Barproda/spuriousness-disentangler.
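The "both are jointly needed" case can be made concrete with a small numerical sketch. The toy distribution below (independent fair bits $F$ and $B$ with $Y = F \oplus B$) is an assumption chosen for illustration and is not taken from the paper; the helper `mutual_information` is likewise hypothetical. It relies only on the standard PID identities $\mathrm{I}(Y;F) = \text{unique}_F + \text{redundant}$ and $\mathrm{I}(Y;B) = \text{unique}_B + \text{redundant}$ together with non-negativity of every term, which are assumed to match the paper's definitions.

```python
import numpy as np

def mutual_information(joint):
    """Mutual information I(X;Y) in bits from a 2-D joint table p[x, y]."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)   # marginal of the row variable
    py = joint.sum(axis=0, keepdims=True)   # marginal of the column variable
    mask = joint > 0                        # skip zero cells (0 * log 0 = 0)
    return float(np.sum(joint[mask] * np.log2(joint[mask] / (px @ py)[mask])))

# Assumed toy joint p[y, f, b]: F and B are independent fair bits, Y = F XOR B.
p = np.zeros((2, 2, 2))
for f in (0, 1):
    for b in (0, 1):
        p[f ^ b, f, b] = 0.25

i_yf = mutual_information(p.sum(axis=2))      # I(Y;F), from the table p[y, f]
i_yb = mutual_information(p.sum(axis=1))      # I(Y;B), from the table p[y, b]
i_yfb = mutual_information(p.reshape(2, 4))   # I(Y;F,B), (f, b) flattened

print(f"I(Y;F)={i_yf:.3f}  I(Y;B)={i_yb:.3f}  I(Y;F,B)={i_yfb:.3f}")
```

Here both single-feature informations are zero while the joint information is one bit. Because the unique and redundant terms are non-negative and sum to the single-feature informations, they must all vanish, so the entire bit is synergistic. Estimating the individual PID terms for a general distribution requires an optimization step that this sketch does not attempt.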

Paper Structure

This paper contains 28 sections, 11 theorems, 27 equations, 24 figures, 8 tables, and 1 algorithm.

Key Result

Proposition 1

For a given data distribution, the total predictive power of the spurious features $B$ and core features $F$ about the target variable $Y$ can be decomposed into four non-negative components:
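For orientation, the standard PID bookkeeping that a statement of this form refers to can be written as follows. The operator names $\mathrm{Uni}$, $\mathrm{Red}$, and $\mathrm{Syn}$ and the argument order are assumed notation in the style of Bertschinger et al. (2014) and may differ from the paper's typesetting.

$$
\mathrm{I}(Y; F, B) = \mathrm{Uni}(Y{:}F \mid B) + \mathrm{Uni}(Y{:}B \mid F) + \mathrm{Red}(Y{:}F, B) + \mathrm{Syn}(Y{:}F, B)
$$

$$
\mathrm{I}(Y; F) = \mathrm{Uni}(Y{:}F \mid B) + \mathrm{Red}(Y{:}F, B), \qquad \mathrm{I}(Y; B) = \mathrm{Uni}(Y{:}B \mid F) + \mathrm{Red}(Y{:}F, B)
$$

Under this reading, the two unique terms capture information available from only one feature set, the redundant term captures information available from either, and the synergistic term captures information available only from both together.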

Figures (24)

  • Figure 1: Spuriousness in Waterbird dataset due to sampling bias.
  • Figure 2: $\mathrm{I}(Y; A, B)$ is decomposed into four non-negative terms.
  • Figure 3: Canonical examples distilling four types of statistical dependency between core and spurious features, each with one dominant PID term, and the effect on the Bayes optimal classifier. In the first two cases, the unique information in $F$ or in $B$ is dominant, and that feature is indispensable to the optimal classifier. When redundant information is dominant, the optimal classifier can use either $F$ or $B$ without preference. In the fourth case, $B$ is independent of the label $Y$ yet still contributes to the optimal classifier jointly with $F$.
  • Figure 4: Blackwell Sufficiency
  • Figure 5: Spuriousness Disentangler: An autoencoder-based explainability framework for high-dimensional continuous image data with three modules: (i) Segmentation of images into background (spurious features) and foreground (core features); (ii) Dimensionality reduction using an autoencoder with a bottleneck, followed by clustering; and (iii) Estimation of the joint distribution, followed by computation of the PID values through convex optimization and of $M_{sp}$.
  • ...and 19 more figures

Theorems & Definitions (18)

  • Definition 1: Unique Information (Bertschinger et al., 2014)
  • Proposition 1: Proposed Disentanglement
  • Definition 2: Blackwell Sufficiency (Blackwell, 1953)
  • Theorem 1: Interpretability Insights from Unique Information
  • Lemma 1: Redundancy
  • Lemma 2: Synergy
  • Lemma 3
  • Proposition 2: Measure of Spuriousness $M_{sp}$
  • Lemma 4: Nonnegativity of PID
  • Lemma 5: Monotonicity under local operations on $B$
  • ...and 8 more