The Clever Hans Mirage: A Comprehensive Survey on Spurious Correlations in Machine Learning

Wenqian Ye; Luyang Jiang; Eric Xie; Guangtao Zheng; Yunsheng Ma; Xu Cao; Dongliang Guo; Daiqing Qi; Zeyu He; Yijun Tian; Megan Coffee; Zhe Zeng; Sheng Li; Ting-Hao Huang; Ziran Wang; James M. Rehg; Henry Kautz; Aidong Zhang

TL;DR

Spurious correlations limit model robustness under distribution shifts, akin to Clever Hans relying on cues rather than true signals. The paper formalizes these correlations, presents a fine-grained taxonomy of data-centric, representation-learning, post-hoc, and specialized mitigation methods, and catalogs datasets and metrics across domains. It offers theoretical and practical insights, discusses broader impacts in healthcare and embodied AI, and outlines open challenges and directions for scalable, interpretable, cross-domain debiasing in the era of generative and foundation models. The work aims to unify methodologies, benchmarks, and evaluation strategies to advance robust, fair, and reliable AI systems.

Abstract

In the early 20th century, a horse named Hans appeared to perform arithmetic and other intellectual tasks during exhibitions in Germany, when in fact it relied solely on involuntary cues in its trainer's body language. Modern machine learning models are no different: they are known to be sensitive to spurious correlations between non-essential features of the inputs (e.g., background, texture, and secondary objects) and the corresponding labels. Such features and their correlations with the labels are called "spurious" because they tend to change with shifts in real-world data distributions, which can harm the model's generalization and robustness. In this paper, we provide a comprehensive survey of this emerging issue, along with a fine-grained taxonomy of existing state-of-the-art methods for addressing spurious correlations in machine learning models. We also summarize existing datasets, benchmarks, and metrics to facilitate future research. The paper concludes with a discussion of broader impacts, recent advancements, and future challenges in the era of generative AI, aiming to provide valuable insights for researchers in related areas of the machine learning community.

Paper Structure (51 sections, 6 equations, 6 figures, 1 table)

This paper contains 51 sections, 6 equations, 6 figures, 1 table.

Introduction
Spurious Correlations
Where do Spurious Correlations Come From?
Why Are Machine Learning Models Sensitive to Spurious Correlations?
Inductive Biases from Learning Algorithms.
Optimization.
Theoretical Insights
Related Areas
Methods
Data-Centric Methods
Data Augmentation
Data Balancing
Concept and Pseudo-label Discovery
Representation Learning
Causal Intervention
...and 36 more sections

Figures (6)

Figure 1: An illustration of the Clever Hans effect as an analogy for spurious correlations in machine learning. Just as Clever Hans appeared to solve arithmetic problems by responding to subtle cues from his trainer, machine learning models can achieve high accuracy by exploiting spurious features, e.g., associating grass with cows rather than learning true underlying concepts. (Image generated by GPT-4o)
Figure 2: Depiction of a spurious correlation between spurious attribute $A$ (e.g., grass) and target $Y$ (e.g., cow). In environment $E$, input $X$ contains both core features (invariant across environments) from $Y$, and spurious features from $A$. When shifting to a new environment $E'$, the spurious attribute changes (e.g., desert background $A'$), causing the spurious correlation to break. The core features remain predictive of $Y$, but models trained on $E$ may fail to generalize if they overly rely on $A$.
Figure 3: A comprehensive taxonomy of approaches to address spurious correlation in machine learning.
Figure 4: Example of sycophancy as spurious attributes in an LLM-based reward model.
Figure 5: Hospital tags, strips, and medical devices exemplify several unknown group labels in the MIMIC-CXR dataset, which can spuriously correlate with the ground truth diagnosis results.
...and 1 more figure
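
The setup described in the Figure 2 caption can be illustrated with a small synthetic experiment. The sketch below is not taken from the paper: the two-feature data model, the agreement probabilities (0.95 in E, 0.05 in E'), the noise levels, and all function names (make_env, train, accuracy) are assumptions chosen purely for illustration. A core feature tracks the label in every environment, while a spurious feature agrees with the label only at an environment-dependent rate. A logistic regression is fit on E and evaluated on E'.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_env(n, p_agree, rng):
    """Sample n rows (x_core, x_spur, y) from one hypothetical environment.

    x_core is a noisy copy of the label in every environment (invariant).
    x_spur matches the label with probability p_agree, which is the only
    thing that differs between environments (the spurious attribute A).
    """
    rows = []
    for _ in range(n):
        y = rng.randint(0, 1)
        s = 2 * y - 1  # label encoded as -1 / +1
        x_core = s + rng.gauss(0.0, 1.0)
        a = s if rng.random() < p_agree else -s
        x_spur = a + rng.gauss(0.0, 0.1)
        rows.append((x_core, x_spur, y))
    return rows


def train(rows, mask=(1.0, 1.0), lr=0.5, epochs=300):
    """Full-batch gradient descent on the logistic loss.

    mask multiplies each feature, so (1.0, 0.0) hides the spurious one.
    """
    w = [0.0, 0.0]
    b = 0.0
    n = len(rows)
    for _ in range(epochs):
        g0 = g1 = gb = 0.0
        for xc, xs, y in rows:
            xc, xs = xc * mask[0], xs * mask[1]
            err = sigmoid(w[0] * xc + w[1] * xs + b) - y
            g0 += err * xc
            g1 += err * xs
            gb += err
        w[0] -= lr * g0 / n
        w[1] -= lr * g1 / n
        b -= lr * gb / n
    return w, b


def accuracy(rows, w, b, mask=(1.0, 1.0)):
    hits = 0
    for xc, xs, y in rows:
        z = w[0] * xc * mask[0] + w[1] * xs * mask[1] + b
        hits += int((z > 0) == (y == 1))
    return hits / len(rows)


rng = random.Random(0)
env_train = make_env(1000, p_agree=0.95, rng=rng)  # environment E
env_shift = make_env(1000, p_agree=0.05, rng=rng)  # environment E'

# Model that sees both features: free to lean on the spurious one.
w, b = train(env_train)
acc_e = accuracy(env_train, w, b)
acc_e_shift = accuracy(env_shift, w, b)

# Model restricted to the core feature (assumes we know which one it is).
core_only = (1.0, 0.0)
w_c, b_c = train(env_train, mask=core_only)
core_acc_shift = accuracy(env_shift, w_c, b_c, mask=core_only)

print(f"both features: E={acc_e:.2f}, E'={acc_e_shift:.2f}")
print(f"core only:     E'={core_acc_shift:.2f}")
```

Under these assumed settings, the model with access to both features should score well on E and fall below chance on E', while the core-only model holds up in both environments. The core-only baseline relies on knowing in advance which feature is spurious, which is rarely the case in practice.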

Theorems & Definitions (1)

Definition 2.1: Spurious Correlation
