Should Bias be Eliminated? A General Framework to Use Bias for OOD Generalization

Yan Li, Yunlong Deng, Zijian Li, Anpeng Wu, Zeyu Tang, Kun Zhang, Guangyi Chen

TL;DR

The paper addresses OOD generalization by reframing bias as a potential resource rather than a nuisance. It introduces a generative, causally informed framework that disentangles content ${c}$ from bias ${b}$, then leverages ${b}$ through an environment-routing mechanism and an adaptive label prior to improve predictions under domain and label shifts. Theoretical results establish block-wise identifiability of ${c}$ and ${b}$ and show when bias can contribute to prediction via unblocked causal paths to ${y}$; empirically, the proposed method, BAG, outperforms invariance-only baselines and prior bias-utilization methods on synthetic data and standard DG benchmarks. This approach offers a principled way to harness bias for robust, transferable models, with practical implications for real-world deployments where domain and label distributions shift.

Abstract

Most approaches to out-of-distribution (OOD) generalization learn domain-invariant representations by discarding contextual bias. In this paper, we raise a critical question: Should bias be eliminated? If not, is there a general way to leverage bias for better OOD generalization? To answer these questions, we first provide a theoretical analysis that characterizes the circumstances in which biased features contribute positively. Although theoretical results show that bias may sometimes play a positive role, leveraging it effectively is non-trivial, since its harmful and beneficial components are often entangled. Recent advances have sought to refine the prediction of bias by presuming reliable predictions from invariant features. However, such assumptions may be too strong in the real world, especially when the target also shifts from training to testing domains. Motivated by this challenge, we introduce a framework to leverage bias in a more general scenario. Specifically, we employ a generative model to capture the data generation process and identify the underlying bias factors, which are then used to construct a bias-aware predictor. Since the bias-aware predictor may shift across environments, we first estimate the environment state to train predictors under different environments, combining them as a mixture of domain experts for the final prediction. Then, we build a general invariant predictor, which remains invariant under label shift, to guide the adaptation of the bias-aware predictor. Evaluations on synthetic data and standard domain generalization benchmarks demonstrate that our method consistently outperforms invariance-only baselines, recent bias-utilization approaches, and other advanced baselines, yielding improved robustness and adaptability.
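
The mixture-of-domain-experts idea described in the abstract can be sketched as follows. This is a minimal illustration rather than the paper's implementation: the function names (`mixture_of_experts`, `combine_with_content`), the array shapes, and the combination rule $p(y \mid c, b) \propto p(y \mid c)\, p(y \mid b) / p(y)$ (which assumes content and bias are conditionally independent given the label) are all assumptions introduced here.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def mixture_of_experts(expert_logits, env_logits):
    """Weight per-environment experts by an estimated environment posterior.

    expert_logits: (n_envs, batch, n_classes), one bias-aware expert per environment.
    env_logits:    (batch, n_envs), output of a hypothetical environment estimator.
    Returns class probabilities of shape (batch, n_classes).
    """
    expert_probs = softmax(expert_logits, axis=-1)  # (E, B, K)
    env_weights = softmax(env_logits, axis=-1)      # (B, E)
    return np.einsum("be,ebk->bk", env_weights, expert_probs)

def combine_with_content(content_probs, bias_probs, label_prior):
    """Assumed combination rule: p(y|c,b) is proportional to p(y|c) p(y|b) / p(y)."""
    unnorm = content_probs * bias_probs / label_prior
    return unnorm / unnorm.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
expert_logits = rng.normal(size=(3, 4, 5))  # 3 environments, 4 samples, 5 classes
env_logits = rng.normal(size=(4, 3))
bias_probs = mixture_of_experts(expert_logits, env_logits)
content_probs = softmax(rng.normal(size=(4, 5)))
label_prior = np.full(5, 0.2)               # placeholder uniform label prior
final_probs = combine_with_content(content_probs, bias_probs, label_prior)
```

Under label shift, the `label_prior` argument is where an adapted, environment-specific prior would enter; the uniform prior above is only a placeholder.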

Paper Structure

This paper contains 66 sections, 5 theorems, 76 equations, 5 figures, 7 tables, 2 algorithms.

Key Result

Lemma 2.1

(Block-wise Identification of ${\mathbf{c}}$ and ${\mathbf{b}}$; kong2022partial). Assuming that the data generation process follows Figure 2 and that the lemma's assumptions hold, the learned $\hat{\mathbf{c}}$ and $\hat{\mathbf{b}}$ are block-wise identifiable.

Figures (5)

  • Figure 1: Illustration of data bias in the "Cow vs. Camel Classification" problem. (a) An example of the Cow vs. Camel classification task, where both cows and camels are observed in distinct backgrounds, such as grasslands and deserts. (b) A causal graph illustrating spurious correlations introduced by confounders. (c) An intuitive example demonstrating how humans classify an image with ambiguous content as either a cow or a camel.
  • Figure 2: A graphical representation of the data generation process. The content variables ${\mathbf{c}}$ are invariant to environment changes once label $y$ is given, while the bias variables ${\mathbf{b}}$ vary across environments. Observed data ${\mathbf{x}}$ is generated by $g({\mathbf{c}}, {\mathbf{b}})$, with ${\mathbf{c}}$ and ${\mathbf{b}}$ forming the latent variable ${\mathbf{z}}$. The environment $e$ affects ${\mathbf{b}}$ but not ${\mathbf{c}}$, and the target $y$ may also depend on $e$, reflecting complex data-environment interactions.
  • Figure 3: Overall framework of the BAG method. The framework consists of three main modules: representation learning, predictor training, and adaptation. In the representation learning stage, we employ a VAE to disentangle content and bias variables. For prediction, we construct a content predictor, a label prior, and a bias-aware predictor that reweights multiple domain experts using a domain estimator and is also retrained at test time on labels from the content predictor. The three predictor components work together to produce the final prediction.
  • Figure 4: Mean and standard deviation across different methods in binary classification simulation experiments.
  • Figure 5: PACS disentanglement via Grad-CAM. For each example, we show (left $\rightarrow$ right): original image; ${\mathbf{b}}$ heatmap (blue) obtained by targeting the bias head logits; ${\mathbf{c}}$ heatmap (red) obtained by targeting the content head logits; and the composite overlay (intensity reflects attribution strength). The figure displays representative, high-separation cases where ${\mathbf{c}}$ focuses on object pixels while ${\mathbf{b}}$ emphasizes background/style cues, evidencing effective content–bias separation.
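
The data generation process in Figure 2 can be illustrated with a small simulation. This is a hypothetical toy setup, not the paper's synthetic benchmark: the function `sample_environment`, the Gaussian latents, the specific label-shift and bias-strength schedules, and the leaky-ReLU mixing function standing in for $g$ are all assumptions chosen only to mirror the arrows in the graph (the environment $e$ affects $y$ and ${\mathbf{b}}$, while ${\mathbf{c}}$ depends on $y$ alone).

```python
import numpy as np

def sample_environment(n, env_id, n_envs=3, dim_c=2, dim_b=2, seed=0):
    """Sample (x, y, c, b) from one environment of a toy version of Figure 2."""
    rng = np.random.default_rng(seed + env_id)
    frac = env_id / max(n_envs - 1, 1)

    # e -> y: the label prior shifts across environments (label shift).
    y = rng.binomial(1, 0.2 + 0.6 * frac, size=n)
    sign = (2 * y - 1)[:, None]

    # y -> c: content depends on the label only, so p(c | y) is shared by all environments.
    c = sign * 1.0 + rng.normal(size=(n, dim_c))

    # (y, e) -> b: the label-bias correlation changes strength and sign with e.
    bias_strength = 1.5 * np.cos(np.pi * frac)
    b = sign * bias_strength + rng.normal(size=(n, dim_b))

    # x = g(c, b): a fixed mixing function shared by all environments.
    z = np.concatenate([c, b], axis=1)
    mixing = np.random.default_rng(123).normal(size=(dim_c + dim_b, dim_c + dim_b))
    h = z @ mixing
    x = np.where(h > 0, h, 0.2 * h)  # leaky ReLU keeps the map invertible when `mixing` is
    return x, y, c, b

x0, y0, c0, b0 = sample_environment(5000, env_id=0)
x2, y2, c2, b2 = sample_environment(5000, env_id=2)
```

In this toy setup the bias is strongly predictive within each environment but flips sign between environment 0 and environment 2, which is the kind of situation where a single bias-based predictor fails and environment-specific experts become useful.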

Theorems & Definitions (8)

  • Lemma 2.1
  • Definition 2.2
  • Lemma 2.3
  • Theorem 4.1: Decomposition under label shift
  • Theorem 4.2: Upper bound of Bias Correction
  • Lemma 2.1
  • proof
  • proof