Causally Inspired Regularization Enables Domain General Representations
Olawale Salaudeen, Sanmi Koyejo
TL;DR
This work tackles domain generalization under distribution shifts by leveraging causal graphs to identify domain-general representations that remain stable across domains. It introduces a Total Information Criterion (TIC) and a two-branch representation with a Hilbert-Schmidt independence penalty (TCRI) to separate domain-general and domain-specific information without needing direct observations of spurious features. The method demonstrates superior average and worst-domain transfer performance on semi-synthetic and real-world benchmarks (ColoredMNIST, Spurious PACS, Terra Incognita) compared with IRM, GroupDRO, VREx, and IB-based baselines, and ablations highlight the importance of TIC. By enabling robust domain-general predictors in settings with complex spurious correlations, the approach has potential implications for safety-critical and fairness-sensitive applications where transfer across unseen domains is essential.
Abstract
Given a causal graph representing the data-generating process shared across different domains/distributions, enforcing sufficient graph-implied conditional independencies can identify domain-general (non-spurious) feature representations. For the standard input-output predictive setting, we categorize the set of graphs considered in the literature into two distinct groups: (i) those in which the empirical risk minimizer across training domains gives domain-general representations and (ii) those where it does not. For the latter case (ii), we propose a novel framework with regularizations, which we demonstrate are sufficient for identifying domain-general feature representations without a priori knowledge (or proxies) of the spurious features. Empirically, our proposed method is effective for both (semi) synthetic and real-world data, outperforming other state-of-the-art methods in average and worst-domain transfer accuracy.
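To make the Hilbert-Schmidt independence penalty mentioned in the TL;DR concrete, the following is a minimal sketch of the standard biased empirical Hilbert-Schmidt Independence Criterion (HSIC) between two batches of features. It is an illustration under stated assumptions, not the authors' implementation: the Gaussian kernel, the fixed bandwidth `sigma`, and the function names `rbf_kernel` and `hsic` are all hypothetical choices, and the sketch measures unconditional dependence only, without any conditioning the full method may apply.

```python
import numpy as np


def rbf_kernel(x, sigma=1.0):
    """Gaussian (RBF) kernel matrix for the rows of x, shape (n, d).

    The kernel choice and bandwidth are assumptions made for this sketch.
    """
    sq_norms = np.sum(x ** 2, axis=1, keepdims=True)
    # Pairwise squared Euclidean distances, clipped to guard against
    # tiny negative values caused by floating-point round-off.
    sq_dists = np.maximum(sq_norms + sq_norms.T - 2.0 * x @ x.T, 0.0)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2.

    x and y hold n paired samples, with shapes (n, d_x) and (n, d_y).
    Values near zero suggest the two feature sets are close to independent.
    """
    n = x.shape[0]
    k = rbf_kernel(x, sigma)
    l = rbf_kernel(y, sigma)
    # Centering matrix H = I - (1/n) * 1 1^T.
    h = np.eye(n) - np.ones((n, n)) / n
    return np.trace(k @ h @ l @ h) / (n - 1) ** 2
```

In a two-branch setup of the kind described above, a quantity like `hsic(z_general, z_specific)` could be added to the training loss as a penalty, where `z_general` and `z_specific` are hypothetical names for the outputs of the two branches on the same batch.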
