Generalization in anti-causal learning
Niki Kilbertus, Giambattista Parascandolo, Bernhard Schölkopf
TL;DR
The paper argues that strong generalization in anti-causal learning is limited unless a causal model is available to validate the hypotheses that a search proposes. It introduces constructive anti-causal learning: the space of causes is searched, exhaustively or heuristically, and each candidate is validated through the causal mechanism, which enables reliable inference of causes from effects. The authors connect this paradigm to deep learning, adversarial robustness, encoder-decoder architectures, disentanglement, Bayesian inference, and real-world scientific modeling, and support it with theoretical arguments and evidence from the literature. They advocate incorporating causal models into supervised learning, shifting from inference alone to search and validation, with the potential to improve robustness and generalization across tasks.
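A minimal sketch of the exhaustive search-and-validate loop, assuming a small discrete space of causes and a known, deterministic causal mechanism. The mechanism, the cause space (integers 0 to 9), the tolerance, and every function name below are hypothetical illustrations, not the authors' method. Each candidate cause is pushed through the causal model, and only candidates whose predicted effect matches the observation within the tolerance are accepted.

```python
import math
from typing import Callable, Iterable, List, Optional, Sequence


def mechanism(cause: int) -> List[float]:
    # Hypothetical known causal mechanism: maps a cause (label) to an effect (observation).
    return [float(cause), float((cause * cause) % 7), 2.0 * cause + 1.0]


def residual(predicted: Sequence[float], observed: Sequence[float]) -> float:
    # Euclidean distance between the effect the causal model predicts and the observed one.
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)))


def search_and_validate(
    effect: Sequence[float],
    candidates: Iterable[int],
    causal_model: Callable[[int], Sequence[float]],
    tol: float,
) -> Optional[int]:
    # Exhaustive search: try every candidate cause, keep the best one that passes validation.
    best: Optional[int] = None
    best_err = math.inf
    for cause in candidates:
        err = residual(causal_model(cause), effect)
        if err <= tol and err < best_err:  # validation via the causal mechanism
            best, best_err = cause, err
    return best  # None if no candidate explains the observed effect


# An observation generated by cause 4 plus small noise.
print(search_and_validate([4.05, 2.0, 8.98], range(10), mechanism, tol=0.5))  # 4
# An observation that no cause in the search space can produce.
print(search_and_validate([4.0, 5.0, 9.0], range(10), mechanism, tol=0.5))  # None
```

When no candidate passes validation, the sketch returns `None` instead of forcing a label; this is one way a validation step can decline to explain an effect that the causal model cannot produce.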
Abstract
The ability to learn and act in novel situations is still a prerogative of animate intelligence, as current machine learning methods mostly fail when moving beyond the standard i.i.d. setting. What is the reason for this discrepancy? Most machine learning tasks are anti-causal, i.e., we infer causes (labels) from effects (observations). Typically, in supervised learning we build systems that try to directly invert causal mechanisms. Instead, in this paper we argue that strong generalization capabilities crucially hinge on searching and validating meaningful hypotheses, requiring access to a causal model. In such a framework, we want to find a cause that leads to the observed effect. Anti-causal models are used to drive this search, but a causal model is required for validation. We investigate the fundamental differences between causal and anti-causal tasks, discuss implications for topics ranging from adversarial attacks to disentangling factors of variation, and provide extensive evidence from the literature to substantiate our view. We advocate for incorporating causal models in supervised learning to shift the paradigm from inference only to search and validation.
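The division of labour described in the abstract, with an anti-causal model driving the search and a causal model validating it, can be sketched in the same toy setting. This is a hypothetical illustration, not the paper's algorithm: the anti-causal model here is a cheap direct inverse that reads only one coordinate of the effect, so it can be wrong. Candidates are tried in order of closeness to its estimate, and the first one the causal mechanism confirms within a tolerance is returned, under a fixed evaluation budget. All names, the mechanism, and the numbers are illustrative assumptions.

```python
import math
from typing import Callable, Iterable, List, Optional, Sequence, Tuple


def mechanism(cause: int) -> List[float]:
    # Hypothetical known causal mechanism (forward model): cause -> effect.
    return [float(cause), float((cause * cause) % 7), 2.0 * cause + 1.0]


def anticausal_estimate(effect: Sequence[float]) -> float:
    # Hypothetical anti-causal model: a cheap, imperfect inverse that
    # looks only at the first coordinate of the effect.
    return effect[0]


def direct_inversion(effect: Sequence[float]) -> int:
    # Inference only: trust the anti-causal point estimate without validation.
    return round(anticausal_estimate(effect))


def guided_search(
    effect: Sequence[float],
    candidates: Iterable[int],
    causal_model: Callable[[int], Sequence[float]],
    tol: float,
    budget: int,
) -> Tuple[Optional[int], int]:
    estimate = anticausal_estimate(effect)
    # The anti-causal model drives the search: closest candidates are tried first.
    ranked = sorted(candidates, key=lambda c: abs(c - estimate))
    evaluations = 0
    for cause in ranked[:budget]:
        evaluations += 1
        predicted = causal_model(cause)
        err = math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, effect)))
        if err <= tol:  # the causal model validates the hypothesis
            return cause, evaluations
    return None, evaluations  # budget exhausted without a validated cause


observed = [4.6, 2.0, 9.0]  # generated by cause 4, with noise on the first coordinate
print(direct_inversion(observed))  # 5
print(guided_search(observed, range(10), mechanism, tol=1.0, budget=3))  # (4, 2)
```

Direct inversion returns 5 because the estimate 4.6 rounds up, while the guided search rejects 5 (its predicted effect is far from the observation) and accepts 4 after two causal-model evaluations. Ranking by the anti-causal estimate is what keeps the search cheaper than looping over all ten causes.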
