Generalization in anti-causal learning

Niki Kilbertus, Giambattista Parascandolo, Bernhard Schölkopf

TL;DR

The paper argues that generalization in anti-causal learning remains limited unless a known causal model is available to guide search and validation. It introduces constructive anti-causal learning: searching the space of causes, exhaustively or heuristically, and then validating each candidate with the causal mechanism enables reliable inference of causes from effects. The authors connect this paradigm to deep learning, adversarial robustness, encoder-decoder architectures, disentanglement, Bayesian inference, and real-world scientific modeling, providing theoretical and literature-based support. They advocate for incorporating causal models into supervised learning to shift from mere inference to search and validation, with potential for improved robustness and generalization across tasks.

Abstract

The ability to learn and act in novel situations is still a prerogative of animate intelligence, as current machine learning methods mostly fail when moving beyond the standard i.i.d. setting. What is the reason for this discrepancy? Most machine learning tasks are anti-causal, i.e., we infer causes (labels) from effects (observations). Typically, in supervised learning we build systems that try to directly invert causal mechanisms. Instead, in this paper we argue that strong generalization capabilities crucially hinge on searching and validating meaningful hypotheses, requiring access to a causal model. In such a framework, we want to find a cause that leads to the observed effect. Anti-causal models are used to drive this search, but a causal model is required for validation. We investigate the fundamental differences between causal and anti-causal tasks, discuss implications for topics ranging from adversarial attacks to disentangling factors of variation, and provide extensive evidence from the literature to substantiate our view. We advocate for incorporating causal models in supervised learning to shift the paradigm from inference only, to search and validation.

Paper Structure

This paper contains 15 sections, 2 theorems, 2 figures.

Key Result

Proposition 1

Assume query access to a deterministic causal model $f: \mathcal{X} \to \mathcal{Y}$ on a finite domain $\mathcal{X}$. Then in the anti-causal direction, for each $y \in \mathcal{Y}$ we can determine whether it has a valid cause, and if so, find a cause $x \in \mathcal{X}$ such that $y$ is the effec
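The search-and-validate idea behind Proposition 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `find_cause`, `rank`, and `square_mod_7` are hypothetical, the causal model is any deterministic Python function on a finite domain, and the optional `rank` argument stands in for an anti-causal model that only reorders the candidates. Validation always goes through the causal model itself.

```python
from typing import Callable, Iterable, Optional, TypeVar

X = TypeVar("X")
Y = TypeVar("Y")


def find_cause(
    f: Callable[[X], Y],
    domain: Iterable[X],
    y: Y,
    rank: Optional[Callable[[X, Y], float]] = None,
) -> Optional[X]:
    """Return some x in the finite domain with f(x) == y, or None if none exists.

    `rank` is a hypothetical heuristic (lower score = tried earlier). It only
    changes the order of the search, never the validation step.
    """
    candidates = list(domain)
    if rank is not None:
        # Heuristic guidance: try the most promising candidates first.
        candidates.sort(key=lambda x: rank(x, y))
    for x in candidates:
        # Validation: query the causal model and compare with the observed effect.
        if f(x) == y:
            return x
    # The whole finite domain was checked, so y has no valid cause.
    return None


def square_mod_7(x: int) -> int:
    """Toy deterministic causal model (an assumption made for this example)."""
    return (x * x) % 7


domain = range(7)
print(find_cause(square_mod_7, domain, 2))  # 3, since 3 * 3 % 7 == 2
print(find_cause(square_mod_7, domain, 3))  # None, 3 is not a square mod 7
# A heuristic that prefers candidates near 4 finds the other valid cause first.
print(find_cause(square_mod_7, domain, 2, rank=lambda x, y: abs(x - 4)))  # 4
```

In this sketch the heuristic affects only how quickly a cause is found. Because every candidate is checked against the causal model, a poor ranking cannot produce an invalid cause, and exhausting the domain certifies that no cause exists.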

Figures (2)

  • Figure 1: The low-level causal mechanism for MNIST digits and the high-level assigned labels.
  • Figure 2: A model for the anti-causal direction that implements the causal model and uses it to find the correct cause by a (heuristically guided) exhaustive search.

Theorems & Definitions (2)

  • Proposition 1: Constructive anti-causal learning
  • Corollary