Towards Understanding Extrapolation: a Causal Lens

Lingjing Kong; Guangyi Chen; Petar Stojanov; Haoxuan Li; Eric P. Xing; Kun Zhang

TL;DR

This work addresses extrapolation under distribution shift when only a few target samples are available and they may lie outside the training support. It introduces a causal latent-variable model $\mathbf{x} = g(\mathbf{z})$ with $\mathbf{z}=[\mathbf{c},\mathbf{s}]$, where the invariant latent variable $\mathbf{c}$ governs the label and the changing variable $\mathbf{s}$ captures non-semantic shifts; extrapolation then amounts to identifying $\mathbf{c}$ even when $\mathbf{s}$ is off support. The authors prove identifiability guarantees under two regimes, dense shifts and sparse shifts, with concrete conditions on the generating function, manifold separability, and off-support distance, and show how these insights translate into practical test-time adaptation algorithms (generative adaptation and regularization). Synthetic and real-world experiments validate the theory and show improved extrapolation performance, with entropy minimization and sparsity constraints improving MAE-TTT and TeSLA-based methods. Overall, the paper connects causal representation learning with extrapolation, offering principled guarantees and actionable strategies for robust transfer under limited target information.
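
As a rough illustration of the entropy-minimization idea mentioned above, the sketch below computes the mean prediction entropy that a test-time adaptation method could minimize on target samples. It is a minimal sketch, not the paper's implementation: the function names `softmax` and `mean_entropy` and the toy logits are assumptions introduced here, and how the paper combines this term with MAE-TTT or TeSLA is not shown.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mean_entropy(logits):
    """Mean Shannon entropy (in nats) of the predicted class distributions.

    Hypothetical helper: lower values mean more confident predictions,
    which is the quantity an entropy-minimization objective drives down.
    """
    p = softmax(logits)
    return float(-(p * np.log(p + 1e-12)).sum(axis=-1).mean())

# Toy logits for one target sample over four classes (assumed values).
uncertain = np.zeros((1, 4))                    # uniform prediction
confident = np.array([[100.0, 0.0, 0.0, 0.0]])  # near one-hot prediction

h_uncertain = mean_entropy(uncertain)
h_confident = mean_entropy(confident)
```

The uniform prediction has the maximum entropy for four classes (log 4 nats), while the near one-hot prediction has entropy close to zero, so minimizing this quantity pushes the classifier toward a committed class assignment on the target sample.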

Abstract

Canonical work handling distribution shifts typically necessitates an entire target distribution that lands inside the training distribution. However, practical scenarios often involve only a handful of target samples, potentially lying outside the training support, which requires the capability of extrapolation. In this work, we aim to provide a theoretical understanding of when extrapolation is possible and offer principled methods to achieve it without requiring an on-support target distribution. To this end, we formulate the extrapolation problem with a latent-variable model that embodies the minimal change principle in causal mechanisms. Under this formulation, we cast the extrapolation problem into a latent-variable identification problem. We provide realistic conditions on shift properties and the estimation objectives that lead to identification even when only one off-support target sample is available, tackling the most challenging scenarios. Our theory reveals the intricate interplay between the underlying manifold's smoothness and the shift properties. We showcase how our theoretical results inform the design of practical adaptation algorithms. Through experiments on both synthetic and real-world data, we validate our theoretical findings and their practical implications.
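
To make the formulation concrete, the following toy sketch simulates a latent-variable model of the kind described above: an invariant variable c selects a class manifold, a changing variable s moves along it, and a single target sample is generated with s outside the source range. Everything specific here is an assumption for illustration: the generating function `g`, the two classes, the source range [-1, 1], and the helper `off_support` are hypothetical and are not taken from the paper.

```python
import numpy as np

def g(c, s):
    """Hypothetical generating function: c picks a manifold, s moves along it."""
    # Each class gets its own vertical offset, so the two manifolds stay apart.
    return np.stack([s, np.sin(s) + 3.0 * c], axis=-1)

rng = np.random.default_rng(0)

# Source data: the changing variable s is drawn inside the range [-1, 1].
c_src = rng.integers(0, 2, size=200)
s_src = rng.uniform(-1.0, 1.0, size=200)
x_src = g(c_src, s_src)

# One target sample: a valid class, but s lies outside the source range.
c_tgt, s_tgt = 1, 2.5
x_tgt = g(np.array([c_tgt]), np.array([s_tgt]))[0]

def off_support(x, x_ref):
    """True if any coordinate of x falls outside the per-coordinate range of x_ref."""
    return bool(np.any((x < x_ref.min(axis=0)) | (x > x_ref.max(axis=0))))

target_is_off_support = off_support(x_tgt, x_src)
```

In this toy setup the target sample falls outside the region covered by the source data, yet it still lies on the manifold indexed by its class. Recovering that class index from the single off-support sample is the identification problem the abstract refers to.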

Paper Structure (57 sections, 6 theorems, 16 equations, 3 figures, 8 tables)

This paper contains 57 sections, 6 theorems, 16 equations, 3 figures, 8 tables.

Introduction
Related Work
Extrapolation.
Latent-variable identification for transfer learning.
Extrapolation and Latent-Variable Identification
Extrapolation and identifiability.
Identification Guarantees for Extrapolation
Notations.
Dense-shift Conditions
Understanding the problem.
Our approach.
Additional notations.
Discussion on the conditions.
Proof sketch.
Sparse-shift Conditions
...and 42 more sections

Key Result

Theorem 4.1

Assume the generating process in Equation (eq:data_generating), and estimate the distribution with a model $(\hat{g}, \hat{p}(\hat{\mathbf{c}}), \hat{p}(\hat{\mathbf{s}}))$ under the paper's estimation objective, whose equation is not reproduced here. Under Assumption (asmp:discrete_identification), the estimated model attains identifiability in the sense of Definition 3.1.

Figures (3)

Figure 1: Illustration of extrapolation and our theoretical conditions. The horizontal axis represents the changing variable $\mathbf{s}$, ranging from the source support to out-of-support regions. The vertical axis represents the observed data $\mathbf{x}$, which lives on manifolds indexed by different values of the invariant variable $\mathbf{c}$. Panel (a) shows that, given an out-of-support point, it is unclear which class manifold it belongs to. Panel (b) illustrates the dense-shift condition (Theorem 4.1), where $\mathbf{s}$ potentially changes all pixels in the image, such as the camera angle in the example. In this case, we can identify the invariant variable $\mathbf{c}$ under a moderate amount of shift, until the shift becomes excessive. For instance, the back view of the cat in the figure could be confused with other animals. Panel (c) illustrates the sparse-shift condition (Theorem 4.2), where $\mathbf{s}$ influences a limited number of pixels, such as the background in the example. In contrast to the dense shift, we can identify $\mathbf{c}$ under the sparse shift regardless of its severity. In the figure, the class "cow" remains unambiguous even though the background has changed to the moon.
Figure 2: The data-generating process. The invariant latent variable $c$ and the changing latent variable $s$ jointly generate the observed variable $x$. The dashed line indicates potential statistical dependence.
Figure 3: TTA classification errors under different shift severity levels and scopes.
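
The distinction between the dense and sparse shifts of Figure 1 can be illustrated with a toy example in which a shift of the same severity is applied either to every pixel or to a small subset. This is only a sketch under stated assumptions: the 64-pixel toy "image", the additive form of the shift, and the helpers `apply_shift` and `changed_fraction` are hypothetical and are not the paper's construction.

```python
import numpy as np

def apply_shift(x, severity, affected):
    """Add `severity` to the pixels selected by the boolean mask `affected`."""
    shifted = x.copy()
    shifted[affected] += severity
    return shifted

def changed_fraction(x, x_shifted, tol=1e-12):
    """Fraction of pixels whose value differs between x and x_shifted."""
    return float(np.mean(np.abs(x - x_shifted) > tol))

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=64)      # a flattened toy "image"

dense_mask = np.ones(64, dtype=bool)    # dense: the shift touches every pixel
sparse_mask = np.zeros(64, dtype=bool)  # sparse: the shift touches few pixels
sparse_mask[:8] = True                  # e.g. a small background patch

x_dense = apply_shift(x, severity=5.0, affected=dense_mask)
x_sparse = apply_shift(x, severity=5.0, affected=sparse_mask)

dense_scope = changed_fraction(x, x_dense)
sparse_scope = changed_fraction(x, x_sparse)
untouched_match = np.array_equal(x[~sparse_mask], x_sparse[~sparse_mask])
```

Under the sparse shift, the untouched pixels are identical to the original no matter how large the severity is, which gives an intuition for why the class can remain identifiable there; under the dense shift, no pixel is left unchanged, so identification depends on how far the shift moves the sample.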

Theorems & Definitions (9)

Definition 3.1: Identifiability of the Invariant Variable $\mathbf{c}$
Theorem 4.1: Extrapolation under Dense Shifts
Theorem 4.2: Extrapolation under Sparse Shifts
Lemma A1: Source discrete subspace identification [kong2024learning]
Theorem A1: Extrapolation under Dense Shifts
Proof of Theorem 4.1
Theorem A1: Extrapolation under Sparse Shifts
Lemma A1: [brady2023provably]
Proof
