Learning When the Concept Shifts: Confounding, Invariance, and Dimension Reduction

Kulunu Dharmakeerthi; YoonHaeng Hur; Tengyuan Liang

TL;DR

The paper studies the necessity and benefit of leveraging exogenous, invariant covariate representations to cure concept shifts and improve target prediction. It shows that a model using the learned lower-dimensional subspace can incur a nearly ideal gap between target and source risk.

Abstract

Practitioners often deploy a learned prediction model in a new environment where the joint distribution of covariate and response has shifted. In observational data, the distribution shift is often driven by unobserved confounding factors lurking in the environment, with the underlying mechanism unknown. Confounding can obfuscate the definition of the best prediction model (concept shift) and shift covariates to domains yet unseen (covariate shift). Therefore, a model maximizing prediction accuracy in the source environment could suffer a significant accuracy drop in the target environment. This motivates us to study the domain adaptation problem with observational data: given labeled covariate and response pairs from a source environment, and unlabeled covariates from a target environment, how can one predict the missing target response reliably? We root the adaptation problem in a linear structural causal model to address endogeneity and unobserved confounding. We study the necessity and benefit of leveraging exogenous, invariant covariate representations to cure concept shifts and improve target prediction. This further motivates a new representation learning method for adaptation that optimizes for a lower-dimensional linear subspace and, subsequently, a prediction model confined to that subspace. The procedure operates on a non-convex objective (one that naturally interpolates between predictability and stability/invariance) constrained on the Stiefel manifold. We study the optimization landscape and prove that, when the regularization is sufficient, nearly all local optima align with an invariant linear subspace resilient to both concept and covariate shift. In terms of predictability, we show a model that uses the learned lower-dimensional subspace can incur a nearly ideal gap between target and source risk. Three real-world data sets are investigated to validate our method and theory.
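To make the phrase "constrained on the Stiefel manifold" concrete, the following is a minimal sketch of Riemannian gradient descent with a QR retraction, a standard way to optimize over matrices with orthonormal columns. It is not the paper's objective or algorithm: the objective below (negative variance captured by the subspace, for a hypothetical matrix `M`) is a placeholder assumption, and the names `riemannian_grad`, `qr_retraction`, `objective`, `euclidean_grad`, the step size, and the iteration count are all illustrative choices.

```python
import numpy as np

def riemannian_grad(V, G):
    """Project a Euclidean gradient G onto the tangent space of the Stiefel manifold at V."""
    VtG = V.T @ G
    return G - V @ (VtG + VtG.T) / 2.0

def qr_retraction(V, xi):
    """Map the stepped point V + xi back to a matrix with orthonormal columns."""
    Q, R = np.linalg.qr(V + xi)
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs  # flip columns so the implied R has a positive diagonal

def objective(V, M):
    # Placeholder objective (assumption, not the paper's): negative variance in span(V).
    return -np.trace(V.T @ M @ V)

def euclidean_grad(V, M):
    # Gradient of the placeholder objective for symmetric M.
    return -2.0 * M @ V

d, ell = 4, 2                         # ambient and subspace dimensions (illustrative)
M = np.diag([3.0, 2.0, 1.0, 0.5])     # hypothetical symmetric matrix
rng = np.random.default_rng(0)
V, _ = np.linalg.qr(rng.standard_normal((d, ell)))  # random orthonormal start

f_init = objective(V, M)
step = 0.05
for _ in range(200):
    xi = -step * riemannian_grad(V, euclidean_grad(V, M))
    V = qr_retraction(V, xi)
f_final = objective(V, M)
```

Every iterate stays on the manifold because the retraction re-orthonormalizes after each tangent step, so `V.T @ V` remains the identity up to floating-point error. Swapping in a different objective only requires replacing `objective` and `euclidean_grad`.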

Paper Structure (28 sections, 9 theorems, 76 equations, 5 figures, 1 algorithm)

Introduction
A Structural Causal Model
Observational Data and the Adaptation Problem
Confounding and Concept Shift
Notations
Related Literature
Contributions
Undesired Properties of Source Risk Minimization
Concept Shift, Confounding, and Subspace Invariance
Target Risk Improvement via Invariance
Methodology: Domain Adaptation via Manifold Optimization
A Surrogate for the Target Risk
An Optimization Procedure on the Stiefel Manifold
Geometry on the Stiefel Manifold
Gradient Formula and Retraction
...and 13 more sections

Key Result

Proposition 1

Under Assumptions \ref{asmp:base} and \ref{asmp:ortho}, the risk minimization in \ref{eq:risk_minimization} admits a unique minimizer, which we denote as $\beta_\mathcal{E}$ for $\mathcal{E} \in \{\mathcal{S}, \mathcal{T}\}$. If $\mathbb{E}_\mathcal{S}[E E^\top] \neq \mathbb{E}_\mathcal{T}[E E^\top]$, then $\beta_\mathcal{S} \neq \beta_\mathcal{T}$; namely, the best linear predictors (concept) shift across the two environments for some endogeneity parameter.
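A small numerical sketch can illustrate the mechanism behind Proposition 1. The paper's model equations are not reproduced on this page, so the toy model below is an assumption: covariates $X = Z + A E$ with $Z$ of identity covariance and independent of $E$, and response $Y = X^\top \beta_0 + E^\top \gamma + \text{noise}$. The names `beta0`, `A`, `gamma`, and `best_linear_predictor` are hypothetical. Under this toy model the population least-squares minimizer is $\beta_0 + (I + A \Sigma_E A^\top)^{-1} A \Sigma_E \gamma$, which depends on the environment through $\Sigma_E = \mathbb{E}[E E^\top]$.

```python
import numpy as np

def best_linear_predictor(beta0, A, gamma, sigma_e):
    """Population least-squares minimizer under the assumed toy model."""
    d = A.shape[0]
    cov_x = np.eye(d) + A @ sigma_e @ A.T          # E[X X^T]
    cov_xy = cov_x @ beta0 + A @ sigma_e @ gamma   # E[X Y]
    return np.linalg.solve(cov_x, cov_xy)

beta0 = np.array([1.0, -1.0])        # hypothetical structural coefficients
A = np.array([[1.0], [0.5]])         # hypothetical loading of the confounder E on X
gamma = np.array([2.0])              # hypothetical endogeneity parameter
sigma_e_source = np.array([[1.0]])   # E_S[E E^T]
sigma_e_target = np.array([[4.0]])   # E_T[E E^T], different from the source

beta_s = best_linear_predictor(beta0, A, gamma, sigma_e_source)
beta_t = best_linear_predictor(beta0, A, gamma, sigma_e_target)
shift = np.linalg.norm(beta_s - beta_t)

# With no endogeneity (gamma = 0), the two minimizers coincide in this toy model.
zero = np.zeros(1)
beta_s0 = best_linear_predictor(beta0, A, zero, sigma_e_source)
beta_t0 = best_linear_predictor(beta0, A, zero, sigma_e_target)
shift_no_confounding = np.linalg.norm(beta_s0 - beta_t0)
```

In this sketch `shift` is strictly positive while `shift_no_confounding` is zero up to rounding, which matches the qualifier "for some endogeneity parameter": the concept shift appears only when the confounder actually enters the response.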

Figures (5)

Figure 1: Diagram visualizing the model equations \ref{eq:model_Y} and \ref{eq:model_X}. Here, the endogenous confounding variable $E$ lurking in the environment and the exogenous invariant variable $Z$ are both latent and unobserved.
Figure 2: Left: Target Rich Regime with $\sigma^2_t = 10, \ \sigma^2_s = 2, \ \tau^2 = 10$; Right: Source Rich Regime with $\sigma^2_t = 2, \ \sigma^2_s = 10, \ \tau^2 = 10$. The x-axis shows the scalar parameter $x \in \mathbb{R}$, and the y-axis shows the risk improvement $R_\mathcal{T}(\beta^\Theta_\mathcal{S}) - R_\mathcal{T}(\beta_\mathcal{S})$.
Figure 3: Forest Fires Data: Performance of the linear subspace predictor $V \alpha$ obtained by solving \ref{eqn:opt-stiefel} for different values of $\upsilon$ and $\eta$, where $d = 7$ and $\ell = 6$. (a) plots the risk on the target dataset $R_\mathcal{T}(V \alpha)$, where the solid red horizontal line shows $R_\mathcal{T}(\beta_\mathcal{S})$ and the solid black line shows $R_\mathcal{T}(\beta_\mathcal{T})$. (b) shows the risk on the source dataset $R_\mathcal{S}(V \alpha)$. (c) plots the difference $R_\mathcal{T}(V \alpha) - R_\mathcal{S}(V \alpha)$.
Figure 4: Bike Sharing Data: Performance of the linear subspace predictor $V \alpha$ obtained by solving \ref{eqn:opt-stiefel} for different values of $\upsilon$ and $\eta$, where $d = 5$ and $\ell = 4$. The subplots (a)-(c) plot the same quantities as in Figure \ref{fig:forest-fires}.
Figure 5: Wine Quality Data: Performance of the linear subspace predictor $V \alpha$ obtained by solving \ref{eqn:opt-stiefel} for different values of $\upsilon$ and $\eta$, where $d = 11$ and $\ell = 7$. The subplots (a)-(c) plot the same quantities as in Figure \ref{fig:forest-fires}.

Theorems & Definitions (24)

Definition 1: Model
Proposition 1: Concept Shift
Remark
Proposition 2: Subspace Invariance
Remark
Proposition 3: Target Risk Improvement
Example
Proposition 4
Proposition 5: Surrogate for Target
Proposition 6: Riemannian Gradient and Retraction
...and 14 more
