Learning When the Concept Shifts: Confounding, Invariance, and Dimension Reduction

Kulunu Dharmakeerthi, YoonHaeng Hur, Tengyuan Liang

TL;DR

This paper studies the necessity and benefit of leveraging exogenous, invariant covariate representations to cure concept shifts and improve target prediction, and shows that a model using the learned lower-dimensional subspace can incur a nearly ideal gap between target and source risk.

Abstract

Practitioners often deploy a learned prediction model in a new environment where the joint distribution of covariate and response has shifted. In observational data, the distribution shift is often driven by unobserved confounding factors lurking in the environment, with the underlying mechanism unknown. Confounding can obfuscate the definition of the best prediction model (concept shift) and shift covariates to domains yet unseen (covariate shift). Therefore, a model maximizing prediction accuracy in the source environment could suffer a significant accuracy drop in the target environment. This motivates us to study the domain adaptation problem with observational data: given labeled covariate and response pairs from a source environment, and unlabeled covariates from a target environment, how can one predict the missing target response reliably? We root the adaptation problem in a linear structural causal model to address endogeneity and unobserved confounding. We study the necessity and benefit of leveraging exogenous, invariant covariate representations to cure concept shifts and improve target prediction. This further motivates a new representation learning method for adaptation that optimizes for a lower-dimensional linear subspace and, subsequently, a prediction model confined to that subspace. The procedure operates on a non-convex objective (one that naturally interpolates between predictability and stability/invariance) constrained on the Stiefel manifold. We study the optimization landscape and prove that, when the regularization is sufficient, nearly all local optima align with an invariant linear subspace resilient to both concept and covariate shift. In terms of predictability, we show that a model using the learned lower-dimensional subspace can incur a nearly ideal gap between target and source risk. Three real-world data sets are investigated to validate our method and theory.
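To make the idea of optimizing over the Stiefel manifold concrete, here is a minimal sketch of Riemannian gradient descent with a QR retraction. It is not the paper's objective or algorithm: the trace objective `tr(V^T M V)` with `M = lam * B - A` is a hypothetical stand-in in which `A` plays the role of a "predictability" matrix and `B` an "instability" penalty, and all names (`riemannian_grad`, `qr_retraction`, `stiefel_descent`, `lam`) are assumptions made for illustration.

```python
import numpy as np

def riemannian_grad(V, G):
    """Project a Euclidean gradient G onto the tangent space of the
    Stiefel manifold {V : V^T V = I} at the point V."""
    VtG = V.T @ G
    return G - V @ (VtG + VtG.T) / 2.0

def qr_retraction(V, xi):
    """Map the tangent step V + xi back onto the manifold via thin QR."""
    Q, R = np.linalg.qr(V + xi)
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs  # flip column signs so that diag(R) is positive

def objective(V, M):
    """Hypothetical stand-in objective tr(V^T M V), not the paper's."""
    return np.trace(V.T @ M @ V)

def stiefel_descent(M, V0, steps=500):
    """Riemannian gradient descent with a conservative fixed step size."""
    V = V0.copy()
    step = 0.1 / np.linalg.norm(M, 2)
    for _ in range(steps):
        G = 2.0 * M @ V  # Euclidean gradient of tr(V^T M V), M symmetric
        V = qr_retraction(V, -step * riemannian_grad(V, G))
    return V

# Toy problem: d ambient dimensions, ell-dimensional subspace (arbitrary sizes).
rng = np.random.default_rng(0)
d, ell, lam = 7, 3, 1.0
A_half = rng.standard_normal((d, d))
B_half = rng.standard_normal((d, d))
A = A_half @ A_half.T          # assumed "predictability" matrix (PSD)
B = B_half @ B_half.T          # assumed "instability" matrix (PSD)
M = lam * B - A                # larger lam puts more weight on stability

V0, _ = np.linalg.qr(rng.standard_normal((d, ell)))  # random orthonormal start
V_hat = stiefel_descent(M, V0)
```

The projection and retraction steps are the generic ingredients of first-order optimization on the Stiefel manifold; only the objective would need to be replaced by the one actually used in the paper.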

Paper Structure

This paper contains 28 sections, 9 theorems, 76 equations, 5 figures, and 1 algorithm.

Key Result

Proposition 1

Under Assumptions \ref{asmp:base} and \ref{asmp:ortho}, the risk minimization in \ref{eq:risk_minimization} admits a unique minimizer, which we denote as $\beta_\mathcal{E}$ for $\mathcal{E} \in \{\mathcal{S}, \mathcal{T}\}$. If $\mathbb{E}_\mathcal{S}[E E^\top] \neq \mathbb{E}_\mathcal{T}[E E^\top]$, then $\beta_\mathcal{S} \neq \beta_\mathcal{T}$; namely, the best linear predictors (concept) shift across the two environments for some endogeneity parameter.
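A small numerical sketch can illustrate the kind of concept shift Proposition 1 describes. The toy model below is an assumption made for illustration and is not the paper's Definition 1: covariates are $X = Z + E$ with exogenous $Z \sim N(0, I)$ independent of the confounder $E$, and the response is $Y = X^\top \beta^\star + E^\top \gamma + \text{noise}$. The names `beta_star`, `gamma`, and `best_linear_predictor` are hypothetical.

```python
import numpy as np

def best_linear_predictor(beta_star, gamma, sigma_e):
    """Population least-squares coefficient Cov(X)^{-1} Cov(X, Y) for the
    assumed toy model X = Z + E, Y = X @ beta_star + E @ gamma + noise,
    where Z ~ N(0, I) is independent of E and Cov(E) = sigma_e."""
    d = beta_star.shape[0]
    cov_x = np.eye(d) + sigma_e                    # Cov(X) = I + Cov(E)
    cov_xy = cov_x @ beta_star + sigma_e @ gamma   # Cov(X, Y)
    return np.linalg.solve(cov_x, cov_xy)

beta_star = np.array([1.0, -2.0, 0.5])
gamma = np.array([0.8, 0.0, -0.3])       # assumed endogeneity parameter

# The confounder acts only on the first coordinate, with a different
# second moment in the source and target environments.
sigma_e_source = np.diag([2.0, 0.0, 0.0])
sigma_e_target = np.diag([10.0, 0.0, 0.0])

beta_s = best_linear_predictor(beta_star, gamma, sigma_e_source)
beta_t = best_linear_predictor(beta_star, gamma, sigma_e_target)
beta_s_no_confounding = best_linear_predictor(beta_star, np.zeros(3), sigma_e_source)
beta_t_no_confounding = best_linear_predictor(beta_star, np.zeros(3), sigma_e_target)

print("source predictor:", beta_s)
print("target predictor:", beta_t)
```

In this toy setup the two predictors differ only in the coordinate touched by the confounder, while the coefficients on the unconfounded coordinates agree across environments; setting `gamma` to zero removes the shift altogether.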

Figures (5)

  • Figure 1: Diagram visualizing the model equations \ref{eq:model_Y} and \ref{eq:model_X}. Here, the endogenous confounding variable $E$ lurking in the environment and the exogenous invariant variable $Z$ are both latent and unobserved.
  • Figure 2: Left: Target Rich Regime with $\sigma^2_t = 10, \ \sigma^2_s = 2, \ \tau^2 = 10$; Right: Source Rich Regime with $\sigma^2_t = 2, \ \sigma^2_s = 10, \ \tau^2 = 10$. Here the x-axis shows the scalar parameter $x \in \mathbb{R}$, and the y-axis shows the risk improvement $R_\mathcal{T}(\beta^\Theta_\mathcal{S}) - R_\mathcal{T}(\beta_\mathcal{S})$.
  • Figure 3: Forest Fires Data: Performance of the linear subspace predictor $V \alpha$ obtained by solving \ref{eqn:opt-stiefel} for different values of $\upsilon$ and $\eta$, where $d = 7$ and $\ell = 6$. (a) plots the risk on the target dataset $R_\mathcal{T}(V \alpha)$, where the solid red horizontal line shows $R_\mathcal{T}(\beta_\mathcal{S})$ and the solid black line shows $R_\mathcal{T}(\beta_\mathcal{T})$. (b) shows the risk on the source dataset $R_\mathcal{S}(V \alpha)$. (c) plots the difference $R_\mathcal{T}(V \alpha) - R_\mathcal{S}(V \alpha)$.
  • Figure 4: Bike Sharing Data: Performance of the linear subspace predictor $V \alpha$ obtained by solving \ref{eqn:opt-stiefel} for different values of $\upsilon$ and $\eta$, where $d = 5$ and $\ell = 4$. The subplots (a)-(c) plot the same quantities as in Figure 3.
  • Figure 5: Wine Quality Data: Performance of the linear subspace predictor $V \alpha$ obtained by solving \ref{eqn:opt-stiefel} for different values of $\upsilon$ and $\eta$, where $d = 11$ and $\ell = 7$. The subplots (a)-(c) plot the same quantities as in Figure 3.

Theorems & Definitions (24)

  • Definition 1: Model
  • Proposition 1: Concept Shift
  • Remark
  • Proposition 2: Subspace Invariance
  • Remark
  • Proposition 3: Target Risk Improvement
  • Example
  • Proposition 4
  • Proposition 5: Surrogate for Target
  • Proposition 6: Riemannian Gradient and Retraction
  • ...and 14 more