Blessings and Curses of Covariate Shifts: Adversarial Learning Dynamics, Directional Convergence, and Equilibria

Tengyuan Liang

TL;DR

This paper analyzes how covariate distribution shifts, when modeled adversarially via Wasserstein perturbations, affect learning in an infinite-dimensional linear setting. It reveals a dichotomy: in regression, adversarial shifts steer the covariates exponentially toward an optimal experimental design, enabling rapid subsequent learning toward the Bayes predictor $f^\top_{Bayes}$; in classification, shifts drive covariates toward a hardest design at a subquadratic rate, trapping learning away from the Bayes predictor. The results are derived through a sequential game framework and a detailed analysis of Wasserstein gradient flows, supported by numerical illustrations. The findings have implications for robust design and experimental planning under covariate shifts, and point to interesting future directions for iterative dynamics and nonlinear models.

Abstract

Covariate distribution shifts and adversarial perturbations present robustness challenges to the conventional statistical learning framework: mild shifts in the test covariate distribution can significantly affect the performance of a statistical model learned from the training distribution. Model performance typically deteriorates when extrapolation happens: namely, covariates shift to a region where the training distribution is scarce and, naturally, the learned model has little information. For robustness and regularization considerations, adversarial perturbation techniques are proposed as a remedy; however, careful study is needed of which extrapolation region the adversarial covariate shift will focus on, given a learned model. This paper precisely characterizes the extrapolation region, examining both regression and classification in an infinite-dimensional setting. We study the implications of adversarial covariate shifts for subsequent learning of the equilibrium -- the Bayes optimal model -- in a sequential game framework. We exploit the dynamics of the adversarial learning game and reveal the curious effects of the covariate shift on equilibrium learning and experimental design. In particular, we establish two directional convergence results that exhibit distinctive phenomena: (1) a blessing in regression: the adversarial covariate shifts at an exponential rate to an optimal experimental design for rapid subsequent learning; (2) a curse in classification: the adversarial covariate shifts at a subquadratic rate to the hardest experimental design, trapping subsequent learning.

Paper Structure (27 sections, 8 theorems, 87 equations, 2 figures)

This paper contains 27 sections, 8 theorems, 87 equations, 2 figures.

Key Result

Theorem 1

Consider the regression setting where $\ell(y', y) = (y'- y)^2$ and $\mathbf{y}|\mathbf{x} = x \sim \mathrm{Gaussian}(\langle x, \theta^\star\rangle, 1 )$. Let $x_0 \in \operatorname{supp}(\mu^{(0)})$ satisfy $\langle x_0, \theta^\star - \theta^{(0)} \rangle \neq 0$. Then the ind... Moreover, the directional convergence is exponential in $T$, where $c = 2\log(1+2\gamma \| \theta^{\cdots}$ ...
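
The mechanism behind this regression result can be illustrated numerically. The following is a minimal sketch, not the paper's construction. It assumes the adversary moves each covariate by a gradient-ascent step of size $\gamma$ on the population squared loss $\langle x, \theta^{(0)} - \theta^\star \rangle^2 + 1$, that is, $x_{t+1} = x_t + 2\gamma \langle x_t, \Delta \rangle \Delta$ with $\Delta = \theta^\star - \theta^{(0)}$. Under that assumed update, the component of $x_t$ along $\Delta$ grows by the factor $1 + 2\gamma\|\Delta\|^2$ at every step while the orthogonal component stays fixed, so the squared tangent of the angle between $x_t$ and $\Delta$ decays like $e^{-ct}$ with $c = 2\log(1 + 2\gamma\|\Delta\|^2)$ in this sketch. The function name `adversarial_shift`, the three-dimensional vectors, and the step size are illustrative choices, not values from the paper.

```python
import math


def dot(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def adversarial_shift(x0, theta0, theta_star, gamma, steps):
    """Track the direction of one covariate under an ASSUMED gradient-ascent shift.

    Assumed update (illustrative, not taken from the paper):
        x_{t+1} = x_t + 2 * gamma * <x_t, delta> * delta,  delta = theta_star - theta0.
    Returns the final covariate and tan^2 of the angle between x_t and delta
    for t = 0, ..., steps. Requires <x0, delta> != 0.
    """
    delta = [s - t for s, t in zip(theta_star, theta0)]
    delta_sq = dot(delta, delta)
    x = list(x0)
    tan2 = []
    for _ in range(steps + 1):
        par = dot(x, delta)  # <x_t, delta>
        # Orthogonal part of x_t: subtract its projection onto delta.
        perp = [xi - (par / delta_sq) * di for xi, di in zip(x, delta)]
        par_sq = par * par / delta_sq  # squared length of the projection
        tan2.append(dot(perp, perp) / par_sq)
        # Gradient of <x, theta0 - theta_star>^2 + 1 in x is 2 <x, delta> delta.
        x = [xi + 2.0 * gamma * par * di for xi, di in zip(x, delta)]
    return x, tan2


# Hypothetical toy instance: the covariate starts almost orthogonal to delta.
theta0 = [0.0, 0.0, 0.0]
theta_star = [1.0, 0.0, 0.0]
x0 = [0.1, 1.0, 1.0]
gamma = 0.1
steps = 40

x_T, tan2 = adversarial_shift(x0, theta0, theta_star, gamma, steps)

# Rate constant implied by the assumed update (here ||delta||^2 = 1).
c = 2.0 * math.log(1.0 + 2.0 * gamma * 1.0)

for t in range(0, steps + 1, 10):
    print(t, tan2[t], tan2[0] * math.exp(-c * t))
```

The two printed columns agree up to floating-point error, which shows the exponential alignment of the shifted covariate with $\theta^\star - \theta^{(0)}$ under the assumed update. The condition $\langle x_0, \theta^\star - \theta^{(0)} \rangle \neq 0$ from the theorem matters here as well: if the initial inner product is zero, the update leaves $x_t$ unchanged and no alignment occurs.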

Figures (2)

  • Figure 1: Regression setting, directional convergence. From left to right, top to bottom, the panels plot the directional information at iterations $t=0, 5, 10, \ldots, 40$, that is, once every $5$ iterations.
  • Figure 2: Classification setting, directional convergence. From left to right, top to bottom, the panels plot the directional information at iterations $t=0, 25, 50, \ldots, 200$, that is, once every $25$ iterations.

Theorems & Definitions (12)

  • Theorem 1: Regression: directional convergence
  • Remark 2
  • Remark 3
  • Theorem 4: Classification: directional convergence
  • Remark 5
  • Theorem 6: Regression: blessing to the learner
  • Theorem 7: Classification: curse to the learner
  • Remark 8
  • Lemma 9: Nonlinear recursions
  • Lemma 10
  • ...and 2 more