Mitigating Covariate Shift in Misspecified Regression with Applications to Reinforcement Learning

Philip Amortila; Tongyi Cao; Akshay Krishnamurthy

TL;DR

This work studies $L_{\infty}$-misspecified regression under adversarial covariate shift and shows that standard empirical risk minimization (ERM) amplifies the misspecification error by the density ratio between the training and test distributions. It introduces disagreement-based regression (DBR), a robust regression procedure that screens training regions via a disagreement filter and uses a minimax objective to avoid misspecification amplification, achieving $R_{\text{test}}(\text{DBR}) = O(\varepsilon_{\infty}^2)$, where $\varepsilon_{\infty}$ denotes the $L_{\infty}$-misspecification level, with an optimal statistical rate and no dependence on the density-ratio bound $C_{\infty}$ in the asymptote. The authors extend DBR to offline and online reinforcement learning, obtaining new guarantees under $L_{\infty}$-misspecification and concentrability (offline) or coverability (online), and they demonstrate separations between concentrability/coverage notions and Bellman-error-based structural parameters. These results give a principled, distribution-shift-aware approach to function approximation in RL and reveal trade-offs between misspecification, coverage, and computational tractability. The work also outlines directions for scalable algorithms and for a deeper theoretical characterization of the identified separations.

Abstract

A pervasive phenomenon in machine learning applications is distribution shift, where training and deployment conditions for a machine learning model differ. As distribution shift typically results in a degradation in performance, much attention has been devoted to algorithmic interventions that mitigate these detrimental effects. In this paper, we study the effect of distribution shift in the presence of model misspecification, specifically focusing on $L_{\infty}$-misspecified regression and adversarial covariate shift, where the regression target remains fixed while the covariate distribution changes arbitrarily. We show that empirical risk minimization, or standard least squares regression, can result in undesirable misspecification amplification where the error due to misspecification is amplified by the density ratio between the training and testing distributions. As our main result, we develop a new algorithm -- inspired by robust optimization techniques -- that avoids this undesirable behavior, resulting in no misspecification amplification while still obtaining optimal statistical rates. As applications, we use this regression procedure to obtain new guarantees in offline and online reinforcement learning with misspecification and establish new separations between previously studied structural conditions and notions of coverage.
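The role of the density ratio can be seen from a one-line change of measure. The following is a sketch under assumed notation, not a statement taken from the paper: write $f^\star$ for the regression target, $R_{\mathcal{D}}(f) = \mathbb{E}_{x \sim \mathcal{D}}[(f(x) - f^\star(x))^2]$ for the squared-error risk under a distribution $\mathcal{D}$, and $C_{\infty} = \sup_x \frac{d\mathcal{D}_{\mathrm{test}}}{d\mathcal{D}_{\mathrm{train}}}(x)$, assuming $\mathcal{D}_{\mathrm{test}}$ is absolutely continuous with respect to $\mathcal{D}_{\mathrm{train}}$. Then, for any $f$,

$$
R_{\mathcal{D}_{\mathrm{test}}}(f)
= \mathbb{E}_{x \sim \mathcal{D}_{\mathrm{train}}}\!\left[ \frac{d\mathcal{D}_{\mathrm{test}}}{d\mathcal{D}_{\mathrm{train}}}(x)\,\bigl(f(x) - f^\star(x)\bigr)^2 \right]
\le C_{\infty}\, R_{\mathcal{D}_{\mathrm{train}}}(f).
$$

A predictor whose training risk is on the order of the squared misspecification level can therefore, in the worst case, have test risk larger by a factor of $C_{\infty}$. The inequality alone does not show that ERM attains this worst case; that is the content of a lower bound.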

Paper Structure (33 sections, 19 theorems, 79 equations, 1 figure)

Introduction
Contributions
Misspecified regression under distribution shift
Misspecification amplification for empirical risk minimization
Other existing algorithms
Main result: Disagreement-based regression
Extensions
Proof of Theorem 1
Step 1: Non-negativity
Step 2: Uniform convergence
Step 3: Analysis of $\hat{f}^{(n)}_{\textsf{DBR}}$
Applications to online and offline reinforcement learning
Offline reinforcement learning
Setup and notation
Algorithm and guarantee
...and 18 more sections

Key Result

proposition 1

For any $\delta \in (0,1)$, with probability at least $1-\delta$, ERM satisfies

Figures (1)

Figure 1: The construction used to prove Proposition 2 (the lower bound for ERM). $f_{\mathrm{bad}}$ and $\bar{f}$ have equal risk under $\mathcal{D}_{\mathrm{train}}$, but $f_{\mathrm{bad}}$ concentrates its errors onto $\mathcal{D}_{\mathrm{test}}$.
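The idea in the caption can be illustrated numerically. The sketch below is a simplified toy, not the paper's construction: it assumes a finite covariate space, a target $f^\star = 0$, a uniform training distribution, and a test distribution concentrated on a single point. The names `risk`, `amplification_demo`, `n_points`, and `eps` are hypothetical, and the toy ignores any boundedness constraint on the function class.

```python
import numpy as np


def risk(f, f_star, weights):
    """Squared-error risk of f against f_star under a discrete distribution."""
    return float(np.sum(weights * (f - f_star) ** 2))


def amplification_demo(n_points=100, eps=0.1):
    # Assumed toy setup: covariates are the points {0, ..., n_points - 1}.
    f_star = np.zeros(n_points)

    # Training distribution is uniform; test distribution sits on point 0.
    d_train = np.full(n_points, 1.0 / n_points)
    d_test = np.zeros(n_points)
    d_test[0] = 1.0

    # Largest density ratio between test and train (equals n_points here).
    c_inf = float(np.max(d_test / d_train))

    # f_bar spreads an error of size eps evenly over every point.
    f_bar = np.full(n_points, eps)

    # f_bad has the same training risk, but all of its error is on point 0,
    # which is exactly where the test distribution puts its mass.
    f_bad = np.zeros(n_points)
    f_bad[0] = eps * np.sqrt(n_points)

    return {
        "c_inf": c_inf,
        "train_bar": risk(f_bar, f_star, d_train),
        "train_bad": risk(f_bad, f_star, d_train),
        "test_bar": risk(f_bar, f_star, d_test),
        "test_bad": risk(f_bad, f_star, d_test),
    }


if __name__ == "__main__":
    for name, value in amplification_demo().items():
        print(f"{name}: {value:.4f}")
```

In this toy, the two functions are indistinguishable by training risk, yet the test risk of `f_bad` exceeds that of `f_bar` by the density-ratio factor `c_inf`. A procedure that selects between them using training risk alone has no basis for preferring `f_bar`.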

Theorems & Definitions (29)

proposition 1: upper bound
proposition 2: lower bound
theorem 1: Main result for DBR
corollary 1: Covariate shift for
corollary 2: Well-specified case
lemma 1: Non-negativity
lemma 2: Concentration
theorem 2: for offline RL
theorem 3: for online RL
proposition 2: upper bound
...and 19 more
