Mitigating Covariate Shift in Misspecified Regression with Applications to Reinforcement Learning
Philip Amortila, Tongyi Cao, Akshay Krishnamurthy
TL;DR
This work studies regression under $L_{\infty}$-misspecification and adversarial covariate shift. It shows that standard empirical risk minimization (ERM) suffers from misspecification amplification: the error due to misspecification is multiplied by the density ratio between the test and training distributions. It introduces disagreement-based regression (DBR), a robust regression procedure that screens training regions via a disagreement filter and uses a minimax objective to avoid this amplification, achieving $R_{\text{test}}(\text{DBR}) = O(\varepsilon_{\infty}^2)$ asymptotically, where $\varepsilon_{\infty}$ denotes the $L_{\infty}$ misspecification level, with an optimal statistical rate and no dependence on the density-ratio bound $C_{\infty}$ in the limit. The authors extend DBR to offline and online reinforcement learning, obtaining new guarantees under $L_{\infty}$-misspecification and concentrability (offline) or coverability (online), and they demonstrate separations between these coverage notions and Bellman-error-based structural parameters. These results provide a principled, distribution-shift-aware approach to function approximation in RL and reveal trade-offs between misspecification, coverage, and computational tractability. The work also outlines directions for scalable algorithms and for a deeper theoretical characterization of the identified separations.
Abstract
A pervasive phenomenon in machine learning applications is distribution shift, where training and deployment conditions for a machine learning model differ. As distribution shift typically results in a degradation in performance, much attention has been devoted to algorithmic interventions that mitigate these detrimental effects. In this paper, we study the effect of distribution shift in the presence of model misspecification, specifically focusing on $L_{\infty}$-misspecified regression and adversarial covariate shift, where the regression target remains fixed while the covariate distribution changes arbitrarily. We show that empirical risk minimization, or standard least squares regression, can result in undesirable misspecification amplification where the error due to misspecification is amplified by the density ratio between the training and testing distributions. As our main result, we develop a new algorithm -- inspired by robust optimization techniques -- that avoids this undesirable behavior, resulting in no misspecification amplification while still obtaining optimal statistical rates. As applications, we use this regression procedure to obtain new guarantees in offline and online reinforcement learning with misspecification and establish new separations between previously studied structural conditions and notions of coverage.
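To make misspecification amplification concrete, the following is a minimal sketch of a toy example of our own; it is not the paper's construction and it does not implement DBR. It assumes a two-point covariate space, a one-parameter class $f_\theta(x) = \theta\,\phi(x)$ with a hypothetical feature map `phi`, noise-free labels, and population (infinite-sample) least squares. The in-class function $\theta = 0$ is within `eps` of the target everywhere, yet the ERM solution has test risk of order $C \cdot \text{eps}^2$, where $C$ is the density ratio at the test point.

```python
import math


def population_erm_theta(p_train, phi, f_star):
    """Closed-form minimizer of sum_x p_train[x] * (theta * phi[x] - f_star[x])**2."""
    num = sum(p * f * y for p, f, y in zip(p_train, phi, f_star))
    den = sum(p * f * f for p, f in zip(p_train, phi))
    return num / den


def squared_risk(theta, p, phi, f_star):
    """Squared error of f_theta against f_star under distribution p."""
    return sum(q * (theta * f - y) ** 2 for q, f, y in zip(p, phi, f_star))


def amplification_demo(C, eps):
    # Index 0 is a point "a" that is rare in training; index 1 is a point "b".
    p_train = [1.0 / C, 1.0 - 1.0 / C]
    p_test = [1.0, 0.0]            # all test mass on "a": density ratio is C
    phi = [math.sqrt(C), 1.0]      # hypothetical feature map, chosen for illustration
    f_star = [0.0, eps]            # theta = 0 is within eps of f_star at both points
    theta_erm = population_erm_theta(p_train, phi, f_star)
    return {
        "theta_erm": theta_erm,
        "erm_test_risk": squared_risk(theta_erm, p_test, phi, f_star),
        "comparator_test_risk": squared_risk(0.0, p_test, phi, f_star),
        "eps_sq": eps ** 2,
    }


if __name__ == "__main__":
    for C in (100, 400):
        print(C, amplification_demo(C, eps=0.1))
```

In this toy instance ERM fits the frequent training point "b" at the expense of the rare point "a", so its test risk grows roughly like $C \cdot \text{eps}^2 / 4$, while the comparator $\theta = 0$ has zero test risk. The sketch only illustrates why a procedure whose misspecification term does not scale with the density ratio is desirable.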
