Positive Distribution Shift as a Framework for Understanding Tractable Learning
Marko Medvedev, Idan Attias, Elisabetta Cornacchia, Theodor Misiakiewicz, Gal Vardi, Nathan Srebro
TL;DR
This work reframes distribution shift not as a hindrance but as a lever for tractable learning, proposing Positive Distribution Shift (PDS) and the DS-PAC and f-PDS frameworks. It demonstrates that a carefully chosen training distribution can make computationally hard classes, such as parity and junta functions, efficiently learnable with standard gradient-based methods, while performance is still measured on the original target distribution. The paper connects PDS to membership-query models, showing that DS-PAC implies NA-MQ and, in turn, RDSPAC, thereby linking practical training-data strategies to classical query-based frameworks. Together, these results provide a theoretical foundation for treating dataset design as a core component of learnability, with implications for when and how SGD-based training of neural networks can succeed under covariate shift.
Abstract
We study a setting where the goal is to learn a target function f(x) with respect to a target distribution D(x), but training is done on i.i.d. samples from a different training distribution D'(x), labeled by the true target f(x). Such a distribution shift (here in the form of covariate shift) is usually viewed negatively, as hurting or making learning harder, and the traditional distribution shift literature is mostly concerned with limiting or avoiding this negative effect. In contrast, we argue that with a well-chosen D'(x), the shift can be positive and make learning easier -- a perspective called Positive Distribution Shift (PDS). Such a perspective is central to contemporary machine learning, where much of the innovation is in finding good training distributions D'(x), rather than changing the training algorithm. We further argue that the benefit is often computational rather than statistical, and that PDS allows computationally hard problems to become tractable even using standard gradient-based training. We formalize different variants of PDS, show how certain hard classes are easily learnable under PDS, and make connections with membership query learning.
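To make the setting concrete, here is a minimal sketch of how a shifted training distribution can help with a sparse parity target. It is an illustration only, not the paper's construction or algorithm: the learner is a simple correlation test rather than gradient-based training, and the dimension, support, bias 0.75, sample sizes, and all function names are assumptions chosen for the example. The target is f(x) = product of x_i over a hidden support S, with x in {-1, +1}^n, and the target distribution D is uniform. Under D, every single coordinate is uncorrelated with the label. Under a biased product distribution D' with E[x_i] = mu, the correlation E[x_j f(x)] equals mu^(k-1) for j in S and mu^(k+1) for j outside S, so the support can be read off from first-order statistics.

```python
import random

def parity(x, support):
    """Label in {-1, +1}: product of the coordinates indexed by `support`."""
    out = 1
    for i in support:
        out *= x[i]
    return out

def sample(n, p_plus, rng):
    """Draw x in {-1, +1}^n with independent coordinates, P(x_i = +1) = p_plus."""
    return [1 if rng.random() < p_plus else -1 for _ in range(n)]

def recover_support(xs, ys, k):
    """Return the k coordinates with the largest empirical correlation with the label.

    Assumes k is known and the training bias is positive, so that
    in-support coordinates have the larger correlation.
    """
    n, m = len(xs[0]), len(xs)
    corr = [sum(x[j] * y for x, y in zip(xs, ys)) / m for j in range(n)]
    top = sorted(range(n), key=lambda j: corr[j], reverse=True)[:k]
    return sorted(top)

rng = random.Random(0)
n, k = 20, 3
true_support = [2, 7, 11]          # hypothetical hidden support S

# Training distribution D': biased product distribution (mu = 0.5).
train_x = [sample(n, 0.75, rng) for _ in range(4000)]
train_y = [parity(x, true_support) for x in train_x]
learned_support = recover_support(train_x, train_y, k)

# Target distribution D: uniform over {-1, +1}^n.
test_x = [sample(n, 0.5, rng) for _ in range(2000)]
test_error = sum(
    parity(x, learned_support) != parity(x, true_support) for x in test_x
) / len(test_x)

print("learned support:", learned_support)
print("error on target distribution:", test_error)
```

With mu = 0.5 and k = 3, the in-support correlations are about 0.25 and the others about 0.0625, so a few thousand samples separate them. Running the same correlation test on samples from the uniform distribution gives no signal, since all the correlations are zero in expectation.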
