Testable Learning with Distribution Shift

Adam R. Klivans; Konstantinos Stavropoulos; Arsen Vasilyan

TL;DR

This work introduces Testable Learning with Distribution Shift (TDS learning), a framework in which the learner may reject when samples from the training and test marginals fail an efficiently computable test, but must output a classifier with low test error whenever the test accepts. It develops efficient TDS learners for key high-dimensional classes, including homogeneous and general halfspaces, intersections of halfspaces, decision trees, and Boolean formulas, under Gaussian or uniform marginals, via two main techniques: moment matching with $L_2$-sandwiching polynomials and disagreement-region methods. A central contribution is a transfer lemma showing that $L_2$-sandwiching suffices to translate structure of the training marginal into performance on the test distribution, with concrete runtimes. The paper also gives lower bounds that separate TDS learning from PAC and agnostic learning. The results draw on constructions from the pseudorandomness and testable-learning literature to obtain the required approximators and testers, which yields efficient, provable performance certificates under distribution shift with assumptions only on the training marginal.

Abstract

We revisit the fundamental problem of learning with distribution shift, in which a learner is given labeled samples from training distribution $D$, unlabeled samples from test distribution $D'$ and is asked to output a classifier with low test error. The standard approach in this setting is to bound the loss of a classifier in terms of some notion of distance between $D$ and $D'$. These distances, however, seem difficult to compute and do not lead to efficient algorithms. We depart from this paradigm and define a new model called testable learning with distribution shift, where we can obtain provably efficient algorithms for certifying the performance of a classifier on a test distribution. In this model, a learner outputs a classifier with low test error whenever samples from $D$ and $D'$ pass an associated test; moreover, the test must accept if the marginal of $D$ equals the marginal of $D'$. We give several positive results for learning well-studied concept classes such as halfspaces, intersections of halfspaces, and decision trees when the marginal of $D$ is Gaussian or uniform on $\{\pm 1\}^d$. Prior to our work, no efficient algorithms for these basic cases were known without strong assumptions on $D'$. For halfspaces in the realizable case (where there exists a halfspace consistent with both $D$ and $D'$), we combine a moment-matching approach with ideas from active learning to simulate an efficient oracle for estimating disagreement regions. To extend to the non-realizable setting, we apply recent work from testable (agnostic) learning. More generally, we prove that any function class with low-degree $L_2$-sandwiching polynomial approximators can be learned in our model. We apply constructions from the pseudorandomness literature to obtain the required approximators.
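
The moment-matching idea mentioned above can be illustrated with a minimal sketch. It assumes the training marginal is the standard Gaussian and checks whether the low-degree empirical moments of unlabeled test samples are close to the corresponding Gaussian moments. The function names, the degree `k`, and the tolerance `tol` are hypothetical choices for illustration, not the parameters used in the paper, and the paper's testers include further steps that are not shown here.

```python
import itertools
import math

import numpy as np


def gaussian_moment(alpha):
    """E[prod_i x_i^alpha_i] for x ~ N(0, I), as a product of 1-D moments."""
    out = 1.0
    for a in alpha:
        if a % 2 == 1:
            return 0.0  # odd moments of a standard Gaussian vanish
        # For even a, E[z^a] = (a - 1)!! = (a - 1)(a - 3)...1
        out *= math.prod(range(a - 1, 0, -2))
    return out


def multi_indices(d, k):
    """Yield all exponent vectors alpha in N^d with 1 <= |alpha| <= k."""
    for total in range(1, k + 1):
        for combo in itertools.combinations_with_replacement(range(d), total):
            alpha = [0] * d
            for i in combo:
                alpha[i] += 1
            yield tuple(alpha)


def moment_matching_test(X_test, k, tol):
    """Accept iff every empirical moment of degree <= k is within tol
    of the matching standard Gaussian moment (illustrative tester only)."""
    d = X_test.shape[1]
    for alpha in multi_indices(d, k):
        empirical = np.mean(np.prod(X_test ** np.array(alpha), axis=1))
        if abs(empirical - gaussian_moment(alpha)) > tol:
            return False  # reject: test marginal looks different
    return True  # accept: low-degree moments match
```

With enough samples, a test set drawn from the standard Gaussian should be accepted, while a test set whose mean is shifted by a constant should be rejected at degree 1. The number of moments checked grows roughly as $d^k$, so the degree has to be kept small for the test to stay efficient.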

Paper Structure (61 sections, 42 theorems, 139 equations, 1 table, 6 algorithms)

Introduction
Our Results
Learning Setup.
TDS Learning: the Agnostic Setting.
Results.
Techniques
Moment Matching/Sandwiching Polynomials.
Beyond Moment Matching.
Techniques from Testable Learning.
Related Work
Domain Adaptation.
PQ Learning.
Testable Learning.
Technical Overview
TDS Learning of Homogeneous Halfspaces
...and 46 more sections

Key Result

Proposition 1.1

No TDS learning algorithm can have an error guarantee better than $\Omega(\lambda)+\epsilon$.
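
The quantity $\lambda$ is not defined in this summary. Assuming it is the standard joint optimal error from the domain adaptation literature, taken over the concept class $\mathcal{C}$ (the symbols $\mathcal{C}$ and $\mathrm{err}$ are notation introduced here for illustration), it would read:

$$
\lambda \;=\; \min_{f \in \mathcal{C}} \Big( \mathrm{err}(f; D) + \mathrm{err}(f; D') \Big),
\qquad
\mathrm{err}(f; D) \;=\; \Pr_{(x, y) \sim D}\big[ f(x) \neq y \big].
$$

Under this reading, the proposition says that the test error of any TDS learner must scale with the error of the best single classifier on both distributions, plus the accuracy parameter $\epsilon$.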

Theorems & Definitions (77)

Proposition 1.1: Informal
Theorem 2.1: Agnostic TDS learning of Halfspaces
Proposition 2.2: Testably Bounding Halfspace Disagreement, Lemma 3.1 in gollakota2023tester
Remark 2.3
Remark 2.4
Definition 2.5: Disagreement Region
Theorem 2.6: Disagreement-Based TDS learning
Theorem 2.7: TDS learning of General Halfspaces
Lemma 2.8: Informal, Transfer Lemma for Square Loss, see \ref{lemma:transfer-lemma-formal}
Theorem 2.9: $\mathcal{L}_2$-sandwiching implies TDS Learning
...and 67 more
