
Learning Neural Networks with Distribution Shift: Efficiently Certifiable Guarantees

Gautam Chandrasekaran, Adam R. Klivans, Lin Lin Lee, Konstantinos Stavropoulos

TL;DR

This work advances learning neural networks under distribution shift by formalizing a Testable Learning with Distribution Shift (TDS) framework suited to regression and giving provably efficient algorithms that make no assumptions on the test distribution. It combines kernel methods with data-dependent feature maps and a spectral verification step that certifies transfer from the training to the test distribution, enabling nonconvex regression with Lipschitz activations. For bounded, hypercontractive training marginals, it achieves fully polynomial-time learning for one-hidden-layer sigmoid networks and extends to deeper Lipschitz networks. For unbounded marginals with strictly sub-exponential tails, it uses uniform polynomial approximations and moment-based tests to obtain comparable guarantees. Overall, the paper provides theoretically grounded algorithms for reliable learning under distribution shift, with explicit runtime and sample-complexity bounds tied to kernel representations and approximation theory.
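
For the unbounded case, the TL;DR mentions moment-based tests. The sketch below is only a hedged illustration of the general idea. It uses a generic moment-matching check that compares every low-degree empirical moment of the test sample with the corresponding train moment and rejects if any pair differs by more than a tolerance. The function names and the `degree` and `tol` parameters are hypothetical. They are not the paper's tests or thresholds, which are tied to its uniform polynomial approximations.

```python
import itertools

import numpy as np


def empirical_moments(X, degree):
    """Empirical E[prod_j x_j] for every monomial of total degree 1..degree."""
    d = X.shape[1]
    moments = {}
    for k in range(1, degree + 1):
        # combinations_with_replacement enumerates each degree-k monomial exactly once
        for idx in itertools.combinations_with_replacement(range(d), k):
            moments[idx] = float(np.mean(np.prod(X[:, list(idx)], axis=1)))
    return moments


def moment_matching_test(X_train, X_test, degree, tol):
    """Accept (True) iff every low-degree test moment is within tol of its train moment.

    `degree` and `tol` are hypothetical knobs, not thresholds from the paper."""
    m_train = empirical_moments(X_train, degree)
    m_test = empirical_moments(X_test, degree)
    return all(abs(m_test[a] - m_train[a]) <= tol for a in m_train)
```

With enough samples from the same distribution, the check should accept at a moderate tolerance. Translating the test sample by a constant vector shifts its first moments and should make the check reject.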

Abstract

We give the first provably efficient algorithms for learning neural networks with distribution shift. We work in the Testable Learning with Distribution Shift framework (TDS learning) of Klivans et al. (2024), where the learner receives labeled examples from a training distribution and unlabeled examples from a test distribution and must either output a hypothesis with low test error or reject if distribution shift is detected. No assumptions are made on the test distribution. All prior work in TDS learning focuses on classification, while here we must handle the setting of nonconvex regression. Our results apply to real-valued networks with arbitrary Lipschitz activations and work whenever the training distribution has strictly sub-exponential tails. For training distributions that are bounded and hypercontractive, we give a fully polynomial-time algorithm for TDS learning one-hidden-layer networks with sigmoid activations. We achieve this by importing classical kernel methods into the TDS framework using data-dependent feature maps and a type of kernel matrix that couples samples from both train and test distributions.
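
The abstract describes a kernel matrix that couples train and test samples through data-dependent feature maps. The sketch below is a hedged illustration of what a spectral learn-or-reject check of that general kind can look like. It is not the paper's algorithm. Several choices are assumptions made only for illustration: a Gaussian kernel, a feature map phi(x) = [k(z_1, x), ..., k(z_m, x)] whose reference points z_j are drawn from both samples, and an acceptance rule "test second-moment matrix is dominated by `slack` times the train one plus `ridge` times the identity." The names `gaussian_kernel` and `spectral_shift_test` and the parameters `slack`, `ridge`, `n_ref`, and `gamma` are all hypothetical.

```python
import numpy as np


def gaussian_kernel(A, B, gamma=0.5):
    """Gram matrix with entries exp(-gamma * ||a - b||^2) for rows a of A, b of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def spectral_shift_test(X_train, X_test, n_ref=20, slack=2.0, ridge=1e-2, gamma=0.5, seed=0):
    """Return True (accept) iff M_test <= slack * M_train + ridge * I in the PSD order.

    M_* is the empirical second-moment matrix of phi(x) = [k(z_1, x), ..., k(z_m, x)],
    where the reference points z_j come from BOTH samples (the coupled, data-dependent
    part). All parameters are illustrative, not values from the paper."""
    rng = np.random.default_rng(seed)
    ref = np.vstack([
        X_train[rng.choice(len(X_train), n_ref, replace=False)],
        X_test[rng.choice(len(X_test), n_ref, replace=False)],
    ])
    Phi_tr = gaussian_kernel(X_train, ref, gamma)  # shape (n_train, 2 * n_ref)
    Phi_te = gaussian_kernel(X_test, ref, gamma)   # shape (n_test, 2 * n_ref)
    M_tr = Phi_tr.T @ Phi_tr / len(X_train)
    M_te = Phi_te.T @ Phi_te / len(X_test)
    gap = slack * M_tr + ridge * np.eye(len(ref)) - M_te
    # The PSD inequality holds iff the smallest eigenvalue of the symmetric gap is >= 0.
    return bool(np.linalg.eigvalsh(gap)[0] >= 0.0)
```

When both samples come from the same distribution, the two second-moment matrices agree up to sampling noise, and the slack and ridge terms absorb that noise. When the test sample sits far from the training data, some test reference point has large test-side mass but almost no train-side mass, so the smallest eigenvalue of the gap goes negative and the check rejects.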

Paper Structure

This paper contains 23 sections, 31 theorems, 77 equations, 2 tables, 2 algorithms.

Key Result

Theorem 3.6

Under the boundedness assumption, the TDS-via-kernels algorithm learns the class $\mathcal{F}$ in the TDS regression setting with excess error at most $5\epsilon$ and failure probability at most $\delta$. The time complexity is $O(T) \cdot \mathrm{poly}(d, 1/\epsilon, (\log(1/\delta))^{\ell}, A, B, C^{\ell}, 2^{\ell}, \ldots)$

Theorems & Definitions (70)

  • Definition 1.1: Testable Regression with Distribution Shift
  • Definition 3.1: Kernels (Mercer, 1909)
  • Definition 3.3: Approximate Representation
  • Definition 3.4: Hypercontractivity
  • Theorem 3.6: TDS Learning via the Kernel Method
  • Proposition 3.7: Representer Theorem, modification of Theorem 6.11 in Mohri et al. (2018)
  • Proof
  • Lemma 3.8: Multiplicative Spectral Concentration, modification of Lemma B.1 in Goel et al. (2024)
  • Proof of Theorem 3.6
  • Definition 3.9: Hypercontractivity
  • ...and 60 more
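
Proposition 3.7 is a modification of the representer theorem. For orientation, the sketch below shows the standard, unmodified version applied to kernel ridge regression. It is not the paper's construction. The squared loss, the Gaussian kernel, and the names `gaussian_kernel`, `kernel_ridge_fit`, `lam`, and `gamma` are illustrative assumptions.

```python
import numpy as np


def gaussian_kernel(A, B, gamma=1.0):
    """Gram matrix with entries exp(-gamma * ||a - b||^2) for rows a of A, b of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def kernel_ridge_fit(X, y, lam=1e-4, gamma=1.0):
    """Minimize (1/n) * sum_i (f(x_i) - y_i)^2 + lam * ||f||_H^2 over the kernel's RKHS.

    By the representer theorem the minimizer is f(x) = sum_i alpha_i k(x_i, x).
    Setting the gradient in alpha to zero gives alpha = (K + n * lam * I)^{-1} y."""
    n = len(X)
    K = gaussian_kernel(X, X, gamma)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    # The fitted predictor only needs kernel evaluations against the training points.
    return lambda Z: gaussian_kernel(Z, X, gamma) @ alpha
```

As a usage example, fitting the single-sigmoid target 1 / (1 + exp(-(2 x_1 - x_2))) on a few hundred uniform points in [-1, 1]^2 should give a small held-out squared error. The point of the example is that the learned function lies in the span of kernel evaluations at the training points, which is what the representer theorem guarantees.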