Transfer Learning Beyond Bounded Density Ratios

Alkis Kalavasis; Ilias Zadik; Manolis Zampetakis


TL;DR

The paper proves a general transfer inequality over the domain $\mathbb{R}^n$, showing that non-trivial transfer learning for low-degree polynomials is possible under very mild assumptions that go well beyond the classical requirement that $dQ/dP$ be bounded.

Abstract

We study the fundamental problem of transfer learning where a learning algorithm collects data from some source distribution $P$ but needs to perform well with respect to a different target distribution $Q$. A standard change of measure argument implies that transfer learning happens when the density ratio $dQ/dP$ is bounded. Yet, prior thought-provoking works by Kpotufe and Martinet (COLT, 2018) and Hanneke and Kpotufe (NeurIPS, 2019) demonstrate cases where the ratio $dQ/dP$ is unbounded, but transfer learning is possible. In this work, we focus on transfer learning over the class of low-degree polynomial estimators. Our main result is a general transfer inequality over the domain $\mathbb{R}^n$, proving that non-trivial transfer learning for low-degree polynomials is possible under very mild assumptions, going well beyond the classical assumption that $dQ/dP$ is bounded. For instance, it always applies if $Q$ is a log-concave measure and the inverse ratio $dP/dQ$ is bounded. To demonstrate the applicability of our inequality, we obtain new results in the settings of: (1) the classical truncated regression setting, where $dQ/dP$ equals infinity, and (2) the more recent out-of-distribution generalization setting for in-context learning linear functions with transformers. We also provide a discrete analogue of our transfer inequality on the Boolean Hypercube $\{-1,1\}^n$, and study its connections with the recent problem of Generalization on the Unseen of Abbe, Bengio, Lotfi and Rizk (ICML, 2023). Our main conceptual contribution is that the maximum influence of the error of the estimator $\widehat{f}-f^*$ under $Q$, $\mathrm{I}_{\max}(\widehat{f}-f^*)$, acts as a sufficient condition for transferability; when $\mathrm{I}_{\max}(\widehat{f}-f^*)$ is appropriately bounded, transfer is possible over the Boolean domain.
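
The standard change of measure argument mentioned in the abstract can be written out in one line. This is only a sketch under stated assumptions: $Q$ is absolutely continuous with respect to $P$, the loss is the squared error (an illustrative choice, not taken from the paper), and $\widehat{f}$, $f^*$ denote the estimator and the target as in the abstract.

```latex
\mathbb{E}_{x \sim Q}\big[(\widehat{f}(x) - f^*(x))^2\big]
  = \mathbb{E}_{x \sim P}\Big[\tfrac{dQ}{dP}(x)\,(\widehat{f}(x) - f^*(x))^2\Big]
  \le \Big\|\tfrac{dQ}{dP}\Big\|_{\infty}\,
      \mathbb{E}_{x \sim P}\big[(\widehat{f}(x) - f^*(x))^2\big].
```

The bound is vacuous as soon as $\|dQ/dP\|_\infty = \infty$, for example when $Q$ puts mass outside the support of $P$. That is the regime the paper's transfer inequality targets.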

Paper Structure (37 sections, 12 theorems, 98 equations, 2 figures)


Introduction
Summary of Contributions
Empirical Motivation: Polynomial Regression vs. Deep ReLU Networks
Related Work
Overview
Transfer Learning for Low-Degree Polynomials in the Euclidean Domain
Notation & Definitions
Main Inequality
Transfer Learning for Low-Degree Polynomials in the Boolean Domain
Quick Boolean Analysis Background
Main Inequality
Applications in the Euclidean Domain
Truncated Statistics
General Transfer Learning for Truncated Gaussians
General Transfer Learning for Truncated Regression
...and 22 more sections

Key Result

Theorem 2.3

Let $\mathcal{L}$ be the space of log-concave probability distributions over $\mathbb{R}^n$. Consider two probability distributions $P, Q$ over $\mathbb{R}^n$ and let $f : \mathbb{R}^n \to \mathbb{R}$ be a degree-$d$ polynomial. There exists an absolute constant $C$ such that for $\alpha, \beta \in [1, \infty]$ [...] In particular, if $\alpha = \infty$, we get that [...]

Figures (2)

Figure 1: Transferability of polynomial and neural network estimators with $P = \mathcal{U}([0,1]\times[-1,1])$ (inside the dotted rectangle in (a)). (a) Contour plot of $f^\star(x,y) = \sin(2\pi x)\sin(2\pi y)$, (b) contour plot of a degree-20 polynomial regressor $f_1$, (c) contour plot of a 6-layer size-110 ReLU network $f_2$ (see also the paper's experimental section).
Figure 2: Transfer learning with source distribution $P = \mathcal{U}([-1/2,1/2]^2)$ (see the dotted box in the top left plot) and target distribution $Q = \mathcal{U}([-5,5]^2)$, with true function $f^\star(x,y) = \sin(2\pi x)\sin(2\pi y) + xy$ (top left), using as a regressor $f$ (i) a degree-20 polynomial (top right), (ii) a 6-layer size-110 ReLU neural network (bottom left), and (iii) a 6-layer size-110 polynomial neural network (bottom right).
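
The source/target setup of Figure 2 can be mimicked in a much simpler one-dimensional toy, sketched below. This is not the paper's experiment: the target `f_star`, the interval $[-2,2]$, the degree, the sample sizes, and the helper name `transfer_errors` are all hypothetical choices made for illustration. The source is $P = \mathcal{U}([-1/2,1/2])$ and the target is $Q = \mathcal{U}([-2,2])$, so $dQ/dP$ is unbounded while $dP/dQ$ is bounded and $Q$ is log-concave, as in the example highlighted in the abstract.

```python
import numpy as np
from numpy.polynomial import polynomial as npoly


def f_star(x):
    # Hypothetical ground truth: a degree-3 polynomial (not taken from the paper).
    return x**3 - x


def transfer_errors(degree=3, noise_std=0.0, n_train=200, n_test=2000, seed=0):
    """Fit by least squares on P = U([-1/2, 1/2]); return the mean squared
    error of the fitted polynomial under P and under Q = U([-2, 2])."""
    rng = np.random.default_rng(seed)

    # Training data comes only from the source distribution P.
    x_train = rng.uniform(-0.5, 0.5, size=n_train)
    y_train = f_star(x_train) + noise_std * rng.normal(size=n_train)

    # Least-squares polynomial fit; coefficients are ordered low to high degree.
    coeffs = npoly.polyfit(x_train, y_train, degree)

    # Fresh evaluation points from the source P and from the wider target Q.
    x_p = rng.uniform(-0.5, 0.5, size=n_test)
    x_q = rng.uniform(-2.0, 2.0, size=n_test)
    err_p = np.mean((npoly.polyval(x_p, coeffs) - f_star(x_p)) ** 2)
    err_q = np.mean((npoly.polyval(x_q, coeffs) - f_star(x_q)) ** 2)
    return err_p, err_q


if __name__ == "__main__":
    for noise in (0.0, 0.01):
        err_p, err_q = transfer_errors(noise_std=noise)
        print(f"noise={noise}: error under P = {err_p:.3e}, under Q = {err_q:.3e}")
```

With noisy labels one would expect the error under $Q$ to exceed the error under $P$. How large that amplification can be for a degree-$d$ polynomial is the kind of question the transfer inequality addresses.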

Theorems & Definitions (32)

Definition 2.1
Definition 2.2
Theorem 2.3: Transferability of Polynomials
Corollary 2.4
Proposition 3.1: Invariance Principle (Mossel, O'Donnell and Oleszkiewicz, 2005)
Theorem 3.2: Transferability of Boolean functions
Remark 4.1
Definition 4.2
Remark 4.3
Corollary 4.4: Transfer in Truncated Regression
...and 22 more
