Smoothness Adaptive Hypothesis Transfer Learning

Haotian Lin; Matthew Reimherr

TL;DR

The paper tackles the challenge of adapting to unknown smoothness in a two-phase transfer learning setting for nonparametric regression. It introduces Smoothness Adaptive Transfer Learning (SATL), which employs Gaussian kernels in both TL phases to adapt to the Sobolev smoothness of the target, the source, and their offset. Theoretical results establish minimax lower bounds and show that SATL achieves matching upper bounds up to logarithmic factors, with the excess risk decomposed into a source term and an offset term governed by a similarity metric $\xi(h,f_S)$. Experiments corroborate the theory, demonstrating adaptive performance and an advantage over non-transfer learning and finite-basis TL methods. Overall, SATL provides a principled, adaptive approach to transfer learning in infinite-dimensional settings and clarifies how domain similarity and sample sizes influence transfer dynamics.

Abstract

Many existing two-phase kernel-based hypothesis transfer learning algorithms employ the same kernel regularization across phases and rely on known smoothness of the functions to obtain optimality. Therefore, in practice they fail to adapt to the varying and unknown smoothness of the target/source and their offset. In this paper, we address these problems by proposing Smoothness Adaptive Transfer Learning (SATL), a two-phase kernel ridge regression (KRR)-based algorithm. We first prove that employing a misspecified fixed-bandwidth Gaussian kernel in target-only KRR learning can achieve minimax optimality, and we derive a procedure that adapts to the unknown Sobolev smoothness. Leveraging these results, SATL employs Gaussian kernels in both phases so that the estimators can adapt to the unknown smoothness of the target/source and their offset function. We derive the minimax lower bound of the learning problem in excess risk and show that SATL enjoys a matching upper bound up to a logarithmic factor. The minimax convergence rate sheds light on the factors influencing transfer dynamics and demonstrates the superiority of SATL compared to non-transfer learning settings. While our main objective is a theoretical analysis, we also conduct several experiments to confirm our results.
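
The two-phase structure described in the abstract can be illustrated with a minimal sketch: fit a Gaussian-kernel KRR on the source sample, then fit a second Gaussian-kernel KRR on the target residuals to estimate the offset, and add the two fits. This is not the authors' implementation. The function names, the kernel parametrization, the synthetic data, and the fixed values of `lam_s`, `lam_t`, and `bandwidth` are all assumptions chosen for illustration, and the data-driven adaptive selection that SATL relies on is not reproduced here.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
    """Kernel matrix with entries exp(-||a_i - b_j||^2 / (2 * bandwidth^2))."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def fit_krr(x, y, lam, bandwidth=1.0):
    """Gaussian-kernel ridge regression; returns a prediction function."""
    n = x.shape[0]
    gram = gaussian_kernel(x, x, bandwidth)
    # Solves (K + n * lam * I) alpha = y, the usual KRR normal equations.
    alpha = np.linalg.solve(gram + n * lam * np.eye(n), y)
    return lambda z: gaussian_kernel(z, x, bandwidth) @ alpha


def two_phase_transfer(x_s, y_s, x_t, y_t, lam_s, lam_t, bandwidth=1.0):
    """Phase 1: fit the source. Phase 2: fit the offset on target residuals."""
    f_s = fit_krr(x_s, y_s, lam_s, bandwidth)
    f_delta = fit_krr(x_t, y_t - f_s(x_t), lam_t, bandwidth)
    return lambda z: f_s(z) + f_delta(z)


# Synthetic example (assumed data): a large source sample, a small target
# sample, and a smooth offset between the two regression functions.
rng = np.random.default_rng(0)
x_s = rng.uniform(0.0, 1.0, (200, 1))
x_t = rng.uniform(0.0, 1.0, (20, 1))
y_s = np.sin(2 * np.pi * x_s[:, 0]) + 0.1 * rng.standard_normal(200)
y_t = np.sin(2 * np.pi * x_t[:, 0]) + 0.3 * x_t[:, 0] + 0.1 * rng.standard_normal(20)

f_hat = two_phase_transfer(x_s, y_s, x_t, y_t, lam_s=1e-3, lam_t=1e-2, bandwidth=0.2)
x_grid = np.linspace(0.0, 1.0, 50)[:, None]
predictions = f_hat(x_grid)
```

Using separate regularization parameters for the two phases reflects the abstract's point that the source function and the offset may differ in smoothness; in this sketch they are simply hand-picked constants.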

Paper Structure (39 sections, 15 theorems, 136 equations, 5 figures, 1 table, 2 algorithms)

Introduction
Main contributions.
Related Literature
Preliminaries
Problem Formulation.
Non-Transfer Scenario.
Transfer Learning Framework.
Model Assumptions.
Target-Only KRR with Gaussian kernels
Smoothness Adaptive Transfer Learning
Theoretical Analysis
Experiments
Experiments for Target-Only KRR
Experiments for Transfer Learning
Discussion
...and 24 more sections

Key Result

Proposition 1

For a symmetric and positive semi-definite kernel $K:\mathcal{X} \times \mathcal{X} \rightarrow \mathbb{R}$, let $\mathcal{H}_{K}$ be the RKHS associated with $K$ (Wendland, 2004). Let $\hat{f}_{T}$ denote the KRR estimator, and call $K$ the imposed kernel. The proposition gives the convergence rate of the generalization error $\mathcal{E}(\hat{f}_{T})$ of $\hat{f}_{T}$.
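
For reference, a KRR estimator over $\mathcal{H}_{K}$ is commonly written in the following textbook form. The sample notation $\{(x_{T,i}, y_{T,i})\}_{i=1}^{n_{T}}$ and the regularization parameter $\lambda$ are assumptions made for this sketch and may not match the paper's exact statement.

$$
\hat{f}_{T} = \operatorname*{arg\,min}_{f \in \mathcal{H}_{K}} \left\{ \frac{1}{n_{T}} \sum_{i=1}^{n_{T}} \bigl( y_{T,i} - f(x_{T,i}) \bigr)^{2} + \lambda \, \| f \|_{\mathcal{H}_{K}}^{2} \right\}
$$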

Figures (5)

Figure 1: Geometric illustration of how $\xi(h,\mathcal{S})$ affects the OTL dynamics. The circle represents an RKHS ball centered at $f_{S}$ with radius $h$. Two pairs of $f_{S}$ and $f_{T}$ (shown in red and blue) have offsets with the same signal strength $h$, while the source models' signal strengths differ, leading to different angles $\theta_{0}$ and $\theta_{1}$ between $f_{S}$ and $f_{T}$.
Figure 2: Error decay curves of target-only KRR based on the Gaussian kernel; both axes are in log scale. The blue curves denote the average generalization errors over 100 trials. The dashed black lines denote the theoretical decay rates.
Figure 3: Generalization error under different $h$ and smoothness of $f_{\delta}$. Each curve denotes the average error over 100 trials, and the shaded regions denote one standard error of the mean. The left panel shows results for the fixed-$n_{T}$ scenario and the right panel for the varying-$n_{T}$ scenario.
Figure 4: Error decay curves of target-only KRR based on the Gaussian kernel; both axes are in log scale. The curves with different colors correspond to different $C$ and denote the average logarithmic generalization errors over 100 trials. The dashed black lines denote the theoretical decay rates.
Figure 5: Generalization error for the fixed-$n_{T}$ scenario under different $h$ and smoothness of $f_{T}-f_{S}$. The rows correspond to SATL, FBE-based TL with a Fourier basis, and FBE-based TL with B-splines, respectively.

Theorems & Definitions (31)

Proposition 1: Target-only Learning
Remark 1
Theorem 1: Non-Adaptive Rate
Remark 2
Theorem 2: Adaptive Rate
Remark 3
Theorem 3: Optimality of SATL
Remark 4
Lemma 1
Theorem 4
...and 21 more
