Transfer Learning for Kernel-based Regression

Chao Wang; Caixing Wang; Xin He; Xingdong Feng

TL;DR

This work addresses nonparametric transfer learning for kernel ridge regression under posterior drift, introducing two practical algorithms: ${\mathcal A}_h$-TKRR for known transferable sources and SA-TKRR for unknown sources. The authors establish minimax lower bounds and near-optimal upper bounds for the respective estimators, showing explicit bias-transfer and aggregation contributions within a reproducing kernel Hilbert space framework. The methods are validated through extensive simulations and real-data analyses, demonstrating gains from informative sources and robustness to negative sources. The results bridge practical effectiveness and theoretical guarantees, offering principled tools for kernel-based regression with multi-source data and potential domain shifts.

Abstract

In recent years, transfer learning has garnered significant attention. Its ability to leverage knowledge from related studies to improve generalization performance in a target study has made it highly appealing. This paper focuses on investigating the transfer learning problem within the context of nonparametric regression over a reproducing kernel Hilbert space. The aim is to bridge the gap between practical effectiveness and theoretical guarantees. We specifically consider two scenarios: one where the transferable sources are known and another where they are unknown. For the known transferable source case, we propose a two-step kernel-based estimator by solely using kernel ridge regression. For the unknown case, we develop a novel method based on an efficient aggregation algorithm, which can automatically detect and alleviate the effects of negative sources. This paper provides the statistical properties of the desired estimators and establishes the minimax rate. Through extensive numerical experiments on synthetic data and real examples, we validate our theoretical findings and demonstrate the effectiveness of our proposed method.
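The two-step estimator for known transferable sources can be illustrated with a minimal sketch. It assumes the two steps are (1) a pooled kernel ridge regression on the transferable sources plus the target data and (2) a second kernel ridge regression on the target residuals to correct the remaining bias. The abstract does not spell out these steps, so this reading, the Gaussian kernel, and all names (`gaussian_kernel`, `krr_fit`, `two_step_tkrr`, `bandwidth`) are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def gaussian_kernel(A, B, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between rows of A and rows of B."""
    sq_dist = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * bandwidth**2))


def krr_fit(X, y, lam, bandwidth=1.0):
    """Kernel ridge regression; returns a prediction function.

    Solves (K + n * lam * I) alpha = y, the minimizer of
    (1/n) * sum (y_i - f(x_i))^2 + lam * ||f||_K^2.
    """
    n = X.shape[0]
    K = gaussian_kernel(X, X, bandwidth)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    return lambda X_new: gaussian_kernel(X_new, X, bandwidth) @ alpha


def two_step_tkrr(X_src, y_src, X_tgt, y_tgt, lam1, lam2, bandwidth=1.0):
    """Hypothetical two-step transfer estimator (pool, then debias)."""
    # Step 1 (assumed): pooled KRR on transferable sources and target data.
    X_pool = np.vstack([X_src, X_tgt])
    y_pool = np.concatenate([y_src, y_tgt])
    f_pool = krr_fit(X_pool, y_pool, lam1, bandwidth)

    # Step 2 (assumed): KRR on target residuals to estimate the offset.
    residuals = y_tgt - f_pool(X_tgt)
    f_delta = krr_fit(X_tgt, residuals, lam2, bandwidth)

    # Final estimator: pooled fit plus the estimated target-specific offset.
    return lambda X_new: f_pool(X_new) + f_delta(X_new)
```

In this sketch `X_src` and `y_src` stand for the stacked data of all sources judged transferable, and `lam1`, `lam2` are the regularization parameters of the two steps.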

Paper Structure (21 sections, 5 theorems, 22 equations, 5 figures, 1 table, 2 algorithms)

Introduction
Our contributions
Related works
Paper organization
Preliminaries
Notation
Reproducing kernel
Multi-source, target model and similarity measure
Transfer Learning for Kernel Ridge Regression
Two-step transfer learning with known transferable sources
Transfer learning with unknown transferable sources
Theoretical Guarantees
Optimal convergence rates for ${\cal A}_h$-TKRR
Optimal convergence rates for SA-TKRR
Simulated Experiments
...and 6 more sections

Key Result

Theorem 4.4

Suppose that Assumptions 1–3 are satisfied and ${\cal A}_h$ is known with given $h$. With the choices $\lambda_1\asymp(n_{{\cal A}_h}+n_0)^{-\frac{1}{2r+\alpha}}$ and $\lambda_2\asymp h^{-\frac{2}{1+\alpha}}n_0^{-\frac{1}{1+\alpha}}$, for any $\delta \in (0,1)$ satisfying [...], it holds with probability at least $1-\delta$ that [...], where $C_0$, $C_1$ and $C_2$ are some universal constants.
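As a worked illustration of the parameter choices in Theorem 4.4, the sketch below evaluates the two rates for given sample sizes. The relation $\asymp$ hides constants, so the multipliers `c1` and `c2` are hypothetical tuning constants, and the function name `tkrr_regularization` is an assumption introduced here. The symbols `r` and `alpha` are taken to be the parameters appearing in the theorem's exponents; their definitions are not part of this summary.

```python
def tkrr_regularization(n_sources, n_target, h, r, alpha, c1=1.0, c2=1.0):
    """Evaluate the rate-based choices of lambda_1 and lambda_2.

    lambda_1 ~ (n_sources + n_target)^(-1 / (2r + alpha))
    lambda_2 ~ h^(-2 / (1 + alpha)) * n_target^(-1 / (1 + alpha))

    c1 and c2 are hypothetical constants; the theorem fixes only the rates.
    """
    if n_sources < 0 or n_target <= 0:
        raise ValueError("sample sizes must be positive (n_sources may be 0)")
    if h <= 0:
        raise ValueError("h must be positive")
    lam1 = c1 * (n_sources + n_target) ** (-1.0 / (2.0 * r + alpha))
    lam2 = c2 * h ** (-2.0 / (1.0 + alpha)) * n_target ** (-1.0 / (1.0 + alpha))
    return lam1, lam2
```

For example, with `n_sources=990`, `n_target=10`, `h=1`, `r=0.5` and `alpha=1`, both exponents equal $-1/2$, so the function returns $1000^{-1/2}$ and $10^{-1/2}$ when `c1 = c2 = 1`. In practice the constants would typically be chosen by cross-validation.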

Figures (5)

Figure 1: Different learning processes between classical machine learning and transfer learning.
Figure 2: Illustration of how Algorithm 1 leads to a better estimator of $f_{\rho}^{(0)}$ in a theoretical example where ${\cal A}_h=\{1,2,3\}$, and $f_{\rho}^{(0)}$, $f_{\rho}^{(1)}$, $f_{\rho}^{(2)}$, $f_{\rho}^{(3)}$ are the corresponding target and source functions.
Figure 3: Averaged prediction errors of KRR, $\mathcal{A}_h$-TKRR and $\mathcal{A}_h$-TKRR-WD in Examples 1–3 under various scenarios with varying $|\mathcal{A}_h|$.
Figure 4: Averaged prediction errors of all the competitors in modified Examples 2 and 3 under various scenarios with varying $m$.
Figure 5: Averaged prediction errors of all the competitors under various scenarios in the wine quality data.

Theorems & Definitions (7)

Remark 1
Theorem 4.4: Upper bound of ${\cal A}_h$-TKRR
Remark 2
Theorem 4.5: Minimax lower bound
Theorem 4.7: Consistency of $\widehat{{\cal A}}_{e_\ell}$
Corollary 4.8: Consistency of $\widehat{{\cal A}}_{\ell}$
Theorem 4.9: Upper bound of SA-TKRR
