Transfer learning with affine model transformation

Shunya Minami, Kenji Fukumizu, Yoshihiro Hayashi, Ryo Yoshida

TL;DR

The paper develops affine model transfer, a principled class of transfer-learning methods for regression under squared loss. Target-domain predictions take the form $f_t(\mathbf{x}) = g_1(\mathbf{f_s}) + g_2(\mathbf{f_s})\; g_3(\mathbf{x})$, where $\mathbf{f_s}$ denotes the source features and $\mathbf{x}$ the target inputs, so that cross-domain shift is modeled separately from domain-specific factors. The paper shows that this affine coupling subsumes neural feature extractors and several hypothesis transfer learning (HTL) methods, casts estimation in reproducing kernel Hilbert spaces (RKHS) with a kernel-based objective, and provides a block-relaxation algorithm for training. Theoretical results include a generalization bound that tightens when the source-target relation is strong, and an excess-risk bound governed by the eigenvalue decay of Gram matrices, which links model complexity to the overlap between source features and inputs. Empirically, AffineTL improves predictive performance on robotics, NLP document evaluation, and materials-science tasks while avoiding negative transfer and yielding interpretable insights into cross-domain differences. The approach is model-agnostic, can be implemented with kernel methods or neural networks, and offers a practical framework for exploiting related domains when target data are limited.
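The block-relaxation idea mentioned above can be illustrated with a minimal sketch. It is not the paper's algorithm: as a simplifying assumption, $g_1$, $g_2$, and $g_3$ are plain linear maps with an intercept (the paper works with RKHS functions), the penalty is an ordinary ridge term with a hypothetical weight `lam`, and the function names and the synthetic data are made up for illustration. The point is only that, with two of the three functions held fixed, the squared-loss problem in the remaining one is a ridge regression with a closed-form solution.

```python
import numpy as np


def _with_bias(A):
    """Append a constant column so each linear map has an intercept."""
    return np.hstack([A, np.ones((A.shape[0], 1))])


def _ridge(Z, r, lam):
    """Closed-form minimizer of ||r - Z w||^2 + lam * ||w||^2."""
    return np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ r)


def fit_affine_transfer(X, Fs, y, lam=1e-3, n_sweeps=30, seed=0):
    """Alternating ridge updates for y ~ g1(Fs) + g2(Fs) * g3(X).

    Assumption: g1, g2, g3 are linear (weights a, b, c), not RKHS functions.
    Returns the weights and the penalized objective after each sweep.
    """
    rng = np.random.default_rng(seed)
    Zs, Zx = _with_bias(Fs), _with_bias(X)
    a = np.zeros(Zs.shape[1])                    # g1 starts at zero
    b = np.zeros(Zs.shape[1])
    b[-1] = 1.0                                  # g2 starts as the constant 1
    c = rng.normal(scale=0.1, size=Zx.shape[1])  # small random start for g3
    objective = []
    for _ in range(n_sweeps):
        # Block 1: update g1 with g2 and g3 fixed.
        a = _ridge(Zs, y - (Zs @ b) * (Zx @ c), lam)
        r = y - Zs @ a
        # Block 2: update g2; the design matrix is Zs scaled row-wise by g3(x).
        b = _ridge(Zs * (Zx @ c)[:, None], r, lam)
        # Block 3: update g3; the design matrix is Zx scaled row-wise by g2(fs).
        c = _ridge(Zx * (Zs @ b)[:, None], r, lam)
        resid = r - (Zs @ b) * (Zx @ c)
        objective.append(float(resid @ resid + lam * (a @ a + b @ b + c @ c)))
    return (a, b, c), objective


def predict(params, X, Fs):
    """Evaluate g1(Fs) + g2(Fs) * g3(X) for fitted linear weights."""
    a, b, c = params
    Zs, Zx = _with_bias(Fs), _with_bias(X)
    return Zs @ a + (Zs @ b) * (Zx @ c)


# Synthetic data (illustrative only): the target follows an affine form exactly.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
Fs = rng.normal(size=(200, 2))
y = Fs[:, 0] + (1.0 + 0.5 * Fs[:, 1]) * (X @ np.array([1.0, -2.0, 0.5]))
y = y + 0.01 * rng.normal(size=200)

params, objective = fit_affine_transfer(X, Fs, y)
train_mse = float(np.mean((y - predict(params, X, Fs)) ** 2))
```

Because each block update minimizes the same penalized objective exactly over its own block, the recorded objective cannot increase from one sweep to the next. The problem is not jointly convex, so the result may depend on the starting point.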

Abstract

Supervised transfer learning has received considerable attention due to its potential to boost the predictive power of machine learning in scenarios where data are scarce. Generally, a given set of source models and a dataset from a target domain are used to adapt the pre-trained models to a target domain by statistically learning domain shift and domain-specific factors. While such procedurally and intuitively plausible methods have achieved great success in a wide range of real-world applications, the lack of a theoretical basis hinders further methodological development. This paper presents a general class of transfer learning regression called affine model transfer, following the principle of expected-square loss minimization. It is shown that the affine model transfer broadly encompasses various existing methods, including the most common procedure based on neural feature extractors. Furthermore, the current paper clarifies theoretical properties of the affine model transfer such as generalization error and excess risk. Through several case studies, we demonstrate the practical benefits of modeling and estimating inter-domain commonality and domain-specific factors separately with the affine-type transfer models.
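To make the claim that affine model transfer encompasses existing methods concrete, the affine form quoted in the TL;DR, $f_t(\mathbf{x}) = g_1(\mathbf{f_s}) + g_2(\mathbf{f_s})\, g_3(\mathbf{x})$, can be specialized by fixing some of its components. The cases below are a sketch that follows from that formula alone. Reading the second case as the offset-style HTL architecture of Figure 1(b), and taking $f_s$ to be scalar there, are assumptions rather than statements taken from the paper.

$$
\begin{aligned}
g_2 \equiv 0 &: \quad f_t(\mathbf{x}) = g_1(\mathbf{f_s}) && \text{(a model on source features only, i.e. feature extraction)} \\
g_1(f_s) = f_s,\ g_2 \equiv 1 &: \quad f_t(\mathbf{x}) = f_s + g_3(\mathbf{x}) && \text{(source prediction plus a learned offset)} \\
g_1 \equiv 0,\ g_2 \equiv 1 &: \quad f_t(\mathbf{x}) = g_3(\mathbf{x}) && \text{(direct learning, no transfer)}
\end{aligned}
$$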

Paper Structure

This paper contains 51 sections, 6 theorems, 88 equations, 9 figures, 6 tables, and 3 algorithms.

Key Result

Theorem 2.4

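Under the paper's assumptions (cited in this extract only by the unresolved labels asmp:diff through asmp:consist), the transformation functions $\phi$ and $\psi$ satisfy two properties, stated in terms of some functions $g_1$ and $g_2$. The displayed equations for the two properties did not survive extraction.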

Figures (9)

  • Figure 1: Architectures of (a) feature extraction, (b) HTL in Kuzborskij and Orabona (2013), and (c) affine model transfer.
  • Figure S.1: Direct learning
  • Figure S.6: Decay rates of eigenvalues of $K_2$ (blue lines), $K_3$ (green lines) and $K_2 \circ K_3$ (red lines) for all combinations of the five different kernels. The vertical axis represents the decay rate, and the horizontal axis represents the overlap dimension $d$ in the space where $\bm{x}$ and $\bm{f_s}$ are distributed.
  • Figure S.7: Change of RMSE values between the affine transfer model and the ordinary feature extractor when using different levels of intermediate layers as the source features. The line plot shows the mean and 95% confidence interval. As a baseline, RMSE values for direct learning without transfer and fine-tuned neural networks are shown as dotted and dashed lines, respectively.
  • Figure S.8: MD-calculated (vertical axis) and experimental values (horizontal axis) of the specific heat capacity at constant pressure for various amorphous polymers.
  • ...and 4 more figures

Theorems & Definitions (16)

  • Theorem 2.4
  • Theorem 4.1
  • Theorem 4.4
  • Proposition A.1
  • proof
  • Example 1: Squared loss
  • Example 2: Absolute loss
  • Example 3: Exponential-squared loss
  • proof
  • proof : Proof of Theorem 4.1
  • ...and 6 more