Transfer learning with affine model transformation
Shunya Minami, Kenji Fukumizu, Yoshihiro Hayashi, Ryo Yoshida
TL;DR
The paper develops affine model transfer (AffineTL), a principled class of transfer-learning models for regression under squared loss. Target-domain predictions take the form $f_t(\mathbf{x}) = g_1(\mathbf{f}_s) + g_2(\mathbf{f}_s)\, g_3(\mathbf{x})$, where $\mathbf{f}_s$ denotes the source features evaluated at $\mathbf{x}$; this form separates cross-domain shift from domain-specific factors. The paper shows that this affine coupling subsumes transfer via neural feature extractors and several hypothesis transfer learning (HTL) methods, casts estimation in a reproducing kernel Hilbert space (RKHS) with a kernel-based objective, and provides a block-relaxation algorithm for training. The theoretical results include a generalization bound that tightens when the source-target relation is strong, and an excess-risk bound governed by the eigenvalue decay of Gram matrices, which links model complexity to the overlap between source features and inputs. Empirically, AffineTL improves predictive performance on robotics, NLP document evaluation, and materials-science tasks while avoiding negative transfer and yielding interpretable insights into cross-domain differences. The approach is model-agnostic, can be implemented with kernel methods or neural networks, and offers a practical framework for leveraging related domains when target data are limited.
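As a hedged illustration of how the affine form can reduce to simpler transfer schemes, the following special cases follow by direct substitution into $f_t(\mathbf{x}) = g_1(\mathbf{f}_s) + g_2(\mathbf{f}_s)\, g_3(\mathbf{x})$. The labels on the right are informal descriptions, not the paper's terminology, and the second case assumes a scalar source output $f_s$.

$$
\begin{aligned}
g_2 \equiv 0 &: \quad f_t(\mathbf{x}) = g_1(\mathbf{f}_s) && \text{(a model fitted on source features only)} \\
g_1(f_s) = f_s,\; g_2 \equiv 1 &: \quad f_t(\mathbf{x}) = f_s + g_3(\mathbf{x}) && \text{(source prediction plus a learned offset)}
\end{aligned}
$$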
Abstract
Supervised transfer learning has received considerable attention due to its potential to boost the predictive power of machine learning in scenarios where data are scarce. Generally, a given set of source models and a dataset from a target domain are used to adapt the pre-trained models to that domain by statistically learning domain shift and domain-specific factors. While such procedurally and intuitively plausible methods have achieved great success in a wide range of real-world applications, the lack of a theoretical basis hinders further methodological development. This paper presents a general class of transfer learning methods for regression, called affine model transfer, that follows the principle of expected squared loss minimization. It is shown that affine model transfer broadly encompasses various existing methods, including the most common procedure based on neural feature extractors. Furthermore, this paper clarifies theoretical properties of affine model transfer, such as its generalization error and excess risk. Through several case studies, we demonstrate the practical benefits of modeling and estimating inter-domain commonality and domain-specific factors separately with affine-type transfer models.
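To make the affine-type model concrete, here is a minimal sketch of fitting $f_t(\mathbf{x}) = g_1(\mathbf{f}_s) + g_2(\mathbf{f}_s)\, g_3(\mathbf{x})$ by alternating least squares. It is a simplified stand-in, not the paper's kernel-based procedure. It assumes that $g_1$, $g_2$, and $g_3$ are all linear with a bias term, that a ridge penalty `lam` is used, and that the source features `Fs` are precomputed. The function names and the synthetic source model are hypothetical.

```python
import numpy as np


def _with_bias(Z):
    # Prepend a column of ones so each linear map has an intercept.
    return np.hstack([np.ones((Z.shape[0], 1)), Z])


def _ridge(A, y, lam):
    # Closed-form ridge solution: (A^T A + lam I)^{-1} A^T y.
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y)


def fit_affine_transfer(X, Fs, y, lam=1e-3, n_iter=50):
    """Alternate between the (g1, g2) block and the g3 block.

    Assumed (illustrative) parameterization:
    g1(f) = [1, f] @ a,  g2(f) = [1, f] @ b,  g3(x) = [1, x] @ c.
    Returns the parameters and the penalized objective after each sweep.
    """
    Xb, Fb = _with_bias(X), _with_bias(Fs)
    p = Fb.shape[1]
    c = np.zeros(Xb.shape[1])
    c[0] = 1.0  # start from g3(x) = 1
    history = []
    for _ in range(n_iter):
        # Block 1: with g3 fixed, the model is linear in (a, b).
        g3 = Xb @ c
        ab = _ridge(np.hstack([Fb, Fb * g3[:, None]]), y, lam)
        a, b = ab[:p], ab[p:]
        # Block 2: with g1 and g2 fixed, the residual is linear in c.
        g2 = Fb @ b
        c = _ridge(Xb * g2[:, None], y - Fb @ a, lam)
        resid = y - (Fb @ a + g2 * (Xb @ c))
        history.append(float(resid @ resid + lam * (a @ a + b @ b + c @ c)))
    return (a, b, c), history


def predict_affine_transfer(params, X, Fs):
    a, b, c = params
    Xb, Fb = _with_bias(X), _with_bias(Fs)
    return Fb @ a + (Fb @ b) * (Xb @ c)


# Synthetic usage: the "source model" is a made-up tanh feature map.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Fs = np.tanh(X @ rng.normal(size=(3, 2)))
y = (1.0 + 2.0 * Fs[:, 0]
     + (0.5 + Fs[:, 1]) * (X @ np.array([1.0, -1.0, 0.5]))
     + 0.1 * rng.normal(size=200))

params, history = fit_affine_transfer(X, Fs, y)
pred = predict_affine_transfer(params, X, Fs)
print("training MSE:", float(np.mean((y - pred) ** 2)))
```

Each block update is an exact ridge solve with the other block held fixed, so the penalized objective recorded in `history` should not increase from one sweep to the next. The overall problem is non-convex, however, so the result can depend on the initialization of `c`.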
