Minimax optimal adaptive structured transfer learning through semi-parametric domain-varying coefficient model

Hanxiao Chen; Debarghya Mukherjee

TL;DR

This work proposes a semiparametric domain-varying coefficient model (DVCM), in which domain-relatedness is encoded through an observable domain identifier, and develops an adaptive transfer learning estimator that selectively borrows strength from informative source domains while provably safeguarding against negative transfer.

Abstract

Transfer learning aims to improve inference in a target domain by leveraging information from related source domains, but its effectiveness critically depends on how cross-domain heterogeneity is modeled and controlled. When the conditional mechanism linking covariates and responses varies across domains, indiscriminate information pooling can lead to negative transfer, degrading performance relative to target-only estimation. We study a multi-source, single-target transfer learning problem under conditional distributional drift and propose a semiparametric domain-varying coefficient model (DVCM), in which domain-relatedness is encoded through an observable domain identifier. This framework generalizes classical varying-coefficient models to structured transfer learning and interpolates between invariant and fully heterogeneous regimes. Building on this model, we develop an adaptive transfer learning estimator that selectively borrows strength from informative source domains while provably safeguarding against negative transfer. Our estimator is computationally efficient and easy to implement; we also show that it is minimax rate-optimal and derive its asymptotic distribution, enabling valid uncertainty quantification and hypothesis testing despite data-adaptive pooling and shrinkage. Our results precisely characterize the interplay among domain heterogeneity, the smoothness of the underlying mean function, and the number of source domains and are corroborated by comprehensive numerical experiments and two real-data applications.
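To make the varying-coefficient idea concrete, here is a minimal sketch of kernel-weighted pooling across domains. It is not the authors' adaptive estimator: it assumes a linear model $y = x^\top \theta(u) + \varepsilon$ in each domain, a scalar domain identifier $u$, a hypothetical coefficient function `theta_fn`, and an Epanechnikov-type weight with a hand-picked bandwidth `h`. All function and variable names are illustrative.

```python
import numpy as np

def theta_fn(u):
    # Hypothetical smooth coefficient function of the domain identifier u.
    return np.array([1.0 + u, np.sin(u)])

def simulate_domain(rng, u, n, noise_sd=0.5):
    # One domain: y = x^T theta(u) + noise.
    X = rng.normal(size=(n, 2))
    y = X @ theta_fn(u) + noise_sd * rng.normal(size=n)
    return u, X, y

def kernel_pooled_estimate(domains, u0, h):
    # Kernel-weighted least squares: each domain's normal equations are
    # weighted by an Epanechnikov-type kernel in (u - u0) / h, so domains
    # farther than h from the target get weight zero.
    p = domains[0][1].shape[1]
    gram = np.zeros((p, p))
    moment = np.zeros(p)
    for u, X, y in domains:
        w = max(0.0, 1.0 - ((u - u0) / h) ** 2)
        gram += w * (X.T @ X)
        moment += w * (X.T @ y)
    return np.linalg.solve(gram, moment)

rng = np.random.default_rng(0)
u0 = 0.0  # identifier of the target domain
target = simulate_domain(rng, u0, n=100)
sources = [simulate_domain(rng, u, n=500) for u in (-0.05, 0.05, 2.0)]
domains = [target] + sources

# h = 0.2 pools the two nearby sources and ignores the distant one;
# h = 0.01 gives every source weight zero, i.e. target-only least squares.
theta_pooled = kernel_pooled_estimate(domains, u0, h=0.2)
theta_target_only = kernel_pooled_estimate(domains, u0, h=0.01)
print("pooled:", theta_pooled)
print("target only:", theta_target_only)
```

In this toy setup the bandwidth plays the role of the pooling decision: a source whose identifier is far from $u_0$ contributes nothing, which is one simple way to limit negative transfer. The paper's estimator selects this behavior adaptively rather than through a fixed `h`.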

Paper Structure (17 sections, 8 theorems, 59 equations, 8 figures, 2 algorithms)

Introduction
Methodology
Transfer learning for linear DVCM
Transfer learning for generalized linear DVCM
Estimating $Q$
Theoretical Analysis
Non-Asymptotic Results for Linear DVCMs
Inference with linear DVCM
Extension to generalized DVCM
Simulation experiments
Bandwidth sensitivity analysis
Asymptotic normality
Phase transition in the rate of estimation
Real data analysis
Application 1: Survey of Labour and Income Dynamics in Ontario
...and 2 more sections

Key Result

Proposition 3.5

Let $A$ be any matrix satisfying $0 \preceq A \preceq C I$ for some constant $C>0$, and define $\mathrm{MSE}_A(\hat{\boldsymbol\theta}) = \mathbb{E} \|\hat{\boldsymbol\theta} - \boldsymbol\theta\|_A^2$, where $\|x\|_A^2 = x^\top A x$. Under Assumption Assump:VCM-UX, the target-only estimator … Furthermore, under Assumptions Assump:VCM-SN--Assump:Unif-Kernel, the DVCM estimator …
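The weighted loss $\mathrm{MSE}_A$ defined in the proposition can be illustrated numerically. The sketch below approximates it by Monte Carlo for a hypothetical unbiased estimator with a known covariance; the covariance, the two choices of $A$, and the helper name `mse_A` are assumptions made for illustration and do not come from the paper.

```python
import numpy as np

def mse_A(estimates, theta, A):
    # Monte Carlo average of (est - theta)^T A (est - theta) over rows.
    diffs = estimates - theta
    return float(np.mean(np.einsum("ij,jk,ik->i", diffs, A, diffs)))

rng = np.random.default_rng(1)
theta = np.array([1.0, -2.0])
cov = np.diag([0.04, 0.09])  # hypothetical estimator covariance
estimates = rng.multivariate_normal(theta, cov, size=20000)

A_first = np.diag([1.0, 0.0])  # picks out the first coordinate only
A_full = np.eye(2)             # ordinary squared-error loss

mse_first = mse_A(estimates, theta, A_first)
mse_full = mse_A(estimates, theta, A_full)
print(mse_first, mse_full)
```

For an unbiased estimator with covariance $\Sigma$, $\mathrm{MSE}_A = \mathrm{tr}(A\Sigma)$, so the two printed values should be close to $0.04$ and $0.13$. Taking $A = e_j e_j^\top$ recovers the MSE of a single coordinate and $A = I$ the usual total MSE, which is why a bound holding for every $0 \preceq A \preceq C I$ covers both cases.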

Figures (8)

Figure 1: MSE of the estimators across different $h$ and $\gamma$, with $(n, n_0,K)$ fixed. The left, middle, and right columns show the MSE of the linear, logistic, and Poisson-based estimators; the upper, middle, and lower rows correspond to $\gamma = 0.5$, $1$, and $1.5$.
Figure 2: MSE of the estimators across different $h$ and $K$, with $(\bar{n}, n_0,\gamma)$ fixed. The left, middle, and right columns show the MSE of the linear, logistic, and Poisson-based estimators; the upper, middle, and lower rows correspond to $K = 5$, $10$, and $15$.
Figure 3: Histograms of the normalized estimators $\check\theta_j(u_0) = \tfrac{\hat{\theta}_{\mathrm{TL},j}(u_0)-\theta_{j}(u_0)}{\hat{{\sf SE}}\left(\hat{\theta}_{\mathrm{TL},j}(u_0)\right)}$ when $\rho_n \to 0$ and when $\rho_n \to \infty$.
Figure 4: Log–log plot of MSE of $\hat{{\boldsymbol\theta}}_{\rm TL}$ as a function of $K$, while keeping $(\bar{n}_S, \gamma)$ fixed. Vertical dotted lines indicate empirical breakpoints that mark phase transitions in the convergence behavior. The left, middle, and right panels correspond to the linear, logistic, and Poisson models, respectively.
Figure 5: Log–log plot of MSE of $\hat{{\boldsymbol\theta}}_{\rm TL}$ as a function of $\gamma$, while keeping $(n, K)$ fixed. Vertical dotted lines indicate empirical breakpoints that mark phase transitions in the convergence behavior. The left, middle, and right panels correspond to the linear, logistic, and Poisson models, respectively.
...and 3 more figures

Theorems & Definitions (11)

Remark 2.1
Proposition 3.5
Theorem 3.6
Remark 3.7
Corollary 3.8
Theorem 3.9
Theorem 3.10
Corollary 3.11
Remark 3.12
Theorem 3.13
...and 1 more
