Pseudo-Labeling for Unsupervised Domain Adaptation with Kernel GLMs

Nathan Weill; Kaizheng Wang

Abstract

We propose a principled framework for unsupervised domain adaptation under covariate shift in kernel Generalized Linear Models (GLMs), encompassing kernelized linear, logistic, and Poisson regression with ridge regularization. Our goal is to minimize prediction error in the target domain by leveraging labeled source data and unlabeled target data, despite differences in covariate distributions. We partition the labeled source data into two batches: one for training a family of candidate models, and the other for building an imputation model. This imputation model generates pseudo-labels for the target data, enabling robust model selection. We establish non-asymptotic excess-risk bounds that characterize adaptation performance through an "effective labeled sample size", explicitly accounting for the unknown covariate shift. Experiments on synthetic and real datasets demonstrate consistent performance gains over source-only baselines.
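The split, fit, impute, and select procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes squared loss (the kernelized linear-regression case), a Gaussian kernel, candidates indexed by a grid of ridge parameters `lambdas`, soft pseudo-labels given by the imputation model's predictions, and hypothetical helper names (`rbf_kernel`, `fit_krr`, `pseudo_label_select`). The logistic and Poisson cases would replace the squared error with the corresponding GLM loss.

```python
import numpy as np


def rbf_kernel(A, B, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between rows of A and rows of B (assumed kernel choice)."""
    sq_dists = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def fit_krr(X, y, lam, kernel=rbf_kernel):
    """Kernel ridge regression minimizing (1/n) sum (y_i - f(x_i))^2 + lam * ||f||_H^2."""
    n = X.shape[0]
    alpha = np.linalg.solve(kernel(X, X) + n * lam * np.eye(n), y)
    return lambda Z: kernel(Z, X) @ alpha


def pseudo_label_select(X_src, y_src, X_tgt, lambdas, lam_tilde, kernel=rbf_kernel, seed=0):
    """Pick the candidate whose predictions best match pseudo-labels on the target covariates."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(y_src))
    n1 = len(y_src) // 2                      # n_1 = n/2, as in Theorem 4.1
    batch1, batch2 = perm[:n1], perm[n1:]

    # Batch 1: a family of candidate models, one per ridge parameter.
    candidates = [fit_krr(X_src[batch1], y_src[batch1], lam, kernel) for lam in lambdas]

    # Batch 2: the imputation model, fit with its own ridge parameter lam_tilde.
    imputer = fit_krr(X_src[batch2], y_src[batch2], lam_tilde, kernel)
    pseudo_labels = imputer(X_tgt)            # soft pseudo-labels for the unlabeled target data

    # Estimated target risk of each candidate, with pseudo-labels standing in for true labels.
    risks = np.array([np.mean((f(X_tgt) - pseudo_labels) ** 2) for f in candidates])
    best = int(np.argmin(risks))
    return lambdas[best], candidates[best], risks


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Covariate shift: source and target covariates come from different Gaussians.
    X_src = rng.normal(0.0, 1.0, size=(200, 1))
    y_src = np.sin(2.0 * X_src[:, 0]) + 0.1 * rng.normal(size=200)
    X_tgt = rng.normal(1.5, 0.5, size=(100, 1))
    lam, model, risks = pseudo_label_select(X_src, y_src, X_tgt, [1e-4, 1e-2, 1.0], lam_tilde=1e-4)
    print("selected lambda:", lam, "pseudo-label risks:", risks)
```

Holding out the second batch keeps the pseudo-labels independent of the candidates being scored, which is the point of splitting the source data rather than reusing it.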

Paper Structure (60 sections, 18 theorems, 218 equations, 4 figures, 3 tables, 2 algorithms)

Introduction
Contributions
Related Work
Notations
Problem Setup
Generalized linear models under covariate shift
Ridge regularization for kernel GLMs
Methodology
Soft vs. Hard Labeling.
Implementation and Tuning Strategy.
Theoretical Guarantees on the Adaptivity
Assumptions
Main results
Proof Sketch
Generic model selection analysis with GLM loss
...and 45 more sections

Key Result

Theorem 4.1

Let the kernel setup and the sub-exponential, convexity, and true-parameter assumptions hold. Define … for $\mu^2 = M_1^6\sigma^2$. Choose any $\delta \in (0, 1/e]$. Consider Algorithm alg:pseudo-krr with $n_1 = n/2$, $\tilde{\lambda} = \mu^2 \log^7(n) \log(n_0/\delta)/n$, and … Then there exists a function $\zeta$, polylogarithmic in $(n, n_0, \delta^{-1})$ and multiplied by a polynomial in $K/k$, such that with …
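The choice of $\tilde{\lambda}$ in Theorem 4.1 can be evaluated directly. The sketch below reads $\tilde{\lambda}$ as the imputation model's ridge parameter (the tilde suggests it is distinct from the candidates' regularization; this reading is an assumption), and `imputation_lambda` is a hypothetical helper name. At unit $\mu$ the polylogarithmic factors keep $\tilde{\lambda}$ above 1 even at $n = 10^6$, so the formula is best read as a theoretical scaling rather than a practical default.

```python
import math


def imputation_lambda(n, n0, delta, M1, sigma):
    """Ridge parameter tilde-lambda from Theorem 4.1 (hypothetical helper name).

    tilde_lambda = mu^2 * log^7(n) * log(n0 / delta) / n, where mu^2 = M1^6 * sigma^2.
    """
    if not 0.0 < delta <= 1.0 / math.e:
        raise ValueError("Theorem 4.1 requires delta in (0, 1/e]")
    mu_sq = M1 ** 6 * sigma ** 2
    return mu_sq * math.log(n) ** 7 * math.log(n0 / delta) / n


if __name__ == "__main__":
    # log^7(n)/n only starts decreasing once log(n) > 7, i.e. n > e^7 (about 1100).
    for n in (10**4, 10**5, 10**6):
        print(n, imputation_lambda(n, n0=1000, delta=0.05, M1=1.0, sigma=1.0))
```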

Figures (4)

Figure 1: Comparisons of three approaches on a log-log plot. $x$-axis: $n$. $y$-axis: excess risk. Red crosses: pseudo-labeling. Cyan triangles: the oracle method. Blue circles: the naive method.
Figure 2: Performance of baseline logistic regression on in-distribution (ID) and out-of-distribution (OOD) test data when trained only on ID data, as a function of the ridge regularization parameter $\lambda$.
Figure 3: $B=n^{0.45}$
Figure 4: Selection risk curves of the three methods.

Theorems & Definitions (46)

Example 2.1: Linear and affine kernels
Example 2.2: Polynomial kernels
Example 2.3: First-order Sobolev kernel
Remark 4.1
Remark 4.2
Theorem 4.1: Excess risk
Remark 4.3: Interpretation of Effective Sample Size
Corollary 4.1: Convergence Rate with Effective Sample Size
Remark 4.4: Optimality and adaptivity
Theorem 5.1: Oracle inequality
...and 36 more
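Examples 2.1 to 2.3 appear above by title only. The sketch below writes out standard textbook forms of these kernels; the exact parameterizations in the paper (for instance the polynomial degree, or whether the Sobolev kernel carries an extra constant term) are assumptions, and the function names are hypothetical.

```python
import numpy as np


def linear_kernel(X, Z):
    """k(x, z) = <x, z>."""
    return X @ Z.T


def affine_kernel(X, Z):
    """k(x, z) = 1 + <x, z>; the constant term adds an intercept to the linear model."""
    return 1.0 + X @ Z.T


def polynomial_kernel(X, Z, degree=2):
    """k(x, z) = (1 + <x, z>)^degree (the default degree is an assumption)."""
    return (1.0 + X @ Z.T) ** degree


def sobolev1_kernel(x, z):
    """First-order Sobolev kernel on [0, 1] in the common form k(x, z) = min(x, z).

    Its RKHS is {f : f(0) = 0, f' square-integrable}; x and z are 1-D arrays of points in [0, 1].
    """
    return np.minimum(x[:, None], z[None, :])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 3))
    t = rng.uniform(size=5)
    grams = {"linear": linear_kernel(X, X), "affine": affine_kernel(X, X),
             "poly": polynomial_kernel(X, X), "sobolev1": sobolev1_kernel(t, t)}
    for name, K in grams.items():
        # A valid kernel yields a symmetric positive semidefinite Gram matrix.
        print(name, np.linalg.eigvalsh(K).min() >= -1e-8)
```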
