
Pseudo-Labeling for Unsupervised Domain Adaptation with Kernel GLMs

Nathan Weill, Kaizheng Wang

Abstract

We propose a principled framework for unsupervised domain adaptation under covariate shift in kernel Generalized Linear Models (GLMs), encompassing kernelized linear, logistic, and Poisson regression with ridge regularization. Our goal is to minimize prediction error in the target domain by leveraging labeled source data and unlabeled target data, despite differences in covariate distributions. We partition the labeled source data into two batches: one for training a family of candidate models, and the other for building an imputation model. This imputation model generates pseudo-labels for the target data, enabling robust model selection. We establish non-asymptotic excess-risk bounds that characterize adaptation performance through an "effective labeled sample size", explicitly accounting for the unknown covariate shift. Experiments on synthetic and real datasets demonstrate consistent performance gains over source-only baselines.
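The split-and-impute procedure described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes squared loss (kernel ridge regression, the linear member of the GLM family), a Gaussian RBF kernel, and a hand-picked imputation penalty `lam_impute`. All function and parameter names (`rbf_kernel`, `fit_krr`, `pseudo_label_select`, `gamma`) are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian RBF kernel matrix between rows of A and rows of B (assumed kernel).
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_krr(X, y, lam, gamma=1.0):
    # Kernel ridge regression: solve (K + n*lam*I) alpha = y, return a predictor.
    n = len(X)
    K = rbf_kernel(X, X, gamma)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    return lambda Z: rbf_kernel(Z, X, gamma) @ alpha

def pseudo_label_select(Xs, ys, Xt, lambdas, lam_impute, gamma=1.0, seed=0):
    # Split the labeled source data into two batches of (roughly) equal size.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(Xs))
    half = len(Xs) // 2
    i1, i2 = idx[:half], idx[half:]
    # Batch 1: train one candidate model per ridge penalty.
    candidates = [fit_krr(Xs[i1], ys[i1], lam, gamma) for lam in lambdas]
    # Batch 2: train the imputation model and pseudo-label the target covariates.
    imputer = fit_krr(Xs[i2], ys[i2], lam_impute, gamma)
    pseudo = imputer(Xt)
    # Model selection: pick the candidate closest to the pseudo-labels on target data.
    scores = [float(np.mean((f(Xt) - pseudo) ** 2)) for f in candidates]
    best = int(np.argmin(scores))
    return candidates[best], lambdas[best], scores

def demo():
    # Synthetic covariate shift: source and target share the regression
    # function but have different covariate distributions.
    rng = np.random.default_rng(0)
    Xs = rng.normal(0.0, 1.0, size=(200, 1))
    ys = np.sin(2 * Xs[:, 0]) + 0.1 * rng.normal(size=200)
    Xt = rng.normal(1.5, 0.5, size=(100, 1))
    grid = [1e-4, 1e-3, 1e-2, 1e-1, 1.0]
    _, lam, scores = pseudo_label_select(Xs, ys, Xt, grid, lam_impute=1e-3)
    print("selected lambda:", lam)
    return lam, scores

if __name__ == "__main__":
    demo()
```

The candidates never see the second batch, so the pseudo-labels act as an independent reference on the target covariates. For logistic or Poisson regression, the closed-form solve in `fit_krr` would have to be replaced by an iterative fit of the penalized likelihood.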

Paper Structure (60 sections, 18 theorems, 218 equations, 4 figures, 3 tables, 2 algorithms)


Key Result

Theorem 4.1

Let the setup, kernel, sub-exponential, convexity, and true-parameter assumptions hold. Define, for $\mu^2 = M_1^6\sigma^2$, […]. Choose any $\delta \in (0, 1/e]$. Consider the pseudo-labeling algorithm (alg:pseudo-krr) with $n_1 = n/2$, $\tilde{\lambda} = \mu^2\log^7(n)\log(n_0/\delta)/n$, and […]. Then there exists a function $\zeta$, polylogarithmic in $(n, n_0, \delta^{-1})$ multiplied by a polynomial in $\frac{K}{k}$, such that with […]
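To get a feel for the imputation penalty $\tilde{\lambda} = \mu^2\log^7(n)\log(n_0/\delta)/n$ with $\mu^2 = M_1^6\sigma^2$ stated in the theorem, the formula can be evaluated directly. The sketch below assumes natural logarithms and treats $M_1$ and $\sigma$ as known constants. The function name `imputation_penalty` is hypothetical, and the symbols are taken as they appear in the theorem without further interpretation.

```python
import math

def imputation_penalty(n, n0, delta, M1, sigma):
    # Evaluates mu^2 * log^7(n) * log(n0 / delta) / n with mu^2 = M1^6 * sigma^2.
    # Natural logarithms are an assumption; the excerpt does not specify the base.
    if not (0 < delta <= 1 / math.e):
        raise ValueError("delta must lie in (0, 1/e]")
    mu2 = M1 ** 6 * sigma ** 2
    return mu2 * math.log(n) ** 7 * math.log(n0 / delta) / n

if __name__ == "__main__":
    for n in (10**3, 10**6, 10**9):
        print(n, imputation_penalty(n, n0=n, delta=0.1, M1=1.0, sigma=1.0))
```

For moderate sample sizes such as $n = 1000$, the factor $\log^7(n)$ is much larger than $n$ itself, so a practical implementation would likely tune this penalty rather than plug in the formula as written.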

Figures (4)

  • Figure 1: Comparisons of three approaches on a log-log plot. $x$-axis: $n$. $y$-axis: excess risk. Red crosses: pseudo-labeling. Cyan triangles: the oracle method. Blue circles: the naive method.
  • Figure 2: Baseline logistic regression performance on ID and OOD test data, when trained only on ID data, as a function of the ridge regularization parameter $\lambda$
  • Figure 3: $B=n^{0.45}$
  • Figure 4: Selection risk curves of the three methods.

Theorems & Definitions (46)

  • Example 2.1: Linear and affine kernels
  • Example 2.2: Polynomial kernels
  • Example 2.3: First-order Sobolev kernel
  • Remark 4.1
  • Remark 4.2
  • Theorem 4.1: Excess risk
  • Remark 4.3: Interpretation of Effective Sample Size
  • Corollary 4.1: Convergence Rate with Effective Sample Size
  • Remark 4.4: Optimality and adaptivity
  • Theorem 5.1: Oracle inequality
  • ...and 36 more