Importance Weighting Correction of Regularized Least-Squares for Target Shift

Davit Gogolashvili

TL;DR

This work analyzes importance-weighted kernel ridge regression under target shift and shows that, because the weights depend only on the output variable, reweighting corrects the train-test mismatch without altering the input-space complexity that governs kernel generalization.

Abstract

Importance weighting is a standard tool for correcting distribution shift, but its statistical behavior under target shift -- where the label distribution changes between training and testing while the conditional distribution of inputs given the label remains stable -- remains under-explored. We analyze importance-weighted kernel ridge regression under target shift and show that, because the weights depend only on the output variable, reweighting corrects the train-test mismatch without altering the input-space complexity that governs kernel generalization. Under standard RKHS regularity and capacity conditions and a mild Bernstein-type moment condition on the label weights, we obtain finite-sample guarantees showing that the estimator achieves the same convergence behavior as in the no-shift case, with shift severity affecting only the constants through weight moments. We complement these results with matching minimax lower bounds, establishing rate optimality and quantifying the unavoidable dependence on shift severity. We further study more general weighting schemes and prove that weight misspecification induces an irreducible bias: the estimator concentrates around an induced population regression function that generally differs from the desired test regression function unless the weights are accurate. Finally, we derive consequences for plug-in classification under target shift via standard calibration arguments.
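To make the estimator in the abstract concrete, here is a minimal sketch of importance-weighted kernel ridge regression, assuming the label weights $w(y)=d\rho^{\rm te}_Y/d\rho^{\rm tr}_Y$ are known exactly. The Gaussian kernel, the function names (`gaussian_kernel`, `fit_iw_krr`, `predict`), and the objective normalization $\frac{1}{n}\sum_i w(y_i)(f(x_i)-y_i)^2+\lambda\|f\|_{\mathcal{H}}^2$ are illustrative assumptions, not taken from the paper, whose scaling of $\lambda$ may differ. Under that objective, the representer-theorem coefficients solve $(WK+n\lambda I)\alpha = Wy$ with $W=\mathrm{diag}(w(y_1),\dots,w(y_n))$.

```python
import numpy as np


def gaussian_kernel(A, B, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def fit_iw_krr(X, y, weights, lam, bandwidth=1.0):
    """Coefficients of the weighted KRR solution f = sum_i alpha_i k(x_i, .).

    Minimizes (1/n) * sum_i weights[i] * (f(x_i) - y_i)^2 + lam * ||f||_H^2,
    whose stationarity condition is (W K + n * lam * I) alpha = W y.
    """
    n = X.shape[0]
    K = gaussian_kernel(X, X, bandwidth)
    system = weights[:, None] * K + n * lam * np.eye(n)
    return np.linalg.solve(system, weights * y)


def predict(X_train, alpha, X_new, bandwidth=1.0):
    """Evaluate the fitted function at the rows of X_new."""
    return gaussian_kernel(X_new, X_train, bandwidth) @ alpha


# Toy target shift (an assumed setup, for illustration only):
# training labels are N(0, 1), test labels are N(0.5, 1), and x | y is
# the same in both domains, so w(y) = exp(0.5 * y - 0.125).
rng = np.random.default_rng(0)
n = 200
y_train = rng.normal(0.0, 1.0, size=n)
X_train = (y_train + 0.5 * rng.normal(size=n))[:, None]
w = np.exp(0.5 * y_train - 0.125)

alpha = fit_iw_krr(X_train, y_train, w, lam=1e-2)
X_grid = np.linspace(-2.0, 2.0, 5)[:, None]
print(predict(X_train, alpha, X_grid))
```

The weights enter only through the diagonal matrix $W$ built from the labels; the kernel matrix $K$ over the inputs is untouched, which mirrors the abstract's point that reweighting leaves the input-space complexity unchanged. Setting all weights to one recovers ordinary kernel ridge regression.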

Paper Structure (31 sections, 8 theorems, 107 equations, 2 figures)

Key Result

Theorem 3

Let $\rho^{\rm te}$ and $\rho^{\rm tr}$ be distributions on $X\times[-M,M]$ satisfying target shift and the paper's source-condition, effective-dimension, and target-shift-weight assumptions. Assume $\lambda\le \|T\|$ and set, for $\delta\in(0,1)$, [...]. Then, with probability at least $1-\delta$, [...], where $C = 3(M+R)$.

Figures (2)

  • Figure 1: Irreducible bias under target shift with misspecified weights. Incorrect weights $v_Y\neq w_Y$ induce a tilted conditional distribution $\rho^{\eta}(dy\mid x)\propto \eta(y)\rho^{\rm te}(dy\mid x)$ and hence an induced regression function $f^{\eta}=\phi/\psi$ different from the desired $f_{\rho^{\rm te}}$. The estimator concentrates around $f^{\eta}_{\mathcal{H}}$, so the gap $\|f^{\eta}_{\mathcal{H}}-f_{\mathcal{H}}\|_{\rho^{\rm te}_X}$ persists even as $n\to\infty$.
  • Figure 2: Performance comparison for different shift scenarios. Left panels show data and regression function. Right panels show MSE boxplots over 200 replications. (a) Covariate shift: well-specified unweighted model performs comparably to IW. (b) Target shift: IW correction is essential regardless of capacity.
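The induced regression function in the Figure 1 caption can be written out by normalizing the tilted conditional distribution. The following is a sketch that reads $\phi$ and $\psi$ as the numerator and denominator of the resulting ratio; this identification is an assumption based on the caption, not a definition quoted from the paper.

$$
\rho^{\eta}(dy\mid x)=\frac{\eta(y)\,\rho^{\rm te}(dy\mid x)}{\int \eta(y')\,\rho^{\rm te}(dy'\mid x)},
\qquad
f^{\eta}(x)=\int y\,\rho^{\eta}(dy\mid x)
=\frac{\int y\,\eta(y)\,\rho^{\rm te}(dy\mid x)}{\int \eta(y)\,\rho^{\rm te}(dy\mid x)}
=\frac{\phi(x)}{\psi(x)}.
$$

When $\eta$ is constant it cancels between numerator and denominator, so $f^{\eta}(x)=\int y\,\rho^{\rm te}(dy\mid x)=f_{\rho^{\rm te}}(x)$ and the bias vanishes. For non-constant $\eta$ the two functions generally differ, which is the gap that persists as $n\to\infty$.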

Theorems & Definitions (12)

  • Remark 1
  • Remark 2
  • Theorem 3: IW-KRR under Target Shift
  • Theorem 4: Minimax Lower Bound for Target Shift
  • Proposition 5
  • proof
  • Theorem 6: W-KRR under Target Shift with Incorrect Weights
  • Theorem 7: Binary classification under target shift
  • proof
  • Proposition 8: Bernstein Inequality
  • ...and 2 more