Overcoming Saturation in Density Ratio Estimation by Iterated Regularization

Lukas Gruber, Markus Holzleitner, Johannes Lehner, Sepp Hochreiter, Werner Zellinger

TL;DR

This work tackles the problem of estimating density ratios $\beta=\frac{dP}{dQ}$ from finite samples, where standard kernel methods suffer from error saturation and fail to achieve fast convergence rates on highly regular problems. The authors introduce iterated regularization, which updates the iterate $f^{\lambda,t+1}$ via a Bregman-divergence-based objective to counter saturation. They prove non-saturating, fast-rate guarantees under source and capacity conditions, and they give a practical optimization scheme based on the Representer Theorem and conjugate gradient. The theoretical results bound the error by $C (m+n)^{-\frac{2s\alpha}{2s\alpha+1}}$ when the iteration level $t$ satisfies $t\ge r+\tfrac12$, extending saturation-free performance to density-ratio estimation. Empirically, iterated methods outperform their non-iterated counterparts on synthetic benchmarks and on large-scale unsupervised domain-adaptation ensembles. The practical impact is improved sample efficiency and ensemble performance in domain adaptation tasks, with a public code release enabling replication and broader adoption of iterated regularization in kernel-based density-ratio estimation.

Abstract

Estimating the ratio of two probability densities from finitely many samples is a central task in machine learning and statistics. In this work, we show that a large class of kernel methods for density ratio estimation suffers from error saturation, which prevents algorithms from achieving fast error convergence rates on highly regular learning problems. To resolve saturation, we introduce iterated regularization in density ratio estimation to achieve fast error rates. Our methods outperform their non-iteratively regularized versions on benchmarks for density ratio estimation as well as on large-scale evaluations for importance-weighted ensembling of deep unsupervised domain adaptation models.
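
To make the idea of iterated regularization concrete, the following is a minimal sketch, not the authors' implementation. It assumes the iteration takes the iterated-Tikhonov form, in which each step penalizes the distance to the previous iterate rather than to zero, and it covers only the least-squares (KuLSIF-like) loss. It also swaps the kernel/Representer-Theorem and conjugate-gradient machinery for a small fixed Gaussian basis, so that every step has a closed form. All names (`gaussian_features`, `iterated_ls_ratio`, `centers`, `width`, `lam`) are hypothetical.

```python
import numpy as np

def gaussian_features(x, centers, width):
    """Gaussian basis functions phi_k(x) = exp(-(x - c_k)^2 / (2 * width^2))."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    c = np.asarray(centers, dtype=float).reshape(1, -1)
    return np.exp(-((x - c) ** 2) / (2.0 * width ** 2))

def iterated_ls_ratio(x_p, x_q, centers, width, lam, n_iter):
    """Iterated Tikhonov for a least-squares density-ratio objective (assumed form).

    Step t+1 minimizes
        0.5 * theta' H theta - b' theta + 0.5 * lam * ||theta - theta_t||^2,
    whose closed form is theta_{t+1} = (H + lam * I)^{-1} (b + lam * theta_t).
    Starting from theta_0 = 0, the first step is the classical ridge estimate.
    """
    phi_p = gaussian_features(x_p, centers, width)  # features of samples from P
    phi_q = gaussian_features(x_q, centers, width)  # features of samples from Q
    H = phi_q.T @ phi_q / len(x_q)                  # empirical second moment under Q
    b = phi_p.mean(axis=0)                          # empirical feature mean under P
    A = H + lam * np.eye(len(centers))
    theta = np.zeros(len(centers))
    history = []
    for _ in range(n_iter):
        theta = np.linalg.solve(A, b + lam * theta)
        history.append(theta.copy())
    return history, H, b

# Toy data (assumption): P = N(0.5, 0.7^2), Q = N(0, 1).
rng = np.random.default_rng(0)
x_p = rng.normal(0.5, 0.7, size=400)
x_q = rng.normal(0.0, 1.0, size=400)
centers = np.linspace(-3, 3, 15)

history, H, b = iterated_ls_ratio(x_p, x_q, centers, width=0.8, lam=0.1, n_iter=8)

# Residual of the unregularized normal equations H theta = b after each step.
residuals = [float(np.linalg.norm(H @ th - b)) for th in history]
```

In this sketch, `history[0]` is the non-iterated estimate, and later entries reduce the bias introduced by the penalty while keeping the same regularization parameter `lam`. The estimated ratio at a point `x` would be `gaussian_features(x, centers, 0.8) @ history[-1]`.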

Paper Structure (23 sections, 6 theorems, 32 equations, 3 figures, 36 tables)

Key Result

Lemma 1

Any strictly proper composite loss $\ell$ with invertible link $\Psi:[0,1]\to\mathbb{R}$ and twice differentiable Bayes risk $G:[0,1]\to\mathbb{R}$ satisfies with $F(h):=-\int_\mathcal{X} (1+h(x)) G\!\left(\frac{h(x)}{1+h(x)}\right)\mathop{}\!\mathrm{d} Q(x)$ and $g(f):=\frac{\Psi^{-1}\circ f}{1-\Psi^{-1}\circ f}$.
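
As an illustration of the map $g$, consider the logistic loss as an assumed example: its link is the logit $\Psi(p)=\log\frac{p}{1-p}$, so $\Psi^{-1}$ is the sigmoid and $g(f)=\frac{\Psi^{-1}\circ f}{1-\Psi^{-1}\circ f}$ reduces to $\exp(f)$. The short sketch below checks this numerically; the function names are hypothetical.

```python
import numpy as np

def sigmoid(f):
    """Inverse of the logit link Psi(p) = log(p / (1 - p))."""
    return 1.0 / (1.0 + np.exp(-f))

def ratio_from_score(f, inverse_link=sigmoid):
    """g(f) = Psi^{-1}(f) / (1 - Psi^{-1}(f)): maps a score to a density-ratio value."""
    p = inverse_link(f)
    return p / (1.0 - p)

f = np.linspace(-5.0, 5.0, 11)   # moderate scores, to avoid overflow
ratio = ratio_from_score(f)

# For the logit link, g(f) coincides with exp(f).
matches_exp = bool(np.allclose(ratio, np.exp(f)))
```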

Figures (3)

  • Figure 1: Saturation issue of classical methods (Eq. \ref{eq:regularized_Bregman_objective}) versus our novel iteratively regularized approach (Eq. \ref{eq:novel_objective}). Left: Error rates proven in Theorem \ref{thm:error_rates_result} for classical methods (red) and ours (blue). Right: Error for the classical KuLSIF method (kanamori2009least) with $F(h)=\int \frac{(h(x)-1)^2}{2}\, q(x)\mathop{}\!\mathrm{d} x$ in Eq. \ref{eq:regularized_Bregman_objective} (red), and for our iteratively regularized approach (blue), applied to a Gaussian mixture $P$ and a Gaussian distribution $Q$; a smaller number of components allows a higher regularity index $r$. The residual differences (grey) between the methods increase with higher regularity.
  • Figure 2: Error $\|\beta-g(f_{\bf z}^{\lambda,t})\|_{L^1([0,1])}$ of estimating $\beta$ by $g(f_{\bf z}^{\lambda,t})$ as a function of the sample size $m+n$ on the dataset of beugnot2021beyond. The slope for the iterated estimate ($t=8$, blue) is steeper, as suggested by Theorem \ref{thm:error_rates_result}.
  • Figure 3: Sample efficiency curves for various density ratio estimators and approaches; the error is measured in the $L^1$-norm. Left: Comparison of logistic regression and multi-layer network logistic regression density ratio estimators, both without and with iteration. Right: Comparison of logistic regression with its iterated version, the telescoping (tel) approach from rhodes2020telescoping, and the telescoping approach combined with our iterative method.
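
For the KuLSIF choice of $F$ in Figure 1, the Bregman divergence can be worked out explicitly. The derivation below is a sketch that assumes the standard definition $B_F(\beta,h)=F(\beta)-F(h)-\nabla F(h)[\beta-h]$ and writes $\mathrm{d}Q(x)=q(x)\,\mathrm{d}x$; under these assumptions the divergence is half the squared $L^2(Q)$ distance between $\beta$ and $h$.

```latex
\begin{align*}
F(h) &= \int \frac{(h(x)-1)^2}{2}\,\mathrm{d}Q(x),
\qquad
\nabla F(h)[u] = \int (h(x)-1)\,u(x)\,\mathrm{d}Q(x), \\
B_F(\beta,h)
 &= \int \left[ \frac{(\beta-1)^2-(h-1)^2}{2} - (h-1)(\beta-h) \right] \mathrm{d}Q \\
 &= \int (\beta-h)\left[ \frac{\beta+h-2}{2} - (h-1) \right] \mathrm{d}Q \\
 &= \frac{1}{2}\int \bigl(\beta(x)-h(x)\bigr)^2\,\mathrm{d}Q(x).
\end{align*}
```

The second line uses $(\beta-1)^2-(h-1)^2=(\beta-h)(\beta+h-2)$.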

Theorems & Definitions (12)

  • Example 1
  • Lemma 1: menon2016linking
  • Example 2
  • Theorem 1
  • Remark 1
  • Remark 2
  • Remark 3
  • Lemma 2: beugnot2021beyond
  • Lemma 3
  • proof
  • ...and 2 more