Bounds on Lp errors in density ratio estimation via f-divergence loss functions
Yoshiaki Kitazawa
TL;DR
The paper addresses how accurately density ratio estimation (DRE) can recover the true density ratio when the ratio is learned by minimizing $f$-divergence loss functions derived from variational representations of $f$-divergence. By establishing universal upper and lower bounds on the $L_p$ error that hold for Lipschitz continuous estimators and do not depend on the specific $f$-divergence, the author shows how data dimensionality and the KL divergence between the two distributions $Q$ and $P$ jointly govern estimation accuracy. A key finding is that for $p>1$ the lower bound contains a term exponential in the KL divergence, so the estimation error can grow rapidly as $\mathrm{KL}(Q\|P)$ increases, and this effect is amplified for larger $p$. Numerical experiments support the predicted dependence on the KL divergence and the dimension, and the analysis is framed through a mu-representation of the $f$-divergence loss that connects nearest-neighbor geometry to density ratio estimation. The results offer theoretical guidance for choosing $f$-divergence losses and assessing sample complexity in high-dimensional DRE, with practical implications for domain adaptation, generative modeling, and information-estimation methods that rely on accurate density ratios.
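A standard Gaussian calculation (not the bound derived in the paper) illustrates why quantities of the form $E[r^p]$ can be exponential in the KL divergence once $p>1$. Assume $P=\mathcal{N}(0,1)$, $Q=\mathcal{N}(\mu,1)$, $r=dQ/dP$, and that the expectation is taken under $P$. Then $r(x)=\exp(\mu x-\mu^2/2)$, $\mathrm{KL}(Q\|P)=\mu^2/2$, and $E_P[r^p]=\exp(p(p-1)\,\mathrm{KL}(Q\|P))$. More generally, whenever $Q\ll P$, Jensen's inequality gives $E_P[r^p]=E_Q[r^{p-1}]\ge\exp((p-1)\,\mathrm{KL}(Q\|P))$, which grows exponentially in the KL divergence when $p>1$. The Python sketch below, in which every function name is hypothetical, compares the closed form, the Jensen bound, and a Monte Carlo estimate.

```python
import math
import random

# Hypothetical helper names; assumes P = N(0, 1), Q = N(mu, 1), r = dQ/dP,
# and expectations taken under P. Illustration only, not the paper's bound.


def gaussian_ratio(x: float, mu: float) -> float:
    """r(x) = dQ/dP(x) = exp(mu x - mu^2 / 2) for Q = N(mu, 1), P = N(0, 1)."""
    return math.exp(mu * x - 0.5 * mu * mu)


def kl_q_p(mu: float) -> float:
    """KL(Q || P) for Q = N(mu, 1), P = N(0, 1)."""
    return 0.5 * mu * mu


def moment_closed_form(mu: float, p: float) -> float:
    """E_P[r^p] = exp(p (p - 1) KL(Q || P)) for this Gaussian pair."""
    return math.exp(p * (p - 1.0) * kl_q_p(mu))


def jensen_lower_bound(kl: float, p: float) -> float:
    """General bound E_P[r^p] = E_Q[r^(p-1)] >= exp((p - 1) KL(Q || P))."""
    return math.exp((p - 1.0) * kl)


def moment_monte_carlo(mu: float, p: float, n: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of E_P[r^p] from n samples X ~ P = N(0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        total += gaussian_ratio(rng.gauss(0.0, 1.0), mu) ** p
    return total / n


if __name__ == "__main__":
    for mu in (0.5, 1.0, 2.0):
        kl = kl_q_p(mu)
        for p in (1.0, 2.0, 3.0):
            print(
                f"KL={kl:.3f} p={p:.0f} "
                f"closed={moment_closed_form(mu, p):.3e} "
                f"jensen={jensen_lower_bound(kl, p):.3e} "
                f"mc={moment_monte_carlo(mu, p):.3e}"
            )
```

As $\mathrm{KL}(Q\|P)$ and $p$ grow, $r^p$ becomes heavy-tailed under $P$, so at a fixed sample size the Monte Carlo column can fall far below the closed form, which loosely echoes the sample-complexity concern described above.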
Abstract
Density ratio estimation (DRE) is a core technique in machine learning for capturing the relationship between two probability distributions. $f$-divergence loss functions, derived from variational representations of $f$-divergence, have become a standard choice in DRE and achieve state-of-the-art performance. This study provides new theoretical insights into DRE by deriving upper and lower bounds on the $L_p$ errors of estimators trained with $f$-divergence loss functions. These bounds apply to any estimator in a class of Lipschitz continuous estimators, regardless of which $f$-divergence loss function is used. The derived bounds are expressed as a product involving the data dimensionality and the expected value of the density ratio raised to the $p$-th power. Notably, the lower bound includes an exponential term that depends on the Kullback–Leibler (KL) divergence, showing that for $p > 1$ the $L_p$ error increases significantly as the KL divergence grows, and that this increase becomes more pronounced as $p$ grows. Numerical experiments validate these theoretical insights.
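The variational construction behind $f$-divergence losses can be sketched as follows. For a convex, differentiable $f$ with $D_f(Q\|P)=E_P[f(dQ/dP)]$, one common form of the loss for a candidate ratio $r$ is $L_f(r)=E_P[r f'(r)-f(r)]-E_Q[f'(r)]$, which uses the identity $f^*(f'(u))=u f'(u)-f(u)$. For strictly convex $f$ its population minimum is $-D_f(Q\|P)$, attained at $r=dQ/dP$. The Python sketch below illustrates this under stated assumptions and is not the paper's implementation: the helper names, the Gaussian pair $P=\mathcal{N}(0,1)$, $Q=\mathcal{N}(1,1)$, and the one-parameter candidate family $r_a(x)=\exp(ax-a^2/2)$ are all hypothetical choices, and no Lipschitz estimator is actually trained.

```python
import math
import random
from typing import Callable, Sequence, Tuple

# Hypothetical helper names; an illustrative sketch, not the paper's implementation.
FPair = Tuple[Callable[[float], float], Callable[[float], float]]

# (f, f') pairs for two common f-divergences, with D_f(Q || P) = E_P[f(dQ/dP)].
KL_F: FPair = (lambda u: u * math.log(u), lambda u: math.log(u) + 1.0)
PEARSON_F: FPair = (lambda u: (u - 1.0) ** 2, lambda u: 2.0 * (u - 1.0))


def f_divergence_loss(
    ratio: Callable[[float], float],
    f: Callable[[float], float],
    f_prime: Callable[[float], float],
    xs_p: Sequence[float],
    xs_q: Sequence[float],
) -> float:
    """Empirical L_f(r) = mean_P[r f'(r) - f(r)] - mean_Q[f'(r)].

    r f'(r) - f(r) equals the convex conjugate f*(f'(r)), so this is the negated
    variational objective for D_f(Q || P) evaluated at T = f'(r).
    """
    term_p = 0.0
    for x in xs_p:
        r = ratio(x)
        term_p += r * f_prime(r) - f(r)
    term_q = sum(f_prime(ratio(x)) for x in xs_q)
    return term_p / len(xs_p) - term_q / len(xs_q)


def gaussian_ratio_family(a: float) -> Callable[[float], float]:
    """Candidate ratios r_a(x) = exp(a x - a^2/2); equals dQ/dP for Q = N(a, 1), P = N(0, 1)."""
    return lambda x: math.exp(a * x - 0.5 * a * a)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 50_000
    xs_p = [rng.gauss(0.0, 1.0) for _ in range(n)]  # samples from P = N(0, 1)
    xs_q = [rng.gauss(1.0, 1.0) for _ in range(n)]  # samples from Q = N(1, 1)
    for name, (f, fp) in (("KL", KL_F), ("Pearson chi^2", PEARSON_F)):
        for a in (0.5, 1.0, 1.5):
            loss = f_divergence_loss(gaussian_ratio_family(a), f, fp, xs_p, xs_q)
            print(f"{name:14s} a={a:.1f} loss={loss:+.4f}")
```

For this family the population losses are $a^2/2-a$ for KL ($f(u)=u\log u$) and $e^{a^2}-2e^{a}+1$ for Pearson $\chi^2$ ($f(u)=(u-1)^2$). Both are minimized at the true ratio $a=1$, with minima $-\mathrm{KL}(Q\|P)=-0.5$ and $-\chi^2(Q\|P)=1-e\approx-1.718$. The empirical Pearson losses are typically noisier than the KL ones because $r^2$ is heavy-tailed under $P$.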
