Estimating Unbounded Density Ratios: Applications in Error Control under Covariate Shift
Shuntuo Xu, Zhou Yu, Jian Huang
TL;DR
This work develops a theoretical framework for estimating density ratios under covariate shift when both the domain and the range of the density ratio can be unbounded, using Bregman divergences with least-squares and logistic-regression losses. It establishes estimation rates that are minimax optimal up to logarithmic factors under local Hölder smoothness and sub-exponential tail assumptions, with neural networks and truncation enabling practical implementation on unbounded spaces. A key insight is that the tail behavior of r_0(X^s) can enable effective generalization from the source to the target without explicit density-ratio loss correction, and in some cases the uncorrected source estimator even outperforms corrected estimators. The theory is applied to nonparametric regression and conditional flow models, with simulation studies showing that source estimators generalize well across covariate shifts and can be preferable to density-ratio corrections in practice. Overall, the paper provides both rigorous error bounds and practical guidance for transfer learning tasks where density ratios are unbounded and covariate shift is present.
Abstract
The density ratio is an important metric for evaluating the relative likelihood of two probability distributions, with extensive applications in statistics and machine learning. However, existing estimation theories for density ratios often depend on stringent regularity conditions, mainly focusing on density ratio functions with bounded domains and ranges. In this paper, we study density ratio estimators using loss functions based on least squares and logistic regression. We establish upper bounds on estimation errors with standard minimax optimal rates, up to logarithmic factors. Our results accommodate density ratio functions with unbounded domains and ranges. We apply our results to nonparametric regression and conditional flow models under covariate shift and identify the tail properties of the density ratio as crucial for error control across domains affected by covariate shift. We provide sufficient conditions under which loss correction is unnecessary and demonstrate effective generalization capabilities of a source estimator to any suitable target domain. Our simulation experiments support these theoretical findings, indicating that the source estimator can outperform those derived from loss correction methods, even when the true density ratio is known.
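To make the logistic-regression approach to density ratio estimation concrete, the following is a minimal sketch and not the authors' implementation. It assumes the ratio is defined as r_0(x) = p_target(x) / p_source(x), uses a linear model for log r_0 in place of the neural networks discussed in the paper, and omits truncation. The Gaussian pair N(0, 1) and N(1, 1), the function name `fit_logistic_ratio`, and the sample sizes are illustrative choices; for this pair, log r_0(x) = x - 0.5, so the ratio has both an unbounded domain and an unbounded range.

```python
import numpy as np

def fit_logistic_ratio(x_source, x_target, n_iter=50):
    """Fit log r(x) = a + b * x by logistic regression (hypothetical helper).

    Source samples get label 0 and target samples get label 1; the fitted
    logit equals log r(x) plus log(n_target / n_source).
    """
    x = np.concatenate([x_source, x_target])
    y = np.concatenate([np.zeros(len(x_source)), np.ones(len(x_target))])
    X = np.column_stack([np.ones_like(x), x])  # features: intercept and x
    w = np.zeros(2)
    for _ in range(n_iter):
        # Newton step on the logistic (cross-entropy) loss
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        grad = X.T @ (p - y)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X + 1e-8 * np.eye(2)
        w = w - np.linalg.solve(hess, grad)
    # Remove the class-prior term so that a + b * x estimates log r(x)
    w[0] += np.log(len(x_source) / len(x_target))
    return w

rng = np.random.default_rng(0)
x_source = rng.normal(0.0, 1.0, size=20000)  # assumed source: N(0, 1)
x_target = rng.normal(1.0, 1.0, size=20000)  # assumed target: N(1, 1)

w = fit_logistic_ratio(x_source, x_target)

def ratio_hat(x):
    """Estimated density ratio; the true ratio here is exp(x - 0.5)."""
    return np.exp(w[0] + w[1] * x)

print("estimated (intercept, slope):", w)
```

In this well-specified toy setting the fitted intercept and slope should land near -0.5 and 1.0. The paper's setting is nonparametric, so the linear model here only illustrates the loss and the class-prior correction, not the estimation rates.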
