Existence of Direct Density Ratio Estimators
Erika Banzato, Mathias Drton, Kian Saraf-Poor, Hongjian Shi
TL;DR
The paper investigates when the KLIEP estimator exists. KLIEP is a density ratio method that directly estimates the difference in natural parameters between two exponential-family distributions from two samples. In the unregularized case, a global minimum of the KLIEP loss exists if the average sufficient statistic $\bar{\boldsymbol{t}}^{x}$ of the first sample lies in the relative interior of the convex hull $\boldsymbol{C}^{y}$ of the second sample's sufficient statistics; when $\bar{\boldsymbol{t}}^{x}$ lies on the boundary of the hull or outside it, no minimizer exists or the loss is unbounded. For high-dimensional problems, the paper introduces a threshold $\lambda^{\#}$, the dual-norm distance between $\bar{\boldsymbol{t}}^{x}$ and $\boldsymbol{C}^{y}$, and shows that a regularized KLIEP estimator exists only when the regularization parameter $\lambda$ is at least $\lambda^{\#}$; for smaller $\lambda$ the problem may be ill-posed. An empirical analysis in differential-network settings shows that common choices of $\lambda$ can fail to guarantee existence, which motivates elastic-net-type penalties that ensure a global minimum. These results clarify when KLIEP is well-posed in practice and guide the choice of regularization for high-dimensional two-sample problems.
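To see why the convex hull and the dual norm enter, it may help to write out the loss. The following is only a sketch under assumed notation that the summary above does not fix: $\boldsymbol{t}(\cdot)$ denotes the sufficient statistic, $\boldsymbol{\theta}$ the vector of natural-parameter differences, $y_1,\dots,y_{n_y}$ the second sample, and the KLIEP loss is taken in its usual log-sum-exp form.

$$
\ell(\boldsymbol{\theta}) = -\boldsymbol{\theta}^{\top}\bar{\boldsymbol{t}}^{x} + \log\Big(\frac{1}{n_y}\sum_{j=1}^{n_y}\exp\big(\boldsymbol{\theta}^{\top}\boldsymbol{t}(y_j)\big)\Big),
\qquad
\nabla\ell(\boldsymbol{\theta}) = -\bar{\boldsymbol{t}}^{x} + \sum_{j=1}^{n_y} w_j(\boldsymbol{\theta})\,\boldsymbol{t}(y_j),
\qquad
w_j(\boldsymbol{\theta}) = \frac{\exp\big(\boldsymbol{\theta}^{\top}\boldsymbol{t}(y_j)\big)}{\sum_{k=1}^{n_y}\exp\big(\boldsymbol{\theta}^{\top}\boldsymbol{t}(y_k)\big)}.
$$

The weights $w_j(\boldsymbol{\theta})$ are strictly positive and sum to one, so a stationary point requires $\bar{\boldsymbol{t}}^{x}$ to be a strictly positive convex combination of the points $\boldsymbol{t}(y_j)$, which places it in the relative interior of $\boldsymbol{C}^{y}$. With a penalty $\lambda\|\boldsymbol{\theta}\|$, the first-order condition reads $\bar{\boldsymbol{t}}^{x} - \sum_j w_j(\boldsymbol{\theta})\,\boldsymbol{t}(y_j) \in \lambda\,\partial\|\boldsymbol{\theta}\|$. Every subgradient of a norm has dual norm at most one, so $\|\bar{\boldsymbol{t}}^{x} - \sum_j w_j(\boldsymbol{\theta})\,\boldsymbol{t}(y_j)\|_{*} \le \lambda$, and hence $\lambda$ cannot be smaller than the dual-norm distance from $\bar{\boldsymbol{t}}^{x}$ to $\boldsymbol{C}^{y}$. This covers only the necessity direction; the full characterization is the paper's result.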
Abstract
Many two-sample problems call for a comparison of two distributions from an exponential family. Density ratio estimation methods provide ways to solve such problems through direct estimation of the differences in natural parameters. The term direct indicates that one avoids estimating both marginal distributions. In this context, we consider the Kullback-Leibler Importance Estimation Procedure (KLIEP), which has been the subject of recent work on differential networks. Our main result shows that the existence of the KLIEP estimator is characterized by whether the average sufficient statistic for one sample belongs to the convex hull of the set of all sufficient statistics for data points in the second sample. For high-dimensional problems it is customary to regularize the KLIEP loss by adding the product of a tuning parameter and a norm of the vector of parameter differences. We show that the existence of the regularized KLIEP estimator requires the tuning parameter to be no less than the dual norm-based distance between the average sufficient statistic and the convex hull. The implications of these existence issues are explored in applications to differential network analysis.
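The existence condition can be illustrated on a toy problem with a scalar sufficient statistic, where the convex hull of the second sample is just the interval between its smallest and largest value. The sketch below is not the paper's code. It assumes the usual log-sum-exp form of the KLIEP loss, $-\theta\,\bar{t}^{x} + \log\big(\tfrac{1}{n_y}\sum_j \exp(\theta\, t(y_j))\big)$, and every function name, the bisection solver, and its bracket of $\pm 50$ are hypothetical choices made for illustration.

```python
import math

def softmax_weights(theta, t_y):
    """Weights proportional to exp(theta * t_j), computed stably."""
    m = max(theta * t for t in t_y)
    e = [math.exp(theta * t - m) for t in t_y]
    s = sum(e)
    return [v / s for v in e]

def kliep_loss(theta, t_bar_x, t_y):
    """Assumed 1-D KLIEP loss: -theta * t_bar_x + log-mean-exp of theta * t_j."""
    m = max(theta * t for t in t_y)
    mean_exp = sum(math.exp(theta * t - m) for t in t_y) / len(t_y)
    return -theta * t_bar_x + m + math.log(mean_exp)

def kliep_grad(theta, t_bar_x, t_y):
    """Derivative of the loss: weighted mean of t_y minus t_bar_x."""
    w = softmax_weights(theta, t_y)
    return -t_bar_x + sum(wj * tj for wj, tj in zip(w, t_y))

def in_open_hull_1d(t_bar_x, t_y):
    """Relative-interior check in 1-D; assumes the t_y are not all equal."""
    return min(t_y) < t_bar_x < max(t_y)

def minimize_1d(t_bar_x, t_y, lo=-50.0, hi=50.0, iters=200):
    """Bisection on the derivative; returns None when no minimizer exists.

    Assumes the minimizer lies inside the illustrative bracket [lo, hi].
    """
    if not in_open_hull_1d(t_bar_x, t_y):
        return None
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if kliep_grad(mid, t_bar_x, t_y) < 0:
            lo = mid  # loss still decreasing, minimizer is to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    t_y = [0.0, 1.0, 2.0]
    print("interior:", minimize_1d(0.5, t_y))
    print("exterior:", minimize_1d(3.0, t_y))
    # Outside the hull the loss keeps decreasing as theta grows.
    print([kliep_loss(th, 3.0, t_y) for th in (0.0, 1.0, 10.0)])
```

With the average statistic strictly inside the interval, the derivative changes sign and bisection finds a minimizer. With the average on the boundary or outside, the loss keeps decreasing as `theta` grows, so no minimizer exists. In more than one dimension the interval test would have to be replaced by a convex hull membership check, for example a linear-programming feasibility problem.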
