Binary Losses for Density Ratio Estimation
Werner Zellinger
TL;DR
This work addresses how the choice of binary loss function in classifier-based density ratio estimation affects the resulting error, where the error is measured by a Bregman divergence $B_\phi$. It derives a complete characterization showing that any strictly proper composite loss compatible with a prescribed $B_\phi$ must have a specific form, which enables the construction of convex losses that emphasize large density ratio values. The author introduces novel loss families, including an exponential-weight class and a polynomial-weight class, which better prioritize large values of the density ratio $\beta$ and improve performance in deep domain adaptation, as evidenced by experiments over 484 real-world tasks and 9174 trained networks. The results demonstrate practical impact for importance weighting and parameter selection in domain adaptation, while leaving open questions about theoretical sample complexity.
Abstract
Estimating the ratio of two probability densities from a finite number of observations is a central machine learning problem. A common approach is to construct estimators using binary classifiers that distinguish observations from the two densities. However, the accuracy of these estimators depends on the choice of the binary loss function, raising the question of which loss function to choose based on desired error properties. For example, traditional loss functions, such as logistic or boosting loss, prioritize accurate estimation of small density ratio values over large ones, even though the latter are more critical in many applications. In this work, we start with prescribed error measures in a class of Bregman divergences and characterize all loss functions that result in density ratio estimators with small error. Our characterization extends results on composite binary losses from Reid & Williamson (2010) and their connection to density ratio estimation as identified by Menon & Ong (2016). As a result, we obtain a simple recipe for constructing loss functions with certain properties, such as those that prioritize accurate estimation of large density ratio values. Our novel loss functions outperform related approaches at resolving parameter choice issues of 11 deep domain adaptation algorithms, measured by average performance across 484 real-world tasks including sensor signals, texts, and images.
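
To make the classifier-based approach described above concrete, the following is a minimal sketch, assuming the standard logistic loss rather than any of the novel loss families proposed in this work. It trains a one-dimensional logistic regression to distinguish samples of a density $p$ (label 1) from samples of a density $q$ (label 0) and converts the classifier's logit into an estimate of $p/q$. The function names `fit_logistic` and `estimate_ratio`, the Gaussian toy data, and the optimizer settings are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def fit_logistic(x, y, steps=2000, lr=0.5):
    """Fit 1-D logistic regression by plain gradient descent; returns (w, b)."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        prob = 1.0 / (1.0 + np.exp(-(w * x + b)))  # estimated P(y=1 | x)
        w -= lr * np.mean((prob - y) * x)          # gradient of mean logistic loss w.r.t. w
        b -= lr * np.mean(prob - y)                # gradient of mean logistic loss w.r.t. b
    return w, b

def estimate_ratio(x_p, x_q):
    """Return a function t -> estimate of p(t)/q(t) built from a binary classifier."""
    x = np.concatenate([x_p, x_q])
    y = np.concatenate([np.ones(len(x_p)), np.zeros(len(x_q))])
    w, b = fit_logistic(x, y)
    # Odds of the class posterior equal (n_p * p) / (n_q * q), so undo the class prior.
    prior_correction = len(x_q) / len(x_p)
    return lambda t: prior_correction * np.exp(w * t + b)

# Toy data (assumption): p = N(1, 1) and q = N(0, 1), so the true ratio is exp(t - 0.5).
rng = np.random.default_rng(0)
x_p = rng.normal(1.0, 1.0, 5000)
x_q = rng.normal(0.0, 1.0, 5000)

ratio = estimate_ratio(x_p, x_q)
grid = np.array([-1.0, 0.0, 1.0, 2.0])
true_ratio = np.exp(grid - 0.5)
est_ratio = ratio(grid)
print(np.round(true_ratio, 3))
print(np.round(est_ratio, 3))
```

In this toy setting the linear logit is well specified, so the estimate should track the true ratio closely. Replacing the logistic loss inside `fit_logistic` with a different binary loss changes which ratio values are estimated most accurately, which is the design question studied in this work.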
