Comments on Friedman's Method for Class Distribution Estimation
Dirk Tasche
TL;DR
This work reframes class distribution estimation under prior probability shift as a problem of designing linear equation systems and analyzes Friedman's method within a covariance-based framework. It establishes a fundamental limit: the full covariance matrix $\Sigma_P$ of the training posterior probabilities is singular, so the corresponding $\ell\times\ell$ system has no unique solution, whereas a system of $\ell-1$ equations with an invertible covariance matrix yields unique estimates. It also shows that DeBias and PAC coincide at the population level and are binary instances of a covariance-based multivariate method. Friedman's method is positioned as a robust alternative that is light to implement and that can match or outperform other methods, depending on the test-prior regime. In a semi-asymptotic binary setting, maximum likelihood remains the most efficient estimator, while Friedman's method performs more uniformly across values of $q_1$, which highlights a practical trade-off between variance and prior-independence in quantification tasks.
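The singularity claim can be checked in one line. This is a sketch under an assumption about notation that the summary does not spell out: $\Sigma_P$ is taken to be the covariance matrix, under the training distribution $P$, of the vector $(\eta_1(X),\ldots,\eta_\ell(X))$ of posterior class probabilities $\eta_i(x) = P[Y=i\,|\,X=x]$, and $\mathbf{1}$ denotes the all-ones vector in $\mathbb{R}^\ell$. Because the posteriors sum to one,

$$\mathbf{1}^\top \Sigma_P\, \mathbf{1} \;=\; \mathrm{Var}_P\Big(\sum_{i=1}^{\ell} \eta_i(X)\Big) \;=\; \mathrm{Var}_P(1) \;=\; 0 .$$

Since $\Sigma_P$ is positive semidefinite, this implies $\Sigma_P \mathbf{1} = 0$, so $\Sigma_P$ has a non-trivial null space and cannot be inverted. Dropping one class removes this particular linear dependence, which is consistent with working with $\ell-1$ equations.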
Abstract
The purpose of class distribution estimation (also known as quantification) is to determine the values of the prior class probabilities in a test dataset without class label observations. A variety of methods to achieve this have been proposed in the literature, most of them based on the assumption that the distributions of the training and test data are related through prior probability shift (also known as label shift). Among these methods, Friedman's method has recently been found to perform relatively well for both binary and multi-class quantification. We discuss the properties of Friedman's method and of another approach mentioned by Friedman (called the DeBias method in the literature) in the context of a general framework for designing linear equation systems for class distribution estimation.
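To make the idea of a linear equation system for class distribution estimation concrete, the following minimal Python sketch estimates test priors from posterior scores. It rests on several assumptions that are not taken from the text above: Friedman's method is implemented in the form commonly described in the quantification literature, with indicator features $g_i(x) = 1$ if the training posterior of class $i$ is at least the training prior of class $i$; the system uses the first $\ell-1$ feature equations plus the constraint that the priors sum to one; and the function name `friedman_estimate` and the toy data are hypothetical. Under prior probability shift, the test mean of each feature equals the $q$-weighted mixture of its class-conditional training means, which gives the equations being solved.

```python
import numpy as np


def friedman_estimate(train_post, train_labels, test_post):
    """Sketch of a Friedman-style estimate of the test class priors q.

    train_post, test_post: arrays of shape (n, l) with posterior class
    probabilities from a model fitted on the training data.
    train_labels: integer class labels 0..l-1 for the training rows.
    """
    train_post = np.asarray(train_post, dtype=float)
    test_post = np.asarray(test_post, dtype=float)
    train_labels = np.asarray(train_labels)
    n_classes = train_post.shape[1]

    # Training priors p_i, estimated by class frequencies.
    p = np.bincount(train_labels, minlength=n_classes) / len(train_labels)

    # Assumed feature choice: g_i(x) = 1 if posterior_i(x) >= p_i, else 0.
    g_train = (train_post >= p).astype(float)
    g_test = (test_post >= p).astype(float)

    # M[i, j] = mean of g_i over training rows with label j.
    M = np.column_stack(
        [g_train[train_labels == j].mean(axis=0) for j in range(n_classes)]
    )
    b = g_test.mean(axis=0)  # test means of the features

    # Use l-1 feature equations and replace the last one by sum(q) = 1.
    A = np.vstack([M[:-1], np.ones(n_classes)])
    rhs = np.append(b[:-1], 1.0)
    return np.linalg.solve(A, rhs)


# Hypothetical toy data: two classes, balanced training set.
train_post = np.array(
    [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.3, 0.7],   # label 0
     [0.1, 0.9], [0.2, 0.8], [0.3, 0.7], [0.7, 0.3]]   # label 1
)
train_labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
# Test set: 7 of 20 rows score at or above the class-0 threshold.
test_post = np.array([[0.8, 0.2]] * 7 + [[0.2, 0.8]] * 13)

q_hat = friedman_estimate(train_post, train_labels, test_post)
print(q_hat)
```

The solution of the linear system is not constrained to lie in $[0,1]$; with finite samples it can fall outside that range, and the sketch deliberately leaves any clipping or projection step to the user. `np.linalg.solve` raises an error if the chosen equations are linearly dependent, which is the numerical counterpart of requiring an invertible system.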
