Robust Label Shift Quantification
Alexandre Lecestre
TL;DR
This work studies robust label shift quantification. It formulates two practical settings and shows that robust estimators obtained via $\rho$-estimation, which is grounded in the Hellinger distance, coincide with Maximum Likelihood estimators. It provides finite-sample deviation bounds and convergence rates (notably $n^{-1/2}\log^{1/2} n$ in well-specified cases) and demonstrates robustness to misspecification, contamination, and outliers. The paper also develops predictor-based and unconditional emission-density frameworks, connects them to calibration concepts, and relates them to existing methods such as MLLS, BBSE, and KMM through a unified theoretical treatment. In practice, these results justify the observed robustness of MLLS and offer principled guidance for reliable label shift quantification in high-dimensional and noisy settings.
Abstract
In this paper, we investigate the label shift quantification problem. We propose robust estimators of the label distribution which turn out to coincide with the Maximum Likelihood Estimator. We analyze the theoretical aspects and derive deviation bounds for the proposed method, providing optimal guarantees in the well-specified case, along with notable robustness properties against outliers and contamination. Our results provide theoretical validation for empirical observations on the robustness of Maximum Likelihood Label Shift.
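To make the estimator discussed above concrete, the following is a minimal sketch of the standard EM iteration commonly used to compute the Maximum Likelihood Label Shift estimate. It is not code from the paper. It assumes a probabilistic classifier trained on the source distribution whose posteriors are available on unlabeled target samples; the names `mlls_em`, `probs`, and `source_prior` are illustrative.

```python
import numpy as np


def mlls_em(probs, source_prior, n_iter=500, tol=1e-10):
    """Estimate target label proportions by EM (illustrative sketch).

    probs:        (n, K) array, source-classifier posteriors p_s(y | x_i)
                  evaluated on n unlabeled target samples (assumed input).
    source_prior: (K,) array, label proportions in the source data.
    Returns a (K,) array of estimated target label proportions.
    """
    probs = np.asarray(probs, dtype=float)
    source_prior = np.asarray(source_prior, dtype=float)
    pi = source_prior.copy()  # start from the source proportions
    for _ in range(n_iter):
        # E-step: reweight each posterior by pi / source_prior, then
        # renormalise each row so it is a distribution over labels.
        w = probs * (pi / source_prior)
        w /= w.sum(axis=1, keepdims=True)
        # M-step: the new proportions are the average responsibilities.
        new_pi = w.mean(axis=0)
        converged = np.abs(new_pi - pi).max() < tol
        pi = new_pi
        if converged:
            break
    return pi
```

Given an `(n, K)` array of posteriors from a hypothetical source classifier, a call such as `mlls_em(probs, source_prior)` returns a vector of estimated target proportions that sums to one. The iteration increases the target-sample likelihood over the mixture weights at each step, which is why its fixed point is the maximum likelihood estimate when the classifier posteriors are taken as given.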
