Fair Classification with Partial Feedback: An Exploration-Based Data Collection Approach
Vijay Keswani, Anay Mehrotra, L. Elisa Celis
TL;DR
This work tackles partial feedback in high-stakes classification by introducing an exploration-based data collection framework that partitions the domain into Exploit and Explore regions and jointly learns predictions while collecting outcomes for previously ignored subpopulations. It provides strong guarantees: per-iteration $\boldsymbol{\alpha}$-FDR feasibility, monotone improvement of group-wise utility, and convergence of the learned policy to $\boldsymbol{f_{\text{opt}}^{\boldsymbol{\alpha}}}$, with convergence rates depending on exploration design and distributional properties. The method supports explicit fairness through exploitation and exploration strategies and demonstrates empirically that data quality and true positive rates improve across protected groups with minimal loss in overall utility. These results have practical impact for lending, policing, and other domains where ground truth is observed only after initial positive classifications, offering a principled balance between performance, fairness, and informative data collection.
Abstract
In many predictive contexts (e.g., credit lending), true outcomes are only observed for samples that were positively classified in the past. These past observations, in turn, form training datasets for classifiers that make future predictions. However, such training datasets lack information about the outcomes of samples that were (incorrectly) negatively classified in the past and can lead to erroneous classifiers. We present an approach that trains a classifier using available data and comes with a family of exploration strategies to collect outcome data about subpopulations that otherwise would have been ignored. For any exploration strategy, the approach comes with guarantees that (1) all subpopulations are explored, (2) the fraction of false positives is bounded, and (3) the trained classifier converges to a "desired" classifier. The right exploration strategy is context-dependent; it can be chosen to improve learning guarantees and encode context-specific group fairness properties. Evaluation on real-world datasets shows that this approach consistently boosts the quality of collected outcome data and improves the fraction of true positives for all groups, with only a small reduction in predictive utility.
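The partial-feedback setting described above can be illustrated with a small sketch. It is not the paper's algorithm: the function name `collect_outcomes`, the fixed `threshold`, and the uniform `explore_prob` are hypothetical choices made only to show how outcomes are revealed for accepted samples and how a simple exploration rule adds outcome data for samples that would otherwise be rejected.

```python
import random


def collect_outcomes(samples, score, threshold, explore_prob, rng):
    """Simulate one round of data collection under partial feedback.

    samples      -- list of (x, y) pairs; y is the true outcome
    score        -- callable mapping x to a confidence score (assumed given)
    threshold    -- scores at or above this are accepted (the "exploit" case)
    explore_prob -- hypothetical probability of accepting a below-threshold sample
    rng          -- random.Random instance, passed in for reproducibility
    """
    observed = []
    for x, y in samples:
        exploit = score(x) >= threshold
        # Exploration only applies to samples the classifier would reject.
        explore = (not exploit) and rng.random() < explore_prob
        if exploit or explore:
            # The outcome y is revealed only when the sample is accepted.
            observed.append((x, y))
    return observed


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data: the feature x is itself the probability of a positive outcome.
    data = []
    for _ in range(1000):
        x = rng.random()
        data.append((x, int(rng.random() < x)))
    no_explore = collect_outcomes(data, lambda x: x, 0.5, 0.0, rng)
    with_explore = collect_outcomes(data, lambda x: x, 0.5, 0.1, rng)
    print(len(no_explore), len(with_explore))
```

With `explore_prob=0.0` the collected data contains no sample below the threshold, which is the failure mode the abstract describes. Any positive `explore_prob` yields some outcomes from the otherwise ignored region, at the cost of accepting samples the current classifier considers risky.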
