Budgeted Multiple-Expert Deferral
Giulia DeSalvo, Clara Mohri, Mehryar Mohri, Yutao Zhong
TL;DR
The paper addresses the training-time cost of learning to defer with multiple experts by introducing a budgeted deferral framework that queries expert costs selectively. It develops a two-stage, IWAL-inspired algorithm whose Sampling-Probs subroutine prunes a version space and assigns query probabilities based on hypothesis disagreement, which yields generalization guarantees and favorable label complexity. The theoretical results show square-root-type (sublinear) growth in label complexity in realizable settings and favorable dependence on disagreement metrics, and practical convex-optimization strategies support scalable implementations. An empirical evaluation across ten datasets shows that the budgeted approach closely matches full-query baselines in accuracy while substantially reducing the number of queried expert costs, which makes it practical when experts such as large language models or human annotators are expensive to query.
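To make the idea of disagreement-based querying concrete, the following is a minimal sketch of an IWAL-style loop over a finite set of deferral rules. It is not the paper's Sampling-Probs subroutine: the function names, the `p_min` floor, the `slack` schedule, and the pruning rule are all assumptions chosen for illustration. Each hypothetical rule maps an input to an expert index, an expert's cost is queried with higher probability when the surviving rules disagree about routing to it, and queried costs are importance-weighted by `1 / p` so the loss estimates stay unbiased.

```python
import math
import random


def disagreement_query_probs(version_space, x, num_experts, p_min=0.1):
    """Per-expert query probabilities for input x (illustrative rule, an assumption).

    An expert gets probability 1.0 when the surviving rules disagree and at
    least one of them routes x to that expert; otherwise it gets p_min.
    """
    choices = {h(x) for h in version_space}
    if len(choices) <= 1:
        # All surviving rules agree, so this cost cannot separate them.
        return [p_min] * num_experts
    return [1.0 if k in choices else p_min for k in range(num_experts)]


def budgeted_train(hypotheses, stream, cost_oracle, num_experts,
                   slack=lambda t: math.sqrt(2.0 / t), p_min=0.1, seed=0):
    """Hypothetical budgeted training loop over a finite hypothesis set.

    hypotheses:  list of callables x -> expert index in range(num_experts)
    stream:      iterable of (x, y) pairs; labels are already known
    cost_oracle: callable (k, x, y) -> cost in [0, 1]; each call is one query
    Returns (indices of surviving hypotheses, cumulative losses, query count).
    """
    rng = random.Random(seed)
    cum_loss = [0.0] * len(hypotheses)
    alive = list(range(len(hypotheses)))
    queries = 0
    for t, (x, y) in enumerate(stream, start=1):
        version_space = [hypotheses[i] for i in alive]
        probs = disagreement_query_probs(version_space, x, num_experts, p_min)
        # Importance-weighted cost estimates: cost / p if queried, else 0.
        est = [0.0] * num_experts
        for k, p in enumerate(probs):
            if rng.random() < p:
                est[k] = cost_oracle(k, x, y) / p
                queries += 1
        for i in alive:
            cum_loss[i] += est[hypotheses[i](x)]
        # Prune rules whose average estimated loss exceeds the best by slack(t).
        best = min(cum_loss[i] for i in alive)
        alive = [i for i in alive if (cum_loss[i] - best) / t <= slack(t)]
    return alive, cum_loss, queries
```

In this sketch the base predictor can be treated as one more "expert" whose cost is its own prediction error. The slack schedule here is a placeholder; a rigorous version would derive it from a concentration bound that accounts for importance weights as large as `1 / p_min`.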
Abstract
Learning to defer uncertain predictions to costly experts offers a powerful strategy for improving the accuracy and efficiency of machine learning systems. However, standard training procedures for deferral algorithms typically require querying all experts for every training instance, an approach that becomes prohibitively expensive when expert queries incur significant computational or resource costs. This undermines the core goal of deferral: to limit unnecessary expert usage. To overcome this challenge, we introduce the budgeted deferral framework, which aims to train effective deferral algorithms while minimizing expert query costs during training. We propose new algorithms for both two-stage and single-stage multiple-expert deferral settings that selectively query only a subset of experts per training example. While inspired by active learning, our setting is fundamentally different: labels are already known, and the core challenge is to decide which experts to query in order to balance cost and predictive performance. We establish theoretical guarantees for both of our algorithms, including generalization bounds and label complexity analyses. Empirical results across several domains show that our algorithms substantially reduce training costs without sacrificing prediction accuracy, demonstrating the practical value of our budget-aware deferral algorithms.
