Cost-Sensitive Learning to Defer to Multiple Experts with Workload Constraints

Jean V. Alves; Diogo Leitão; Sérgio Jesus; Marco O. P. Sampaio; Javier Liébana; Pedro Saleiro; Mário A. T. Figueiredo; Pedro Bizarro

TL;DR

This paper addresses practical limitations of learning to defer (L2D) in human-AI collaboration (HAIC) by introducing DeCCaF, a framework for deferral to multiple experts under cost-sensitive misclassification and explicit human workload constraints. It combines a classifier (h), a human-expertise model (HEM), and a constraint-based assigner that optimizes deferrals globally. Training requires only one expert prediction per instance, and constraint programming (CP-SAT) enforces batch-wise capacities. On a realistic bank-fraud task with synthetic experts, the authors show that DeCCaF outperforms the baselines in a wide array of scenarios, with an average misclassification-cost reduction of about $8.4\%$, and they provide a publicly available codebase. The work advances practical HAIC deployment by modeling expert capacity, incorporating instance-dependent costs, and validating performance across diverse data-availability and cost-structure scenarios, with implications for scalable, fair, and safe decision-support systems.

Abstract

Learning to defer (L2D) aims to improve human-AI collaboration systems by learning how to defer decisions to humans when they are more likely to be correct than an ML classifier. Existing research in L2D overlooks key real-world aspects that impede its practical adoption, namely: i) neglecting cost-sensitive scenarios, where type I and type II errors have different costs; ii) requiring concurrent human predictions for every instance of the training dataset; and iii) not dealing with human work-capacity constraints. To address these issues, we propose the \textit{deferral under cost and capacity constraints framework} (DeCCaF). DeCCaF is a novel L2D approach, employing supervised learning to model the probability of human error under less restrictive data requirements (only one expert prediction per instance) and using constraint programming to globally minimize the error cost, subject to workload limitations. We test DeCCaF in a series of cost-sensitive fraud detection scenarios with different teams of 9 synthetic fraud analysts, with individual work-capacity constraints. The results demonstrate that our approach performs significantly better than the baselines in a wide array of scenarios, achieving an average $8.4\%$ reduction in the misclassification cost. The code used for the experiments is available at https://github.com/feedzai/deccaf
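The capacity-constrained assignment step described above can be illustrated with a minimal sketch. The paper solves this step with constraint programming; the version below swaps in exhaustive search, which is only practical for tiny batches, so that it runs with the standard library alone. The function name `assign_batch`, the cost matrix, and the capacities are hypothetical values chosen for illustration. Treating the classifier as decision-maker 0 with its own capacity, and treating capacities as upper bounds, are likewise assumptions of this sketch.

```python
from itertools import product


def assign_batch(cost, capacity):
    """Exhaustively search for the minimum-cost assignment of a small batch.

    cost[i][j]  : predicted expected misclassification cost if instance i
                  is decided by decision-maker j (hypothetical values).
    capacity[j] : maximum number of instances decision-maker j may take
                  in this batch (assumed to be an upper bound).
    Returns (assignment, total_cost), where assignment[i] is the index of
    the decision-maker chosen for instance i.
    """
    n, m = len(cost), len(capacity)
    best, best_total = None, float("inf")
    # Enumerate every way of mapping the n instances to the m decision-makers.
    for assignment in product(range(m), repeat=n):
        # Skip assignments that exceed any decision-maker's capacity.
        if any(assignment.count(j) > capacity[j] for j in range(m)):
            continue
        total = sum(cost[i][assignment[i]] for i in range(n))
        if total < best_total:
            best, best_total = assignment, total
    return best, best_total


# Hypothetical predicted costs: column 0 is the classifier,
# columns 1 and 2 are two human experts.
cost = [
    [0.30, 0.10, 0.20],
    [0.05, 0.04, 0.06],
    [0.50, 0.20, 0.10],
    [0.40, 0.15, 0.35],
]
# The classifier can take the whole batch; each expert can take one instance.
capacity = [4, 1, 1]

assignment, total = assign_batch(cost, capacity)
print(assignment, round(total, 2))
```

In this toy batch, expert 1 has the lowest predicted cost on three of the four instances, but can take only one of them. The search therefore gives expert 1 the instance where deferral saves the most relative to the remaining options, which is the kind of global trade-off that a per-instance greedy rule would miss.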

Paper Structure (55 sections, 31 equations, 8 figures, 7 tables, 2 algorithms)

Introduction
Related Work
Current L2D Methods
Simulation of Human Experts
Method
Deferral Formulation
Data
Classifier - Rejector Formulation
Classifier Training
Rejector Training
Definition of Capacity Constraints
Global Loss Minimization under Capacity Constraints
Cost-Sensitive Learning
Experiments
Experimental Setup
...and 40 more sections

Figures (8)

Figure 1: Schematic Representation of DeCCaF
Figure 2: Mean ECE and ROC-AUC for estimates of $\mathbb{P}(y_i = m_{j,i})$. Values are calculated for each expert $j$ and averaged, with error bars representing a 95% confidence interval. Both methods obtain similar ROC-AUC; however, the value of $\lambda$ has a significant impact on their ranking in terms of calibration, showing the importance of testing L2D methods under a wide variety of cost structures.
Figure 3: Expected Misclassification Cost per 100 instances ($\mathbb{E}[\mathcal{C}]/100$, assuming $c_{\text{FP}} = \lambda, c_{\text{FN}} = 1$) for $a_r \in\{0.05,0.15\}$, $\lambda = 0.057$, and different amounts of training data. At each point, values are averaged across all 25 test variations and displayed with 95% confidence intervals. DeCCaF remains significantly better in most scenarios.
Figure 4: ROC-Curve - Alert Model - Months 4-8
Figure 5: Weight Vector Heatmap for each Expert - Experts maintain feature weights across all testing scenarios.
...and 3 more figures
