Learning to Complement and to Defer to Multiple Users

Zheng Zhang; Wenjie Ai; Kevin Wells; David Rosewarne; Thanh-Toan Do; Gustavo Carneiro

TL;DR

LECODU addresses Human-AI Collaborative Classification by unifying learning to complement and learning to defer to multiple users within a single MEHAI-CC framework. A Human-AI Selection Module chooses among AI alone, AI plus multiple users, or deferral to multiple users, and a Collaboration Module produces the final prediction for the chosen option. Both are optimized with a noisy-label-aware training scheme that uses CrowdLab consensus labels and a collaboration-cost penalty. Across real-world and synthesized multi-rater benchmarks (CIFAR-10N/10H, CIFAR10-IDN, Chaoyang), LECODU achieves higher accuracy at equivalent collaboration costs than state-of-the-art methods, including under high label-noise conditions. The approach is robust to annotation noise and scales with the number of engaged users, which is of practical benefit for real-world HAI-CC deployments; user heterogeneity and deskilling mitigation remain avenues for future work.
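The option set that the Human-AI Selection Module chooses from can be enumerated explicitly. The sketch below follows the list given in the Figure 2 caption (AI alone, AI + 1 user, ..., AI + $M$ users, 1 user, ..., $M$ users). The names `CollaborationOption` and `enumerate_options` are hypothetical, and setting the cost equal to the number of users is an assumption: the source only says the cost is based on that number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CollaborationOption:
    """One way of producing a prediction (hypothetical helper type)."""
    name: str
    uses_ai: bool
    num_users: int

    @property
    def cost(self) -> int:
        # Assumption: the collaboration cost is simply the number of users
        # consulted; the source only states it is *based on* that number.
        return self.num_users


def enumerate_options(max_users: int) -> list[CollaborationOption]:
    """List AI alone, AI + k users, and k users alone, for k = 1..max_users."""
    if max_users < 1:
        raise ValueError("max_users must be at least 1")
    options = [CollaborationOption("AI alone", True, 0)]
    # Learning to complement: the AI prediction is combined with k users.
    options += [
        CollaborationOption(f"AI + {k} user(s)", True, k)
        for k in range(1, max_users + 1)
    ]
    # Learning to defer: the decision is left to an ensemble of k users.
    options += [
        CollaborationOption(f"{k} user(s)", False, k)
        for k in range(1, max_users + 1)
    ]
    return options
```

Under these assumptions, $M$ users give $2M + 1$ options, so `enumerate_options(3)` returns seven entries, with AI alone as the only zero-cost choice.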

Abstract

With the development of Human-AI Collaboration in Classification (HAI-CC), integrating users and AI predictions becomes challenging due to the complex decision-making process. This process has three options: 1) AI autonomously classifies, 2) learning to complement, where AI collaborates with users, and 3) learning to defer, where AI defers to users. Despite their interconnected nature, these options have been studied in isolation rather than as components of a unified system. In this paper, we address this weakness with the novel HAI-CC methodology, called Learning to Complement and to Defer to Multiple Users (LECODU). LECODU not only combines learning to complement and learning to defer strategies, but it also incorporates an estimation of the optimal number of users to engage in the decision process. The training of LECODU maximises classification accuracy and minimises collaboration costs associated with user involvement. Comprehensive evaluations across real-world and synthesized datasets demonstrate LECODU's superior performance compared to state-of-the-art HAI-CC methods. Remarkably, even when relying on unreliable users with high rates of label noise, LECODU exhibits significant improvement over both human decision-makers alone and AI alone.
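The trade-off between accuracy and collaboration cost described above can be illustrated with a small cost-penalised objective. This is a minimal sketch and not the paper's loss, which is not reproduced in this section: it assumes a cross-entropy term on the final class probabilities plus a weight `lam` times the expected cost under the selection probabilities. All function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(class_probs, target):
    """Negative log-likelihood of the target class (the accuracy term)."""
    return -math.log(max(class_probs[target], 1e-12))


def expected_cost(selection_probs, option_costs):
    """Expected number of users consulted under the selection distribution."""
    return sum(p * c for p, c in zip(selection_probs, option_costs))


def penalised_loss(class_probs, target, selection_probs, option_costs, lam):
    # Assumed form: accuracy term + lam * expected collaboration cost.
    # A larger lam pushes the selector towards cheaper options.
    return cross_entropy(class_probs, target) + lam * expected_cost(
        selection_probs, option_costs
    )
```

With `lam = 0` the cost term vanishes and only accuracy matters; increasing `lam` makes options that involve more users progressively more expensive to select.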

Paper Structure (9 sections, 4 equations, 6 figures, 4 tables)

Introduction
Related work
Method
Experiments
Datasets
Implementation Details
Results
Ablation study
Conclusion

Figures (6)

Figure 1: LECODU, our proposed Human-AI Collaborative Classification (HAI-CC) methodology, integrates standalone AI classification with the concepts of learning to defer [mozannar2020consistent] and learning to complement [complement_wilder]. Our training process uses learning with noisy labels (LNL) techniques to leverage a multi-rater training set (where clean labels are not available), maximising HAI-CC accuracy and minimising the collaboration cost, represented by the number of users involved in collaborative classification.
Figure 2: LECODU contains a Human-AI Selection Module and a Collaboration Module. The Human-AI Selection Module decides whether to use the pre-trained LNL AI model [wang2022promix, zhu2021hard, garg2023instance] alone, complement the LNL AI model's prediction with $\{1,...,M\}$ users, or defer the decision to an ensemble of $\{1,...,M\}$ users, where the system cost is based on the number of users in this collaboration. The Collaboration Module uses the collaborative option selected by the Human-AI Selection Module (AI alone, AI + 1 user, ..., AI + $M$ users, 1 user, ..., $M$ users) to make a final classification.
Figure 3: Test accuracy vs. collaboration cost of LECODU (Ours) and competing SEL2D [whoshould_mozannar23] and ME{L2D,L2C} [ijcai2022-344, multil2d, zhang2023learning] methods. The SEL2D methods are always pre-trained with LNL techniques, with the single user being simulated with either aggregation (majority voting) or random selection (random) from the pool of three annotators. CET and Multi_L2D show results with and without LNL, while LECOMH shows results with LNL. CET always relies on three experts, resulting in a single point of accuracy vs. cost in each graph. Multi_L2D can defer to one of many experts, so we select the label corresponding to the maximum probability of 3 users for each sample to draw the curve. We truncate the accuracy for all methods at cost=10000.
Figure 4: Test accuracy vs. collaboration cost of LECODU and its ablations: LECODU without learning with noisy labels (w/o LNL); LECODU (w. SH-aggregation) and (w. SH-random), which collaborate with a single user (rather than multiple users) formed by aggregating labels or by randomly selecting a label, respectively; and LECODU (w. aggregation label) and (w. random label), which are trained with a majority-vote consensus label or a randomly sampled training label, respectively.
Figure 5: Test accuracy and collaboration cost as a function of $\lambda$ in Equation \ref{eq:loss_function}, which weights the collaboration cost in our optimisation.
...and 1 more figures
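The Figure 3 caption mentions two ways of simulating a single user from a pool of three annotators: aggregation by majority voting and random selection. A minimal sketch of both follows. The function names are hypothetical, and the tie-breaking rule (smallest label wins) is an assumption, since the source does not say how ties are resolved.

```python
import random
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label among the annotators for one sample."""
    counts = Counter(labels)
    top = max(counts.values())
    # Assumption: ties are broken by taking the smallest tied label.
    return min(label for label, count in counts.items() if count == top)


def simulate_single_user(annotations, mode, seed=0):
    """Collapse per-sample annotator labels into one simulated user's labels.

    annotations: list of per-sample label lists, one label per annotator.
    mode: "aggregation" (majority vote) or "random" (pick one annotator).
    """
    rng = random.Random(seed)  # seeded so the random mode is reproducible
    if mode == "aggregation":
        return [majority_vote(row) for row in annotations]
    if mode == "random":
        return [rng.choice(row) for row in annotations]
    raise ValueError(f"unknown mode: {mode!r}")
```

For a sample annotated as `[3, 4, 4]`, the aggregation mode returns 4, while the random mode returns whichever annotator's label the seeded generator picks.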
