Coverage-Constrained Human-AI Cooperation with Multiple Experts

Zheng Zhang, Cuong Nguyen, Kevin Wells, Thanh-Toan Do, David Rosewarne, Gustavo Carneiro

TL;DR

The paper addresses high-stakes decision making with CL2DC, a coverage-constrained framework that unifies learning-to-defer (L2D) and learning-to-complement (L2C) with specific experts in multi-expert settings where the training data carries multiple noisy-label annotations. It combines a gating mechanism and a complementary module with a penalty-based constraint that enforces a target AI-only coverage, and it trains on pseudo-clean labels produced by CrowdLab. Empirical results on synthetic and real-world datasets show that CL2DC consistently outperforms state-of-the-art HAI-CC methods in accuracy at matched coverage, supporting its use for workload management and reliable expert-AI cooperation. The work advances practical multi-expert HAI-CC (MEHAI-CC) by enabling precise control over AI reliance, expert-specific collaboration, and principled evaluation; future directions include sequential expert strategies and handling imbalanced datasets.

Abstract

Human-AI cooperative classification (HAI-CC) approaches aim to develop hybrid intelligent systems that enhance decision-making in various high-stakes real-world scenarios by leveraging both human expertise and AI capabilities. Current HAI-CC methods primarily focus on learning-to-defer (L2D), where decisions are deferred to human experts, and learning-to-complement (L2C), where AI and human experts make predictions cooperatively. However, a notable research gap remains in effectively exploring both L2D and L2C under diverse expert knowledge to improve decision-making, particularly when constrained by the cooperation cost required to achieve a target probability for AI-only selection (i.e., coverage). In this paper, we address this research gap by proposing the Coverage-constrained Learning to Defer and Complement with Specific Experts (CL2DC) method. CL2DC makes final decisions through either AI prediction alone or by deferring to or complementing a specific expert, depending on the input data. Furthermore, we propose a coverage-constrained optimisation to control the cooperation cost, ensuring it approximates a target probability for AI-only selection. This approach enables an effective assessment of system performance within a specified budget. Also, CL2DC is designed to address scenarios where training sets contain multiple noisy-label annotations without any clean-label references. Comprehensive evaluations on both synthetic and real-world datasets demonstrate that CL2DC achieves superior performance compared to state-of-the-art HAI-CC methods.
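The abstract states that the cooperation cost is controlled so that AI-only selection approximates a target probability. As a rough illustration only, the sketch below penalises the squared gap between a batch-averaged AI-only gating probability and the target coverage. The quadratic form, the function names, the `lam` coefficient, and the convention that index 0 of the gating output is the AI-only option are all assumptions made for this example; the text above does not give the exact constraint used in the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def coverage_penalty(gate_logits_batch, target_coverage, lam=1.0, ai_index=0):
    """Return (penalty, soft_coverage) for a batch of gating logits.

    soft_coverage is the mean probability assigned to the AI-only option
    (assumed to sit at position ai_index). The penalty is a hypothetical
    quadratic term lam * (soft_coverage - target_coverage) ** 2 that would
    be added to the classification loss during training.
    """
    probs_ai = [softmax(logits)[ai_index] for logits in gate_logits_batch]
    soft_coverage = sum(probs_ai) / len(probs_ai)
    penalty = lam * (soft_coverage - target_coverage) ** 2
    return penalty, soft_coverage


if __name__ == "__main__":
    # Two samples, three gating options each (AI-only plus two others).
    batch = [[2.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
    penalty, cov = coverage_penalty(batch, target_coverage=0.6, lam=10.0)
    print(f"soft coverage={cov:.3f}, penalty={penalty:.4f}")
```

A larger `lam` pushes the soft coverage closer to the target at the expense of the classification term, which is the kind of trade-off a penalty coefficient typically controls.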

Paper Structure

This paper contains 18 sections, 5 equations, 4 figures, 2 tables, 2 algorithms.

Figures (4)

  • Figure 1: Post-hoc analysis for generating the coverage-accuracy curves of our proposed method on the Chaoyang dataset [zhu2021hard] is unreliable, because the same method trained with different coverage constraints produces different curves. When comparing against several HAI-CC methods [charusaiedeferdcecao2024defense] plotted with the same post-hoc approach, one can select the curve showing the best coverage-accuracy result, which may give an overly optimistic assessment of the method's performance. For instance, our method trained for two different coverages (0.6 in orange and 0.2 in green) shows quite different performance.
  • Figure 2: CL2DC contains a gating model $g_{\phi}(\cdot)$, a complementary module $h_{\psi}(\cdot)$, and an AI model $f_{\theta}(\cdot)$. The gating model decides whether to use the LNL-trained AI model $f_{\theta}(\cdot)$ alone (i.e., when $g^{(\mathrm{AI})}_{\phi}(\cdot)$ has the largest probability), to defer the decision to one of the $M$ experts $\{1,\dots,M\}$ (i.e., when one of $\{g^{(\mathrm{L2D}_{j})}_{\phi}(\cdot)\}_{j=1}^{M}$ has the largest probability), or to complement the LNL AI model's prediction, through the complementary module, with one of the $M$ experts (i.e., when one of $\{g^{(\mathrm{L2C}_{j})}_{\phi}(\cdot)\}_{j=1}^{M}$ has the largest probability). In the figure, the gating model selects L2C between the AI model and user 1, which has the largest probability ($0.8$), to produce the final prediction shown on the right.
  • Figure 3: Accuracy-coverage curves of our method and competing single-expert (SEHAI-CC) [whoshould_mozannar23, charusaiedeferdcecao2024defense] and multi-expert (MEHAI-CC) [lecodumultil2d] methods.
  • Figure 4: Accuracy-coverage curves in various evaluations: (a) different values of the penalty coefficient $\lambda$ on the CIFAR-100 dataset, (b) comparison between our method and competing HAI-CC methods on the Chaoyang dataset with 2 experts, (c) ablation study varying the number of experts on the CIFAR-100 dataset, (d) and (e) ablation study with and without L2D and L2C on Galaxyzoo and Micebone, respectively, and (f) evaluation of AUARC when varying the number of experts on the CIFAR-100 dataset.
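The routing rule described in the Figure 2 caption (pick whichever gating output has the largest probability among AI-only, $M$ defer options, and $M$ complement options) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the output layout (index 0 for AI-only, indices 1 to $M$ for L2D with expert $j$, indices $M+1$ to $2M$ for L2C with expert $j$) are assumptions chosen for the example.

```python
def route(gate_probs, num_experts):
    """Map a gating probability vector to a (mode, expert) decision.

    Assumed layout of gate_probs (length 2 * num_experts + 1):
      index 0                  -> AI-only
      indices 1..M             -> defer to expert j (L2D_j)
      indices M+1..2M          -> complement with expert j (L2C_j)
    Experts are numbered from 1, as in the caption.
    """
    if len(gate_probs) != 2 * num_experts + 1:
        raise ValueError("expected 2 * num_experts + 1 gating outputs")
    # Index of the largest gating probability (first one wins on ties).
    k = max(range(len(gate_probs)), key=gate_probs.__getitem__)
    if k == 0:
        return ("AI", None)
    if k <= num_experts:
        return ("L2D", k)
    return ("L2C", k - num_experts)


def empirical_coverage(decisions):
    """Fraction of samples handled by the AI model alone."""
    return sum(1 for mode, _ in decisions if mode == "AI") / len(decisions)


if __name__ == "__main__":
    # Two experts; the gate puts 0.8 on "complement with expert 1",
    # mirroring the example in the Figure 2 caption.
    decisions = [
        route([0.05, 0.05, 0.05, 0.80, 0.05], num_experts=2),
        route([0.70, 0.10, 0.10, 0.05, 0.05], num_experts=2),
    ]
    print(decisions, empirical_coverage(decisions))
```

Under this layout, coverage is simply the share of inputs routed to the AI-only option, which is the quantity plotted on the coverage axis of accuracy-coverage curves.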