Learning to Complement with Multiple Humans

Zheng Zhang; Cuong Nguyen; Kevin Wells; Thanh-Toan Do; Gustavo Carneiro

TL;DR

This work tackles real-world image classification under noisy labels by enabling human-AI collaboration without access to clean labels. It proposes LECOMH, a two-stage framework that first bootstraps training with learning-with-noisy-labels (LNL) pretraining and multi-rater learning (MRL) consensus labels (via CROWDLAB), then jointly trains a Human-AI Selection Module and a Collaboration Module to maximize collaborative accuracy while minimizing annotation cost. The authors introduce new benchmarks with multiple noisy labels per image (CIFAR-10N/10H, Chaoyang, NIH) and show that LECOMH consistently outperforms state-of-the-art human-AI collaborative classification (HAI-CC), multi-rater, and LNL baselines across datasets, including the challenging NIH and Chaoyang cases. Ablation studies highlight the necessity of LNL pretraining, MRL-based consensus, multiple human collaborators, and the learned collaboration mechanism, and the work discusses scalability, training time, and the societal implications of cost-aware human-AI systems.
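
The two-stage inference described above (select how many humans to consult, then fuse their labels with the AI prediction) can be sketched in Python. This is a hypothetical illustration, not the authors' code: `two_stage_predict` is an invented name, the selection scores are passed in rather than predicted by a trained module, consulting the first k experts in a fixed order is an assumption, and the learned Collaboration Module is replaced by a simple vote-adding rule.

```python
from typing import List, Sequence, Tuple


def two_stage_predict(
    ai_probs: Sequence[float],
    human_labels: Sequence[int],
    selection_scores: Sequence[float],
) -> Tuple[int, int]:
    """Hypothetical two-stage inference for one image.

    ai_probs: AI classifier probabilities over the classes.
    human_labels: labels from the M available experts, in a fixed order (assumption).
    selection_scores: one score per k in 0..M (number of humans to consult).
    Returns (predicted class, number of human annotations used).
    """
    if len(selection_scores) != len(human_labels) + 1:
        raise ValueError("expected one selection score for each k in 0..M")
    # Stage 1 (stand-in for the Human-AI Selection Module): pick the k with the highest score.
    k = max(range(len(selection_scores)), key=lambda i: selection_scores[i])
    # Stage 2 (stand-in for the Collaboration Module): add one vote per consulted
    # human to the AI probabilities. LECOMH learns this fusion instead.
    scores: List[float] = list(ai_probs)
    for label in human_labels[:k]:
        scores[label] += 1.0
    pred = max(range(len(scores)), key=lambda c: scores[c])
    return pred, k


# The AI leans to class 0, but the selector consults two humans who both say class 1.
print(two_stage_predict([0.6, 0.3, 0.1], [1, 1, 2], [0.1, 0.2, 0.7, 0.0]))  # (1, 2)
# The selector trusts the AI alone (k = 0), so no human annotation is spent.
print(two_stage_predict([0.6, 0.3, 0.1], [1, 1, 2], [0.9, 0.05, 0.03, 0.02]))  # (0, 0)
```

The returned k is the per-image collaboration cost; a learned Collaboration Module can weigh the AI and each human differently, whereas the fixed +1 vote here only shows the data flow.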

Abstract

Real-world image classification tasks tend to be complex: expert labellers are sometimes unsure about the classes present in an image, which leads to the problem of learning with noisy labels (LNL). Because the LNL task is ill-posed, existing methods must adopt strong assumptions or use multiple noisy labels per training image; the resulting models are accurate in isolation but fail to optimise human-AI collaborative classification (HAI-CC). Unlike such LNL methods, HAI-CC aims to leverage the synergies between human expertise and AI capabilities, but it requires clean training labels, which limits its real-world applicability. This paper addresses this gap by introducing the Learning to Complement with Multiple Humans (LECOMH) approach. LECOMH learns from noisy labels without depending on clean labels, simultaneously maximising collaborative accuracy and minimising the cost of human collaboration, measured by the number of human expert annotations required per image. Additionally, new benchmarks featuring multiple noisy labels for both training and testing are proposed to evaluate HAI-CC methods. In quantitative comparisons on these benchmarks, LECOMH consistently outperforms competitive HAI-CC approaches, human labellers, multi-rater learning, and noisy-label learning methods across various datasets, offering a promising solution for real-world image classification challenges.
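
The abstract's goal of maximising collaborative accuracy while minimising the number of human annotations can be illustrated with an expected, cost-weighted loss. The form below is one plausible reading, not the paper's equation: for one image it sums, over each number of consulted humans k, the selection probability of k times the cross-entropy of the fused prediction plus $\lambda$ per annotation. The function name and arguments are hypothetical.

```python
import math
from typing import Sequence


def cost_aware_loss(
    selection_probs: Sequence[float],
    collab_probs_per_k: Sequence[Sequence[float]],
    target: int,
    lam: float,
    eps: float = 1e-12,
) -> float:
    """Hypothetical expected loss over the number of consulted humans k = 0..M.

    selection_probs[k]: probability that k human annotations are requested.
    collab_probs_per_k[k]: class probabilities of the fused prediction using k humans.
    target: training label (in LECOMH, a consensus label rather than a clean label).
    lam: weight of the collaboration cost, charged once per annotation.
    """
    total = 0.0
    for k, (p_k, q_k) in enumerate(zip(selection_probs, collab_probs_per_k)):
        ce = -math.log(max(q_k[target], eps))  # classification error when k humans are used
        total += p_k * (ce + lam * k)          # plus the cost of k annotations
    return total


# Consulting one human makes the fused prediction certain, but costs lam = 0.1.
print(cost_aware_loss([0.5, 0.5], [[0.5, 0.5], [1.0, 0.0]], target=0, lam=0.1))
```

Raising `lam` makes every extra annotation more expensive, so minimising this loss pushes the selection probabilities toward smaller k; this is the kind of accuracy/cost trade-off that sweeping $\lambda$ explores.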

Paper Structure (33 sections, 5 equations, 6 figures, 6 tables)

Introduction
Related Work
Learning with Noisy Labels (LNL)
Multi-rater Learning (MRL)
Human-AI Collaborative Classification (HAI-CC)
Learning to Defer (L2D)
Learning to Complement (L2C)
Learning to Complement with Multiple Humans (LECOMH)
Training
LNL Pre-training and Consensus Label Generation
LECOMH training
Testing
Human-AI Collaborative Benchmarks
New CIFAR-10 Benchmarks
New Chaoyang Benchmark
...and 18 more sections

Figures (6)

Figure 1: LECOMH is the first human-AI collaborative classification (HAI-CC) method that learns exclusively from multiple noisy labels and collaborates with multiple experts. Its primary objective is to optimise HAI-CC accuracy while concurrently minimising collaboration costs, measured by the number of human expert annotations required for image classification. To enable learning from multiple noisy labels, we first train an AI model using learning with noisy labels (LNL) techniques, followed by multi-rater learning (MRL) to produce a consensus label, which is then used as the ground-truth label for training the two stages of HAI-CC. The first stage is the Human-AI Selection Module, which estimates the number of human predictions needed for efficient and accurate human-AI collaborative classification, and the second stage is the Collaboration Module, which produces the final prediction.
Figure 2: The proposed LECOMH consists of two main steps: 1. (top) estimate the consensus labels by exploiting a pre-trained LNL model coupled with an MRL module goh2022CROWDLAB; and 2. (bottom) train an LNL classifier (CLF) and a human-AI selection module by minimising both the classification error and the collaboration cost. In particular, the training step involves: 1) building the set of AI predictions and user labels, 2) training the Human-AI Selection Module to estimate the number of users to collaborate with the AI classifier, and 3) training the Collaboration Module to produce a final classification using AI predictions and selected users' labels. Testing involves similar steps to generate the final prediction.
Figure 3: Test accuracy vs. coverage of LECOMH (Ours) and the competing SEHAI-CC whoshould_mozannar23 and MEHAI-CC ijcai2022-344multil2d methods. The SEHAI-CC methods are always pre-trained with LNL techniques, and their single user is simulated by aggregating the labels of the pool of three annotators with majority voting. Because Multi_L2D defers to one of several experts, for each sample we select the label of the user with the maximum probability among the 3 users to draw its curve.
Figure 4: Test accuracy vs. coverage as a function of $\lambda$, which weights the collaboration cost in the loss function of our optimisation.
Figure 5: Test accuracy vs. number of experts at different coverage levels.
...and 1 more figure
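
Figure 3's caption relies on two label-selection conventions: a single user simulated by majority voting over three annotators, and, for Multi_L2D, the label of the user with the maximum probability. A minimal Python sketch of both follows, assuming integer class labels and per-user probabilities given as a plain list; the function names are hypothetical, and the tie-breaking rule (first tied label wins) is an assumption, since the caption does not state one.

```python
from collections import Counter
from typing import Sequence


def majority_vote(labels: Sequence[int]) -> int:
    """Aggregate several annotators' labels into one simulated user label.

    Ties go to the tied label encountered first, because Counter.most_common
    orders equal counts by first occurrence (an assumed tie-breaking rule).
    """
    if not labels:
        raise ValueError("need at least one label")
    return Counter(labels).most_common(1)[0][0]


def label_of_most_probable_user(user_labels: Sequence[int], user_probs: Sequence[float]) -> int:
    """Return the label of the user to whom a Multi_L2D-style model assigns the highest probability."""
    if len(user_labels) != len(user_probs):
        raise ValueError("one probability per user is required")
    best = max(range(len(user_probs)), key=lambda i: user_probs[i])
    return user_labels[best]


print(majority_vote([2, 5, 5]))  # 5
print(majority_vote([3, 5, 7]))  # 3 (three-way tie, first label wins)
print(label_of_most_probable_user([4, 1, 9], [0.2, 0.7, 0.1]))  # 1
```

With three annotators and ten CIFAR-10 classes a three-way disagreement is possible, so whichever tie-breaking rule is used affects the simulated single user on those images.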
