Human-AI Collaborative Multi-modal Multi-rater Learning for Endometriosis Diagnosis

Hu Wang; David Butler; Yuan Zhang; Jodie Avery; Steven Knox; Congbo Ma; Louise Hull; Gustavo Carneiro

TL;DR

HAICOMM combines multi-rater learning, multi-modal learning on T1/T2 MRI images, and human-AI collaboration to classify POD obliteration, a key diagnostic sign of endometriosis, and outperforms an ensemble of clinicians, noisy-label learning models, and multi-rater learning methods on the authors' dataset.

Abstract

Endometriosis, affecting about 10% of individuals assigned female at birth, is challenging to diagnose and manage. Diagnosis typically involves identifying various signs of the disease using either laparoscopic surgery or the analysis of T1/T2 MRI images, with the latter being quicker and cheaper but less accurate. A key diagnostic sign of endometriosis is the obliteration of the Pouch of Douglas (POD). However, even experienced clinicians struggle to classify POD obliteration accurately from MRI images, which complicates the training of reliable AI models. In this paper, we introduce the Human-AI Collaborative Multi-modal Multi-rater Learning (HAICOMM) methodology to address this challenge. HAICOMM is the first method that explores three important aspects of this problem: 1) multi-rater learning to extract a cleaner label from the multiple "noisy" labels available per training sample; 2) multi-modal learning to leverage the presence of T1/T2 MRI images for training and testing; and 3) human-AI collaboration to build a system that leverages the predictions from clinicians and the AI model to provide more accurate classification than standalone clinicians and AI models. On the multi-rater T1/T2 MRI endometriosis dataset that we collected to validate our methodology, the proposed HAICOMM model outperforms an ensemble of clinicians, noisy-label learning models, and multi-rater learning methods.
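To make the idea of combining several noisy rater labels concrete, the sketch below uses plain majority voting. This is a stand-in baseline, not the multi-rater learning method of HAICOMM, which the abstract does not specify. The label encoding (1 = POD obliteration, 0 = normal POD), the tie-breaking rule, and the sample data are all assumptions made for illustration.

```python
from collections import Counter


def majority_vote(rater_labels):
    """Return the most common label among the raters.

    Ties are broken toward the larger label, i.e. the positive class (1).
    This tie-breaking rule is an assumption of this sketch.
    """
    counts = Counter(rater_labels)
    top = max(counts.values())
    winners = [label for label, count in counts.items() if count == top]
    return max(winners)


# Hypothetical data: one inner list per scan, one entry per rater.
samples = [[1, 1, 0], [0, 0, 0], [0, 1, 1], [1, 0, 0]]
consensus = [majority_vote(labels) for labels in samples]
print(consensus)
```

A vote like this treats every rater as equally reliable; approaches that weigh raters by estimated reliability relax that assumption.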

Paper Structure (19 sections, 7 equations, 6 figures, 16 tables)


Introduction
Literature Review
Human-AI Collaboration
Multi-modal Learning
Multi-rater Learning
Imaging-based Endometriosis Detection
Methodology
Multi-modal Encoder Pre-training
Multi-rater Learning
Multi-modal Human-AI Collaborative Classification
Experimental Settings
Endometriosis Dataset
Implementation Details
Quantitative Evaluation Settings
Results and Discussion
...and 4 more sections

Figures (6)

Figure 1: Paired sagittal MR images, with T1-weighted (b, d) and T2-weighted (a, c) scans. The (a, b) pair shows a normal POD, while the (c, d) pair reveals POD obliteration, highlighted by a red arrow indicating significant adhesion and tissue distortion, demonstrating the loss of the soft tissue plane separating the uterine fundus from the bowel.
Figure 2: The framework of HAICOMM. The MRI encoders of HAICOMM are (a) first pre-trained with a Masked Autoencoder (MAE) model; then (b) the pseudo clean labels are estimated from the multi-rater learning process; next, (c) the T1 and T2 data, along with the human-produced multi-rater labels, are fed into their respective feature extraction encoders -- the features from the three sources are fused for the final prediction. In the figure, "FC" means fully-connected, "Fts" represents features, and both "Concat" and "$\oplus$" denote concatenation.
Figure 3: ROC curves of the HAICOMM model and its counterparts.
Figure 4: Qualitative analysis of HAICOMM. Each row (a),(b) and (c),(d) presents the input T1 and T2 MRI images, with corresponding tables below showing predictions from three human raters (Rater #1, #2, and #3). The tables also display predictions from SSR and ProMix models trained using labels from Rater #1 (SSR w/ GT1, ProMix w/ GT1) and CROWDLAB (SSR w/ CL GT, ProMix w/ CL GT). Following these, we provide the predictions from SSR and ProMix models trained on CROWDLAB labels and utilizing human-AI collaborative classification (SSR w/ HAIC, ProMix w/ HAIC). Finally, we present results from our HAICOMM model, along with the ground truth label based on surgical data.
Figure 5: Distribution of manufacturer model of scanners for the normal (a) and abnormal (b) datasets.
...and 1 more figures
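The fusion step described in the Figure 2 caption (features from the T1 encoder, the T2 encoder, and the multi-rater labels are concatenated and passed to a fully-connected layer) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature sizes, the random placeholder features and weights, the use of raw rater labels as features, and the function name `fuse_and_classify` are all hypothetical.

```python
import math
import random


def fuse_and_classify(t1_feats, t2_feats, rater_feats, weights, bias):
    """Concatenate three feature vectors, then apply one FC layer and a sigmoid."""
    fused = t1_feats + t2_feats + rater_feats  # list concatenation ("Concat")
    if len(fused) != len(weights):
        raise ValueError("weight vector must match the fused feature length")
    logit = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(0)
t1 = [rng.gauss(0, 1) for _ in range(4)]  # placeholder for T1 encoder output
t2 = [rng.gauss(0, 1) for _ in range(4)]  # placeholder for T2 encoder output
raters = [1.0, 1.0, 0.0]                  # three rater labels used as features (assumption)
weights = [rng.gauss(0, 0.1) for _ in range(len(t1) + len(t2) + len(raters))]

prob = fuse_and_classify(t1, t2, raters, weights, bias=0.0)
print(f"P(POD obliteration) = {prob:.3f}")
```

In a trained system the weights would be learned and the image features would come from the pre-trained encoders; here they are random so that the example stays self-contained.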
