MyriadAL: Active Few Shot Learning for Histopathology

Nico Schiavone, Jingyi Wang, Shuangzhi Li, Roger Zemp, Xingyu Li

TL;DR

This work tackles label-efficient learning in digital histopathology under very limited annotation budgets by introducing Myriad Active Learning (MAL). MAL combines a self-supervised MoCoV2 encoder, pseudo-label generation, and a novel Margin-Entropy sampling strategy to drive an active learning loop that exploits abundant unlabelled data. Pseudo-labels refine the uncertainty estimates and encourage diverse, non-redundant query selections, and the model improves iteratively as labels are acquired in each cycle. On NCT-CRC-HE-100K and BreakHis, MAL achieves higher accuracy and macro F1 than existing FSL and AL baselines and reaches test accuracy comparable to a fully supervised model while labelling only 5% of the dataset, which matters in histopathology, where expert annotation is expensive.

Abstract

Active Learning (AL) and Few Shot Learning (FSL) are two label-efficient methods that have achieved excellent results recently. However, most prior work in both learning paradigms fails to exploit the wealth of available unlabelled data. In this study, we address this issue in the scenario where the annotation budget is very limited, yet a large amount of unlabelled data for the target task is available. We frame this work in the context of histopathology, where labelling is prohibitively expensive. To this end, we introduce an active few shot learning framework, Myriad Active Learning (MAL), which includes a contrastive-learning encoder, pseudo-label generation, and novel query sample selection in the loop. Specifically, we propose to process unlabelled data in a self-supervised manner, where the obtained data representations and clustering knowledge form the basis for starting the AL loop. With feedback from the oracle in each AL cycle, the pseudo-labels of the unlabelled data are refined by optimizing a shallow task-specific net on top of the encoder. These updated pseudo-labels inform and improve the active learning query selection process. Furthermore, we introduce a novel recipe that combines existing uncertainty measures and uses the entire uncertainty list to reduce sample redundancy in AL. Extensive experiments on two public histopathology datasets show that MAL has superior test accuracy, macro F1-score, and label efficiency compared to prior works, and can achieve test accuracy comparable to a fully supervised algorithm while labelling only 5% of the dataset.
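
To make the idea of combining existing uncertainty measures concrete, the following is a minimal sketch, not the paper's formula (which is not given in this summary). It assumes softmax outputs from the task-specific net, and it blends normalised entropy with a margin-based score using a weight `alpha`; the blend, the function names, and `alpha` are all hypothetical choices made for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of one probability vector; higher = more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def margin_uncertainty(probs):
    """1 - (top1 - top2): close to 1 when the two best classes are nearly tied."""
    top1, top2 = sorted(probs, reverse=True)[:2]
    return 1.0 - (top1 - top2)

def combined_score(probs, alpha=0.5):
    """Hypothetical blend (an assumption, not the authors' recipe):
    alpha * normalised entropy + (1 - alpha) * margin uncertainty."""
    norm_entropy = entropy(probs) / math.log(len(probs))  # scaled to [0, 1]
    return alpha * norm_entropy + (1.0 - alpha) * margin_uncertainty(probs)

def rank_by_uncertainty(prob_rows, alpha=0.5):
    """Return the indices of all unlabelled samples, most uncertain first."""
    scores = [combined_score(p, alpha) for p in prob_rows]
    return sorted(range(len(prob_rows)), key=lambda i: scores[i], reverse=True)

# Toy predictions for three unlabelled samples over three classes.
probs = [
    [0.98, 0.01, 0.01],  # confident prediction
    [0.40, 0.35, 0.25],  # ambiguous across all three classes
    [0.50, 0.49, 0.01],  # ambiguous between two classes
]
ranking = rank_by_uncertainty(probs)
print(ranking)
```

In this toy example the confident sample is ranked last, while the two ambiguous samples compete for the top position depending on how much weight `alpha` gives to entropy versus margin.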

Paper Structure (15 sections, 4 equations, 2 figures, 6 tables, 1 algorithm)

Figures (2)

  • Figure 1: Diagram of the proposed framework Myriad Active Learning. Pseudo-labels of the unlabelled set are updated and explored for query sample selection.
  • Figure 2: Abstracted t-SNE plot of example data (3-class). Left: 6 samples selected by classical active learning methods with entropy sampling, notably selecting many samples from the same area which are highly likely to be redundant. Right: 6 samples selected using MAL, finetuning several of the borders simultaneously, while providing anchor samples for two of the classes.
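
The redundancy that Figure 2 describes can be illustrated with a small sketch. The code below is not the selection rule used by MAL; it only contrasts plain top-k uncertainty sampling with a hypothetical round-robin heuristic that spreads queries across pseudo-label groups. The function names, scores, and pseudo-labels are invented for illustration.

```python
from collections import defaultdict

def top_k(scores, k):
    """Classical selection: the k samples with the highest uncertainty."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]

def spread_by_pseudo_label(scores, pseudo_labels, k):
    """Hypothetical diversity heuristic (an assumption, not the paper's rule):
    visit pseudo-label groups in turn and take the most uncertain
    remaining sample from each group until k samples are picked."""
    groups = defaultdict(list)
    for i in sorted(range(len(scores)), key=lambda i: scores[i], reverse=True):
        groups[pseudo_labels[i]].append(i)  # each group stays sorted by uncertainty
    queues = [groups[label] for label in sorted(groups)]
    picked = []
    while len(picked) < k and any(queues):
        for queue in queues:
            if queue and len(picked) < k:
                picked.append(queue.pop(0))
    return picked

# Toy data: the four most uncertain samples all share pseudo-label 0.
scores = [0.95, 0.94, 0.93, 0.92, 0.60, 0.55, 0.50, 0.45]
pseudo_labels = [0, 0, 0, 0, 1, 1, 2, 2]

classical = top_k(scores, 3)
spread = spread_by_pseudo_label(scores, pseudo_labels, 3)
print(classical, spread)
```

Here plain top-k spends the whole budget on one pseudo-label group, whereas the round-robin variant queries one sample from each group, which is the kind of spread the right panel of Figure 2 depicts.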