MyriadAL: Active Few Shot Learning for Histopathology
Nico Schiavone, Jingyi Wang, Shuangzhi Li, Roger Zemp, Xingyu Li
TL;DR
This work tackles label-efficient learning in digital histopathology under very limited annotation budgets by introducing Myriad Active Learning (MAL). MAL combines a self-supervised MoCoV2 encoder, pseudo-label generation, and a novel Margin-Entropy sampling strategy to drive an informative active learning loop that leverages abundant unlabelled data. Pseudo-labels refine uncertainty estimates and encourage diverse, non-redundant query selections, with the cycle iteratively improving the model as labels are acquired. On NCT-CRC-HE-100K and BreakHis, MAL delivers superior accuracy and macro F1 compared to existing FSL and AL baselines and can reach fully supervised performance with as little as 5% of the labels, indicating strong practical impact for histopathology tasks.
Abstract
Active Learning (AL) and Few Shot Learning (FSL) are two label-efficient learning paradigms that have recently achieved excellent results. However, most prior work in both paradigms fails to exploit the wealth of available unlabelled data. In this study, we address this issue in the scenario where the annotation budget is very limited, yet a large amount of unlabelled data for the target task is available. We frame this work in the context of histopathology, where labelling is prohibitively expensive. To this end, we introduce an active few shot learning framework, Myriad Active Learning (MAL), which includes a contrastive-learning encoder, pseudo-label generation, and novel query sample selection in the loop. Specifically, we propose to process the unlabelled data in a self-supervised manner, where the resulting data representations and clustering knowledge form the basis for starting the AL loop. With feedback from the oracle in each AL cycle, the pseudo-labels of the unlabelled data are refined by optimizing a shallow task-specific net on top of the encoder. These updated pseudo-labels in turn inform and improve the AL query selection process. Furthermore, we introduce a novel recipe that combines existing uncertainty measures and uses the entire uncertainty list to reduce sample redundancy in AL. Extensive experiments on two public histopathology datasets show that MAL has superior test accuracy, macro F1-score, and label efficiency compared to prior works, and can achieve test accuracy comparable to that of a fully supervised algorithm while labelling only 5% of the dataset.
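
To make the query-selection idea more concrete, the following is a minimal sketch of how margin and entropy uncertainty might be combined and how pseudo-labels could be used to spread queries across classes. It is not the paper's actual rule: the abstract does not give the formula, so the product `entropy * (1 - margin)`, the round-robin over pseudo-label classes, and the function names `margin_entropy_scores` and `select_queries` are all assumptions made for illustration.

```python
import numpy as np


def margin_entropy_scores(probs):
    """Hypothetical combined uncertainty score (an assumption, not the paper's formula).

    probs: array of shape (n_samples, n_classes) with softmax outputs.
    Returns a score per sample; higher means more uncertain.
    """
    probs = np.asarray(probs, dtype=float)
    n_classes = probs.shape[1]
    # Shannon entropy, scaled to [0, 1] by the maximum possible entropy.
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1) / np.log(n_classes)
    # Margin between the two most likely classes (small margin = uncertain).
    top2 = np.sort(probs, axis=1)[:, -2:]
    margin = top2[:, 1] - top2[:, 0]
    return entropy * (1.0 - margin)


def select_queries(probs, budget):
    """Pick up to `budget` sample indices to send to the oracle.

    Assumed diversity heuristic: group samples by pseudo-label (argmax),
    rank each group by uncertainty, then take one sample per group in turn.
    """
    probs = np.asarray(probs, dtype=float)
    scores = margin_entropy_scores(probs)
    pseudo_labels = probs.argmax(axis=1)

    # One queue per pseudo-label class, most uncertain sample first.
    queues = []
    for c in np.unique(pseudo_labels):
        idx = np.flatnonzero(pseudo_labels == c)
        queues.append(list(idx[np.argsort(-scores[idx])]))

    picked = []
    while len(picked) < budget and any(queues):
        for q in queues:
            if q and len(picked) < budget:
                picked.append(int(q.pop(0)))
    return picked
```

In a full loop, `probs` would come from the shallow task-specific net on top of the frozen encoder, and the returned indices would be labelled by the oracle before the net is re-trained and the pseudo-labels refreshed.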
