Multi-instance Learning as Downstream Task of Self-Supervised Learning-based Pre-trained Model
Koki Matsuishi, Tsuyoshi Okita
TL;DR
This work tackles the difficulty of training deep multi-instance learning (MIL) models on large bags by using an instance encoder pretrained with self-supervised learning (SSL) on patch images. Specifically, it pretrains on a grid of 256 patches per brain CT slice using a PCRL framework (contrastive plus reconstruction) and then trains an attention-based MIL model on slices, with the SSL encoder as the instance feature extractor. The approach improves on a baseline Deep MIL without SSL, with gains of 5% to 13% in accuracy and 40% to 55% in F1-score for hypodensity classification, and it can outperform supervised ImageNet pretraining in transfer settings. This demonstrates a practical route to robust MIL in medical imaging with large bag sizes and limited labels, and suggests that SSL-based patch representations can be a strong default pretraining choice for downstream MIL tasks.
Abstract
In deep multi-instance learning, the number of instances that can be handled depends on the dataset. For histopathology images, deep multi-instance learners usually assume that a bag contains hundreds to thousands of instances. However, when the number of instances in a bag increases to 256 in brain hematoma CT, learning becomes extremely difficult. To overcome this problem, we propose using a model pre-trained with self-supervised learning and treating the multi-instance learner as its downstream task. With this method, even when the original target task suffers from the spurious correlation problem, we show improvements of 5% to 13% in accuracy and 40% to 55% in the F1 measure for the hypodensity marker classification of brain hematoma CT.
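The setup described above, a bag of 256 patch instances per slice pooled by attention, can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the 16 x 16 grid layout, the 512 x 512 slice size, the helper names `split_into_patches` and `attention_mil_pool`, the feature dimensions, and the random linear projection that stands in for the SSL-pretrained encoder. The pooling uses the common attention form, softmax over w^T tanh(V^T h_k); the paper may use a different variant.

```python
import numpy as np

def split_into_patches(slice_2d, grid=16):
    """Split a 2D slice into grid*grid non-overlapping patches (row-major order)."""
    h, w = slice_2d.shape
    ph, pw = h // grid, w // grid
    # Crop any remainder so the slice divides evenly into the grid.
    cropped = slice_2d[:ph * grid, :pw * grid]
    patches = cropped.reshape(grid, ph, grid, pw).swapaxes(1, 2)
    return patches.reshape(grid * grid, ph, pw)

def attention_mil_pool(features, V, w):
    """Attention pooling over a bag.

    features: (K, D) instance features, V: (D, L), w: (L,).
    Returns the bag embedding (D,) and the attention weights (K,).
    """
    scores = np.tanh(features @ V) @ w      # one score per instance, shape (K,)
    scores = scores - scores.max()          # stabilize the softmax numerically
    weights = np.exp(scores)
    weights /= weights.sum()
    return weights @ features, weights

rng = np.random.default_rng(0)
ct_slice = rng.standard_normal((512, 512))  # hypothetical slice size
patches = split_into_patches(ct_slice)      # (256, 32, 32): one bag of 256 instances

# Stand-in for the SSL-pretrained instance encoder (assumption: a fixed random projection).
encoder = rng.standard_normal((32 * 32, 64)) / 32.0
features = patches.reshape(256, -1) @ encoder   # (256, 64)

# Attention parameters would be learned in practice; random values are used here.
V = rng.standard_normal((64, 16)) * 0.1
w = rng.standard_normal(16) * 0.1
bag_embedding, attn = attention_mil_pool(features, V, w)
```

In a real pipeline the bag embedding would feed a classifier head for the hypodensity label, and the encoder weights would come from the self-supervised pretraining stage rather than random initialization.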
