Multi-instance Learning as Downstream Task of Self-Supervised Learning-based Pre-trained Model

Koki Matsuishi, Tsuyoshi Okita

TL;DR

This work tackles the difficulty of training deep multi-instance learning (MIL) on large bags by using an instance encoder pretrained with self-supervised learning (SSL) on patch images. Specifically, each brain CT slice is divided into a grid of 256 patches, the encoder is pretrained on those patches with a PCRL framework (contrastive plus reconstruction), and an attention-based MIL model is then trained on slices with the SSL encoder as the instance feature extractor. The approach improves on a baseline Deep MIL without SSL, with notable gains in accuracy and F1-score for hypodensity classification, and it can outperform supervised ImageNet pretraining in transfer settings. This demonstrates a practical route to robust MIL in medical imaging with large bag sizes and limited labels, and suggests that SSL-based patch representations can be a strong default pretraining choice for downstream MIL tasks.

Abstract

In deep multi-instance learning, the number of applicable instances depends on the data set. In histopathology images, deep multi-instance learners usually assume that a bag contains hundreds to thousands of instances. However, when the number of instances in a bag increases to 256 in brain hematoma CT, learning becomes extremely difficult. To overcome this problem, we propose using a model pre-trained with self-supervised learning, with the multi-instance learner as its downstream task. With this method, even when the original target task suffers from the spurious correlation problem, we show improvements of 5% to 13% in accuracy and 40% to 55% in the F1 measure for the hypodensity marker classification of brain hematoma CT.
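
To make the bag construction concrete, the following is a minimal sketch of how one slice could be divided into 256 patch instances. The 16 x 16 grid layout, the toy 64 x 64 slice size, and the helper name `split_into_grid` are assumptions made for illustration; the text above only states that a bag holds 256 instances.

```python
def split_into_grid(slice_2d, grid=16):
    """Split a 2D slice (a list of rows) into grid * grid patches, in row-major order.

    Assumption: a square 16 x 16 grid, which yields 256 patches per slice.
    """
    height, width = len(slice_2d), len(slice_2d[0])
    if height % grid or width % grid:
        raise ValueError("slice dimensions must be divisible by the grid size")
    patch_h, patch_w = height // grid, width // grid
    patches = []
    for gy in range(grid):
        rows = slice_2d[gy * patch_h:(gy + 1) * patch_h]
        for gx in range(grid):
            # Cut the same column range out of every row of this grid band.
            patches.append([row[gx * patch_w:(gx + 1) * patch_w] for row in rows])
    return patches


# Toy slice (hypothetical size): pixel value encodes its own position.
toy_slice = [[r * 64 + c for c in range(64)] for r in range(64)]
bag = split_into_grid(toy_slice)  # one bag = all patches of one slice
```

In this sketch each slice becomes one bag, and each patch in `bag` would be passed through the pre-trained encoder to obtain an instance feature.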

Paper Structure

This paper contains 14 sections, 5 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: Overview of our method. In the first stage, self-supervised learning is performed using patch images created by dividing the slices into a grid as input; in the second stage, deep multi-instance learning is performed using CT slices as bags and patch images as instances. The feature extractor for instances uses the encoder weights that have been pre-trained by self-supervised learning.
  • Figure 2: Examples of data augmentation on patch images
  • Figure 3: Attention maps for hematoma detection
  • Figure 4: Attention map for hypodensity (dataset 2-2) classification
  • Figure 5: Attention map for hypodensity (dataset 2-3) classification
  • ...and 1 more figure
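
The second stage in Figure 1 and the attention maps in the later figures both rely on attention-based pooling over the instances of a bag. The following is a minimal sketch of one common form of that pooling, where each instance feature h_k gets a weight a_k = softmax_k(w^T tanh(V h_k)) and the bag embedding is the weighted sum of the features. This particular form, the function name `attention_pool`, and the toy parameters `V` and `w` are assumptions for illustration, not details confirmed by the text above.

```python
import math


def attention_pool(instances, V, w):
    """Attention pooling over a bag of instance features.

    instances: list of K feature vectors, each of length D
    V:         L x D projection matrix (list of rows), hypothetical parameters
    w:         scoring vector of length L, hypothetical parameters
    Returns (attention_weights, bag_embedding).
    """
    scores = []
    for h in instances:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wj * zj for wj, zj in zip(w, hidden)))
    # Numerically stable softmax over the K instance scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Bag embedding: attention-weighted sum of the instance features.
    dim = len(instances[0])
    embedding = [sum(a * h[d] for a, h in zip(weights, instances)) for d in range(dim)]
    return weights, embedding


# Toy bag of three 2-dimensional instance features.
instances = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
w = [1.0, 1.0]
weights, bag_embedding = attention_pool(instances, V, w)
```

Under these assumptions, reshaping the 256 per-patch weights back onto the patch grid would give a per-slice attention map of the kind the figures show.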