Self-supervised learning for skin cancer diagnosis with limited training data

Hamish Haggerty; Rohitash Chandra

TL;DR

Minimal further SSL pre-training on task-specific data can be as effective as large-scale SSL pre-training on ImageNet for medical image classification tasks with limited labelled data.

Abstract

Early cancer detection is crucial for prognosis, but many cancer types lack the large labelled datasets required for developing deep learning models. This paper investigates self-supervised learning (SSL) as an alternative to standard supervised pre-training on ImageNet for scenarios with limited training data, using a ResNet-50 model. We first demonstrate that SSL pre-training on ImageNet (via the Barlow Twins SSL algorithm) outperforms supervised (SL) pre-training on a skin lesion dataset with limited training samples. We then consider further SSL pre-training (of the two ImageNet pre-trained models) on task-specific datasets, where our implementation is motivated by supervised transfer learning. This approach significantly enhances initially SL pre-trained models, closing the performance gap with initially SSL pre-trained ones. Surprisingly, further pre-training on just the limited fine-tuning data achieves this performance equivalence. Linear probe experiments reveal that the improvement stems from enhanced feature extraction. Hence, we find that minimal further SSL pre-training on task-specific data can be as effective as large-scale SSL pre-training on ImageNet for medical image classification tasks with limited labelled data. We validate these results on an oral cancer histopathology dataset, suggesting broader applicability across medical imaging domains facing labelled data scarcity.
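For readers unfamiliar with the Barlow Twins algorithm named in the abstract, the following is a minimal sketch of its published objective: the cross-correlation matrix between the embeddings of two augmented views is pushed towards the identity, so that matching dimensions agree and different dimensions are decorrelated. It is written in plain Python for readability and is not the authors' implementation; the function names and the weight `lam=0.005` are illustrative assumptions, not values taken from this paper.

```python
import math


def standardise(column):
    """Return a zero-mean, unit-variance copy of one embedding dimension."""
    n = len(column)
    mean = sum(column) / n
    var = sum((v - mean) ** 2 for v in column) / n
    std = math.sqrt(var) or 1.0  # guard against a constant column
    return [(v - mean) / std for v in column]


def barlow_twins_loss(z_a, z_b, lam=0.005):
    """Barlow Twins objective for two batches of embeddings.

    z_a, z_b: lists of N embeddings (each a list of D floats), one per
    augmented view of the same N images.
    lam: weight of the redundancy-reduction term (illustrative value).
    """
    n, d = len(z_a), len(z_a[0])
    # Standardise every embedding dimension across the batch.
    cols_a = [standardise([row[k] for row in z_a]) for k in range(d)]
    cols_b = [standardise([row[k] for row in z_b]) for k in range(d)]

    on_diag, off_diag = 0.0, 0.0
    for i in range(d):
        for j in range(d):
            # Entry (i, j) of the cross-correlation matrix between views.
            c_ij = sum(a * b for a, b in zip(cols_a[i], cols_b[j])) / n
            if i == j:
                on_diag += (1.0 - c_ij) ** 2  # invariance term
            else:
                off_diag += c_ij ** 2  # redundancy-reduction term
    return on_diag + lam * off_diag


# Two identical views with decorrelated dimensions give zero loss.
views = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
print(barlow_twins_loss(views, views))
```

In practice the embeddings would come from the projector applied to the encoder output, and the loss would be computed with a tensor library so that gradients can flow back to both networks.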

Paper Structure (27 sections, 2 equations, 7 figures, 13 tables)

Introduction
Related work
Skin cancer detection
Related medical diagnosis problems
Self-supervised learning
Methodology
Barlow Twins
Framework
ImageNet pre-trained networks
Data for supervised learning
Fine-tuning
Linear Probe
Evaluation Metrics
Further pre-training procedure
Data for further self-supervised pre-training
...and 12 more sections

Figures (7)

Figure 1: Overview of a joint embedding architecture for SSL. The encoder $f_\theta$ is typically a deep neural network such as a CNN, and the projector $p_\theta$ is typically a feedforward network with several layers. $T$ and $T'$ are distributions of data augmentations; for example, a random amount of cropping, blur, etc. may be applied. Figure 2 shows an example using the ISIC2019 data.
Figure 2: Barlow Twins data augmentation using the ISIC2019 data where we train the network to ignore the row-wise image distortions in a non-redundant way.
Figure 3: The SSL framework shows two ResNet-50 backbone architectures pre-trained on ImageNet, either in a supervised or self-supervised fashion. We implement further self-supervised pre-training on task-specific datasets (skin condition datasets in our case) using Barlow Twins. We implement a linear probe (in green) and full network fine-tuning (in yellow) for each model weight initialisation. Note also that the figure can be cut horizontally, which splits it into weights of two kinds: pre-trained once and pre-trained twice. Cutting the figure vertically splits it into results from initially supervised pre-trained (on the left) and initially self-supervised pre-trained (on the right). Note that any suitable large dataset can be utilised for the initial pre-training phase.
Figure 4: The procedure for further self-supervised pre-training. We reinitialise the projector and train for one epoch against the frozen encoder. We then unfreeze the encoder and training proceeds as normal, where we train $P \circ f_\theta$ for several epochs on $X$ with SSL. We also consider unfreezing only the final part of the encoder, which we denote SSL$_p$ (partial encoder training). We can view this process as a function of two inputs: a given pre-trained encoder and an unlabelled dataset, i.e. $\mathrm{SSL}(X, f_\theta)$.
Figure 5: SSL Barlow Twins augmentations on oral cancer histopathology data where we train the network to ignore the row-wise image distortions in a non-redundant way.
...and 2 more figures
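The freeze-then-unfreeze procedure described in the Figure 4 caption can be sketched as a simple per-epoch schedule of which parameter groups receive gradient updates. This is an illustrative sketch only, not the authors' code: the group names (`projector`, `encoder_early`, `encoder_final`), the function name, and the convention that `num_epochs` counts the warm-up epoch are all assumptions made for the example.

```python
def further_pretraining_schedule(num_epochs, partial=False):
    """Return, for each epoch, the parameter groups that are trainable.

    num_epochs: total SSL epochs, including the one-epoch projector warm-up
        (an assumption of this sketch).
    partial: if True, only the final part of the encoder is unfrozen after
        the warm-up (the SSL_p variant in the caption).
    """
    if num_epochs < 1:
        raise ValueError("num_epochs must be at least 1")

    schedule = []
    for epoch in range(num_epochs):
        if epoch == 0:
            # Warm-up: the freshly reinitialised projector is trained
            # against a fully frozen encoder.
            trainable = ["projector"]
        elif partial:
            # SSL_p: unfreeze only the final part of the encoder.
            trainable = ["projector", "encoder_final"]
        else:
            # Full further pre-training: the whole encoder is unfrozen.
            trainable = ["projector", "encoder_early", "encoder_final"]
        schedule.append(trainable)
    return schedule


for epoch, groups in enumerate(further_pretraining_schedule(3, partial=True)):
    print(epoch, groups)
```

In a real training loop, each epoch's list would decide which parameters have gradients enabled before the SSL loss is optimised on the unlabelled dataset $X$.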
