ISImed: A Framework for Self-Supervised Learning using Intrinsic Spatial Information in Medical Images

Nabil Jabareen, Dongsheng Yuan, Sören Lukassen

TL;DR

Spatial information can be used to learn interpretable representations of medical images with Self-Supervised Learning (SSL). The proposed method efficiently learns representations that capture the underlying structure of the data and transfer to a downstream classification task.

Abstract

This paper demonstrates that spatial information can be used to learn interpretable representations in medical images using Self-Supervised Learning (SSL). Our proposed method, ISImed, is based on the observation that medical images vary much less from one image to the next than the images in classic computer vision benchmarks. By leveraging this resemblance of human body structures across multiple images, we establish a self-supervised objective that creates a latent representation capturing each crop's location in physical space. More specifically, our method samples image crops and builds a distance matrix that compares the distances between the learned representation vectors of all pairs of crops with the true physical distances between those crops. The intuition is that the learned latent space acts as a positional encoding for a given image crop. We hypothesize that learning these positional encodings requires the model to generate comprehensive image representations. To test this hypothesis and evaluate our method, we compare our learned representations with two state-of-the-art SSL benchmark methods on two publicly available medical imaging datasets. We show that our method can efficiently learn representations that capture the underlying structure of the data and can be transferred to a downstream classification task.
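The distance-matrix objective described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the squared-error form of the loss, the function names `pairwise_distances` and `distance_matching_loss`, and the toy crop positions and latent vectors are all assumptions made for illustration, since the paper's own equation is not reproduced here. Only the Python standard library is used to keep the example self-contained.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Return the symmetric matrix of Euclidean distances between all points."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = math.dist(points[i], points[j])
        dist[i][j] = dist[j][i] = d
    return dist


def distance_matching_loss(latents, positions):
    """Mean squared gap between latent and physical distances over all crop pairs.

    ASSUMPTION: the squared-error form is illustrative only; the exact
    loss used by ISImed may differ.
    """
    d_latent = pairwise_distances(latents)
    d_physical = pairwise_distances(positions)
    pairs = list(combinations(range(len(latents)), 2))
    return sum((d_latent[i][j] - d_physical[i][j]) ** 2 for i, j in pairs) / len(pairs)


# Hypothetical crop centers (e.g. in voxels) and hypothetical encoder outputs.
positions = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 12.0)]
perfect_latents = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 12.0)]
collapsed_latents = [(0.0, 0.0, 0.0)] * 3

# Latents that reproduce the physical geometry give zero loss;
# latents that collapse to a single point are penalized.
print(distance_matching_loss(perfect_latents, positions))
print(distance_matching_loss(collapsed_latents, positions))
```

Under this sketch, the loss is zero only when the latent distances match the physical distances for every pair of crops, which is the sense in which the latent space behaves like a positional encoding.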

Paper Structure

This paper contains 15 sections, 1 equation, 4 figures, and 1 table.

Figures (4)

  • Figure 1: Distribution of the absolute difference between the distance between latent representations $D_{latent}$ and the true physical distance $D_{physical}$ for randomly sampled image patches.
  • Figure 2: Learned latent representations reveal spatial location. In (a), a two-dimensional UMAP is shown, with color indicating the true spatial direction. Note that the UMAP shows a clear gradient for all three directions, which makes it possible to distinguish the patch location. In (b), the first three principal components of a PCA are shown alongside the true physical direction.
  • Figure 3: AUC on the downstream classification task. Shown is the performance on all folds of a 10-fold cross-validation. On the autoPET dataset, ISImed significantly outperforms all other models (paired t-test, $p<0.001$). On the BraTS dataset, the combination of BarlowTwins and ISImed significantly outperforms all other models (paired t-test, $p<0.001$).
  • Figure 4: Learned latent representations reveal spatial location in the BraTS dataset. In (a), a two-dimensional UMAP is shown, with color indicating the true spatial direction. Note that the UMAP shows a clear gradient for all three directions, which makes it possible to distinguish the patch location. In (b), the first three principal components of a PCA are shown alongside the true physical direction.
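The paired t-test mentioned in the Figure 3 caption compares two models fold by fold. A minimal sketch of the test statistic is given below. The function name `paired_t_statistic` is hypothetical, and the per-fold AUC values are made up purely for illustration; they are not results from the paper. Four folds are used instead of ten to keep the example short.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """t statistic of the per-fold differences (degrees of freedom = n - 1)."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))


# Made-up per-fold AUC values for illustration only; NOT results from the paper.
auc_model_a = [0.80, 0.82, 0.81, 0.83]
auc_model_b = [0.78, 0.79, 0.80, 0.80]

t = paired_t_statistic(auc_model_a, auc_model_b)
print(f"t = {t:.2f} with {len(auc_model_a) - 1} degrees of freedom")
```

Because both models are evaluated on the same folds, the test operates on the per-fold differences rather than on the two sets of scores independently. The p-value is then obtained from the t distribution with n - 1 degrees of freedom, which a statistics library would normally provide.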