Self-Supervised Alignment Learning for Medical Image Segmentation

Haofeng Li, Yiming Ouyang, Xiang Wan

TL;DR

Self-Supervised Alignment Learning (SAL) addresses label scarcity in 3D medical image segmentation by pre-training with slice-aware supervision. It combines a Local Alignment loss that enforces patch-level correspondence between adjacent slices and a Global Positional loss that leverages relative slice position for global consistency, with a windowed variant to scale computation. Across CT and MRI datasets, SAL achieves competitive or superior Dice scores under limited annotations, outperforming several SSL baselines and showing robustness to hyperparameters. By exploiting intra-volume spatial structure during pre-training, SAL yields data-efficient segmentation improvements with practical impact for clinical imaging tasks.

Abstract

Recently, self-supervised learning (SSL) methods have been used to pre-train segmentation models for 2D and 3D medical images. Most of these methods are based on reconstruction, contrastive learning, or consistency regularization. However, the spatial correspondence between 2D slices of a 3D medical image has not been fully exploited. In this paper, we propose a novel self-supervised alignment learning framework to pre-train neural networks for medical image segmentation. The proposed framework consists of a new local alignment loss and a global positional loss. We observe that, within the same 3D scan, two nearby 2D slices usually contain similar anatomical structures. Thus, the local alignment loss is proposed to pull the pixel-level features of matched structures close to each other. Experimental results show that the proposed alignment learning is competitive with existing self-supervised pre-training approaches on CT and MRI datasets when annotations are limited.

Paper Structure (13 sections, 3 equations, 2 figures, 4 tables)

Figures (2)

  • Figure 1: The proposed Self-supervised Alignment Learning (SAL) framework. 'Subject_1&2' denote two sequences of 2D slices. ①-④ represent different selections of slice pairs. The feature maps are divided into $r^2$ windows of size $\frac{h}{r}\times \frac{w}{r}$.
  • Figure 2: The proposed Local Alignment (LA) loss function. $S_i$ and $S_j$ are two nearby slices in the same 3D image. Their feature maps are divided into windows, marked by red dotted lines. '$\sim 1$' denotes pushing the maximum similarity in each row toward 1 with an MAE loss. $V$ is the number of slices and $t$ is the threshold that decides whether two slices are used to compute the LA loss.
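
The two captions above describe the windowing and the row-maximum objective only in words, so a minimal NumPy sketch may help make them concrete. It is an illustration under stated assumptions, not the authors' implementation: the function names (`split_into_windows`, `local_alignment_loss`, `should_align`) are hypothetical, cosine similarity is assumed as the similarity measure, and the slice-selection rule $|i-j|/V < t$ is one plausible reading of how $V$ and $t$ are used.

```python
import numpy as np


def split_into_windows(feat, r):
    """Split a (c, h, w) feature map into r*r windows.

    Returns an array of shape (r*r, (h//r)*(w//r), c): one row of
    pixel features per window, as in the Figure 1 caption.
    """
    c, h, w = feat.shape
    assert h % r == 0 and w % r == 0, "h and w must be divisible by r"
    wh, ww = h // r, w // r
    x = feat.reshape(c, r, wh, r, ww)   # split both spatial axes
    x = x.transpose(1, 3, 2, 4, 0)      # (r, r, wh, ww, c)
    return x.reshape(r * r, wh * ww, c)


def local_alignment_loss(feat_i, feat_j, r=2, eps=1e-8):
    """Windowed alignment loss between feature maps of two nearby slices.

    Assumption: similarity is cosine similarity between pixel features,
    computed only between windows at the same grid position.
    """
    wi = split_into_windows(feat_i, r)
    wj = split_into_windows(feat_j, r)
    wi = wi / (np.linalg.norm(wi, axis=-1, keepdims=True) + eps)
    wj = wj / (np.linalg.norm(wj, axis=-1, keepdims=True) + eps)
    sim = wi @ wj.transpose(0, 2, 1)    # (r*r, n, n) similarity matrices
    row_max = sim.max(axis=-1)          # best match for each pixel of S_i
    # MAE between each row maximum and the target value 1
    return float(np.mean(np.abs(1.0 - row_max)))


def should_align(idx_i, idx_j, num_slices, t):
    """Assumed selection rule: use the pair only if the slice distance,
    normalized by the number of slices V, is below the threshold t."""
    return abs(idx_i - idx_j) / num_slices < t


rng = np.random.default_rng(0)
s_i = rng.standard_normal((8, 16, 16))  # stand-in for encoder features
s_j = s_i + 0.1 * rng.standard_normal((8, 16, 16))
if should_align(10, 12, num_slices=100, t=0.05):
    print(local_alignment_loss(s_i, s_j, r=2))
```

Restricting the similarity matrix to corresponding windows reduces the cost from one $(hw)\times(hw)$ matrix to $r^2$ matrices of size $\frac{hw}{r^2}\times\frac{hw}{r^2}$, which is presumably why the windowed variant scales better.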