Semi-Supervised 3D Segmentation for Type-B Aortic Dissection with Slim UNETR

Denis Mikhailapov, Vladimir Berikov

TL;DR

This paper addresses the shortage of labeled data for 3D multi-output segmentation of type-B aortic dissection (TBAD) with a semi-supervised framework built on rotation and flip augmentations and pseudo-labels generated by an exponential moving average (EMA) model. The approach uses a Slim UNETR backbone with separate heads for the ALL, TL, and FL classes, trained with a mixed loss that combines Generalized Dice and Focal losses. Experiments on the ImageTBAD dataset show that semi-supervised training with only half of the labels can reach performance close to that of training on the fully labeled data, with the largest gains on the difficult FL class, albeit with some instability in training when unlabeled data is first introduced. The work highlights the potential of semi-supervised, multi-output segmentation to reduce labeling costs while maintaining robust, clinically useful TBAD segmentation results.
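As a rough illustration of a mixed loss over separate heads, the NumPy sketch below combines a Generalized Dice term with a Focal term for each head and averages the result across heads. It is not the authors' implementation. The equal weighting (`dice_weight=0.5`), the focusing parameter `gamma=2.0`, the treatment of each head as a binary foreground/background problem with per-voxel scores in [0, 1], and all function names are assumptions made for this example.

```python
import numpy as np


def generalized_dice_loss(prob, target, eps=1e-6):
    """Generalized Dice loss for one binary head.

    prob and target share a shape; prob lies in [0, 1], target in {0, 1}.
    Foreground and background are treated as two regions, each weighted by
    the inverse squared volume of its reference mask (an assumption here).
    """
    prob = np.asarray(prob, dtype=np.float64).reshape(-1)
    target = np.asarray(target, dtype=np.float64).reshape(-1)
    p = np.stack([prob, 1.0 - prob])        # predictions per region
    r = np.stack([target, 1.0 - target])    # reference masks per region
    w = 1.0 / (r.sum(axis=1) ** 2 + eps)    # inverse squared region volume
    intersect = (w * (p * r).sum(axis=1)).sum()
    denom = (w * (p + r).sum(axis=1)).sum()
    return float(1.0 - 2.0 * intersect / (denom + eps))


def focal_loss(prob, target, gamma=2.0, eps=1e-6):
    """Mean binary focal loss; gamma down-weights easy voxels."""
    prob = np.clip(np.asarray(prob, dtype=np.float64), eps, 1.0 - eps)
    target = np.asarray(target)
    # Score assigned to the correct class of each voxel.
    p_t = np.where(target > 0.5, prob, 1.0 - prob)
    return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t)))


def mixed_loss(head_probs, head_targets, dice_weight=0.5):
    """Average over heads of a weighted sum of the two losses.

    head_probs and head_targets are dicts keyed by head name,
    for example "ALL", "TL", "FL" (hypothetical keys for this sketch).
    """
    total = 0.0
    for name, prob in head_probs.items():
        target = head_targets[name]
        total += dice_weight * generalized_dice_loss(prob, target)
        total += (1.0 - dice_weight) * focal_loss(prob, target)
    return total / len(head_probs)
```

Because each head is scored against its own mask, a loss of this shape does not require the head outputs to sum to one across classes.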

Abstract

Convolutional neural networks (CNNs) are widely used for multi-class segmentation of medical images. Of particular interest are models with multiple outputs, which predict each segmentation class (region) separately without relying on a probabilistic formulation of the regions. Such models share a common encoder but branch at the output layers, so the components of the network can be tailored to each class (region), which improves accuracy. These methods are applied to the diagnosis of type B aortic dissection (TBAD), which requires accurate segmentation of aortic structures. This work uses the ImageTBAD dataset, which contains 100 3D computed tomography angiography (CTA) images annotated with three key classes: true lumen (TL), false lumen (FL), and false lumen thrombus (FLT) of the aorta. Accurate segmentation of these classes is critical for diagnosis and treatment decisions. In the dataset, 68 examples have a false lumen, while the remaining 32 do not, which makes pathology detection more difficult. Training these CNN methods, however, requires a large amount of high-quality labeled data, and obtaining accurate labels for the regions of interest can be expensive and time-consuming, particularly for 3D data. Semi-supervised learning methods train models on both labeled and unlabeled data, which makes them a promising way to reduce the need for accurate labels. However, these methods are not well understood for models with multiple outputs. This paper presents a semi-supervised learning method for such models. The method is based on additional rotation and flip transformations and does not assume that the model's responses are probabilistic. This makes it a universal approach, which is especially important for architectures that segment each class separately.
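The two ingredients named above, an EMA model and rotation/flip transformations, can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the particular set of transforms, the averaging of the re-aligned outputs, the per-head `threshold`, the `decay` value, and the `predict` callable (a stand-in for the EMA model's forward pass) are all hypothetical choices made for this example.

```python
import numpy as np


def ema_update(teacher, student, decay=0.99):
    """Move each teacher weight toward the matching student weight.

    teacher and student are dicts mapping parameter names to arrays.
    """
    for name, weight in student.items():
        teacher[name] = decay * teacher[name] + (1.0 - decay) * weight
    return teacher


# Each entry pairs a transform of a (D, H, W) volume with its inverse.
TRANSFORMS = [
    (lambda v: v, lambda v: v),
    (lambda v: np.flip(v, axis=2), lambda v: np.flip(v, axis=2)),
    (lambda v: np.flip(v, axis=1), lambda v: np.flip(v, axis=1)),
    (lambda v: np.rot90(v, k=1, axes=(1, 2)),
     lambda v: np.rot90(v, k=-1, axes=(1, 2))),
]


def pseudo_labels(predict, volume, threshold=0.5):
    """Build one binary pseudo-label mask per output head.

    predict maps a volume to a dict {head name: score map of the same shape}.
    Every transformed prediction is mapped back to the original orientation
    before averaging, so the heads are handled independently and no
    normalization across classes is required.
    """
    sums = {}
    for forward, inverse in TRANSFORMS:
        outputs = predict(forward(volume))
        for head, score in outputs.items():
            aligned = inverse(score)
            sums[head] = aligned if head not in sums else sums[head] + aligned
    count = len(TRANSFORMS)
    return {head: (total / count > threshold).astype(np.uint8)
            for head, total in sums.items()}
```

In a training loop of this kind, `ema_update` would be called after each optimizer step, and the masks returned by `pseudo_labels` would serve as targets for the unlabeled volumes.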

Paper Structure

This paper contains 16 sections, 7 equations, 5 figures, 2 tables, and 1 algorithm.

Figures (5)

  • Figure 1: An example of data preprocessing, before and after.
  • Figure 2: Multi-Output Slim UNETR Architecture.
  • Figure 3: The scheme of the semi-supervised method.
  • Figure 4: An example of the growth of the DICE (ALL) metric during training (same random seed): red – experiment 1, green – experiment 2, blue – experiment 3, light blue – experiment 4.
  • Figure 5: An example of the growth of the DICE (ALL) metric for different start epochs of training on unlabeled data: blue – start epoch 100, pink – start epoch 200, gray – start epoch 300.