Label-Efficient 3D Brain Segmentation via Complementary 2D Diffusion Models with Orthogonal Views

Jihoon Cho, Suhyun Ahn, Beomju Kim, Hyungjoon Bae, Xiaofeng Liu, Fangxu Xing, Kyungeun Lee, Georges Elfakhri, Van Wedeen, Jonghye Woo, Jinah Park

TL;DR

This work tackles label-efficient 3D brain segmentation with complementary 2D diffusion models, each operating on one of three orthogonal views (axial, coronal, and sagittal). The 2D semantic features extracted from these models are fused into voxelwise 3D representations, which are then used to train a simple MLP segmenter. The diffusion models are pretrained on a large unlabeled dataset, and the segmenter needs only sparse labels: as few as nine annotated slices plus a labeled background region. For subcortical structures, the approach outperforms state-of-the-art self-supervised methods when trained on a single labeled subject, and it yields promising results under sparse labeling. Because 3D context comes from fusing the three views, the approach avoids heavy 3D diffusion modeling, which makes it a candidate for clinical settings where annotations are scarce.

Abstract

Deep learning-based segmentation techniques have shown remarkable performance in brain segmentation, yet their success hinges on the availability of extensive labeled training data. Acquiring such large datasets, however, is a significant challenge in many clinical applications. To address this issue, we propose a novel 3D brain segmentation approach using complementary 2D diffusion models. The core idea is to first extract semantically informative 2D features from 2D diffusion models that take orthogonal views as input, and then fuse them into a 3D contextual feature representation. We then use these aggregated features to train multi-layer perceptrons that classify the segmentation labels. Our goal is to achieve reliable segmentation quality without requiring complete labels for each individual subject. Experiments on brain subcortical structure segmentation, with training data from only one subject, demonstrate that our approach outperforms state-of-the-art self-supervised learning methods. Further experiments on the minimum annotation requirement under sparse labeling yield promising results even with only nine slices and a labeled background region.
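
The fusion step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: `toy_slice_features` is a hypothetical stand-in for the intermediate features of a pretrained 2D diffusion model, and channel-wise concatenation is assumed as the fusion rule, since the abstract does not specify one.

```python
import numpy as np

def toy_slice_features(slice_2d):
    """Hypothetical stand-in for a 2D diffusion model's per-pixel features.

    Returns an (h, w, 3) array: intensity plus the two in-plane gradients.
    """
    grad_rows, grad_cols = np.gradient(slice_2d)
    return np.stack([slice_2d, grad_rows, grad_cols], axis=-1)

def view_features(volume, axis, extractor=toy_slice_features):
    """Apply a 2D extractor to every slice along `axis` and restack
    the results into a (D, H, W, C) feature volume."""
    slices = np.moveaxis(volume, axis, 0)              # (n_slices, h, w)
    feats = np.stack([extractor(s) for s in slices])   # (n_slices, h, w, C)
    return np.moveaxis(feats, 0, axis)                 # (D, H, W, C)

def fuse_orthogonal_views(volume):
    """Concatenate per-view features channel-wise (assumed fusion rule)."""
    per_view = [view_features(volume, axis) for axis in range(3)]
    return np.concatenate(per_view, axis=-1)           # (D, H, W, 3 * C)

rng = np.random.default_rng(0)
volume = rng.random((8, 10, 12))                       # toy 3D scan
fused = fuse_orthogonal_views(volume)

# One feature vector per voxel, ready to feed a voxelwise MLP classifier.
voxel_features = fused.reshape(-1, fused.shape[-1])
```

Each voxel ends up with a feature vector that combines what the three slice-wise extractors saw at that location, which is how 2D models can contribute 3D context without a 3D network.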

Paper Structure

This paper contains 10 sections, 2 equations, 3 figures, and 3 tables.

Figures (3)

  • Figure 1: Illustration of our proposed method on subcortical brain structure segmentation. We first train multiple 2D diffusion models using a large unlabeled dataset. MLPs are then trained with a few labels to predict segmentation results from 3D features derived from three 2D diffusion models operating on perpendicular views.
  • Figure 2: Schematic diagram of the proposed sparse labeling scheme. A complete label requires the annotation of every voxel. The sparse labeling scheme is based on 2D labeled slices annotated from different views, together with an easily identifiable background region. The last row shows which segmenters can be trained for each type of label.
  • Figure 3: Segmentation results on subcortical brain structure segmentation. All segmentation results except those in the last column were predicted by models trained with a single labeled volume. Mispredictions are indicated by red circles.
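
The sparse labeling scheme in Figure 2 can be sketched as a boolean mask that marks which voxels carry supervision. This is an illustrative sketch only: the function name `sparse_label_mask`, the choice of three slices per view (nine in total), and the box-shaped background region are all assumptions, not details taken from the paper.

```python
import numpy as np

def sparse_label_mask(shape, slices_per_axis, background_mask):
    """Mark voxels that lie on an annotated slice or in the background region.

    `slices_per_axis` maps an axis (0, 1, or 2) to the annotated slice indices.
    """
    mask = np.zeros(shape, dtype=bool)
    for axis, indices in slices_per_axis.items():
        for idx in indices:
            selector = [slice(None)] * 3
            selector[axis] = idx          # pick one full 2D slice
            mask[tuple(selector)] = True
    return mask | background_mask

shape = (16, 16, 16)

# Assumed layout: three annotated slices per view, nine slices in total.
annotated = {0: [4, 8, 12], 1: [4, 8, 12], 2: [4, 8, 12]}

# Assumed background region: everything outside a central box.
background = np.ones(shape, dtype=bool)
background[2:14, 2:14, 2:14] = False

mask = sparse_label_mask(shape, annotated, background)

# Only voxels where `mask` is True would contribute to the MLP training loss.
labeled_fraction = mask.mean()
```

Under this setup, a voxelwise classifier can be trained on the masked voxels alone, which is what allows supervision from a handful of slices rather than a fully annotated volume.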