Learning to Upscale 3D Segmentations in Neuroimaging

Xiaoling Hu, Peirong Liu, Dina Zemlyanker, Jonathan Williams Ramirez, Oula Puonti, Juan Eugenio Iglesias

TL;DR

This work tackles the challenge of upsampling coarse, low-resolution neuroimaging segmentations to ultra-high-resolution 3D detail by regressing geometry-aware signed distance maps. It introduces a scalable class-conditional segmentation (SCCS) framework that trains on one class at a time to dramatically reduce memory usage, enabling end-to-end upsampling of large 3D volumes. By coupling SDF regression with geometry-aware regularizers and cross-resolution consistency, the approach yields sharper boundaries and better topological fidelity than traditional discrete segmentation, even when trained on synthetic data and coarse LR guidance. Domain randomization further enhances generalization across resolutions and scanners, and SCCS allows seamless generalization to unseen classes without retraining. The method demonstrates superior scalability, accuracy, and robustness on synthetic and real UHR brain MRI data, offering a practical pathway toward high-fidelity 3D label synthesis in neuroimaging.

Abstract

Obtaining high-resolution (HR) segmentations from coarse annotations is a pervasive challenge in computer vision. Applications include inferring pixel-level segmentations from token-level labels in vision transformers, upsampling coarse masks to full resolution, and transferring annotations from legacy low-resolution (LR) datasets to modern HR imagery. These challenges are especially acute in 3D neuroimaging, where manual labeling is costly and resolutions continually increase. We propose a scalable framework that generalizes across resolutions and domains by regressing signed distance maps, enabling smooth, boundary-aware supervision. Crucially, our model predicts one class at a time, which substantially reduces memory usage during training and inference (critical for large 3D volumes) and naturally supports generalization to unseen classes. Generalization is further improved through training on synthetic, domain-randomized data. We validate our approach on ultra-high-resolution (UHR) human brain MRI (~100 μm), where most existing methods operate at 1 mm resolution. Our framework effectively upsamples such standard-resolution segmentations to UHR detail. Results on synthetic and real data demonstrate superior scalability and generalization compared to conventional segmentation methods. Code is available at: https://github.com/HuXiaoling/Learn2Upscale.
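To make the signed-distance target concrete, the following is a minimal sketch of how a discrete label volume could be turned into per-class signed distance maps, one class at a time. It is not taken from the authors' repository: the function names `signed_distance_map` and `per_class_sdms`, the sign convention (negative inside the structure, positive outside), and the use of SciPy's Euclidean distance transform are all assumptions made for illustration.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt


def signed_distance_map(mask, spacing=1.0):
    """Signed Euclidean distance map for a single binary class mask.

    Assumed convention: negative inside the structure, positive outside.
    `spacing` is the voxel size (a float or one value per axis).
    """
    mask = np.asarray(mask).astype(bool)
    if not mask.any() or mask.all():
        # No boundary exists in this volume, so the distance is undefined;
        # returning zeros is an arbitrary choice made for this sketch.
        return np.zeros(mask.shape, dtype=np.float32)
    # Distance from each background voxel to the nearest foreground voxel.
    outside = distance_transform_edt(~mask, sampling=spacing)
    # Distance from each foreground voxel to the nearest background voxel.
    inside = distance_transform_edt(mask, sampling=spacing)
    return (outside - inside).astype(np.float32)


def per_class_sdms(labels, class_ids, spacing=1.0):
    """Yield (class_id, signed distance map) pairs, one class at a time.

    Only one class-sized volume is held in memory per iteration, instead of
    a stack with one channel per class.
    """
    for c in class_ids:
        yield c, signed_distance_map(labels == c, spacing)


# Toy example: a 3x3x3 cube of class 1 inside a 5x5x5 volume.
labels = np.zeros((5, 5, 5), dtype=np.int32)
labels[1:4, 1:4, 1:4] = 1
for class_id, sdm in per_class_sdms(labels, class_ids=[1]):
    recovered = sdm < 0  # thresholding at zero recovers the binary mask
    print(class_id, sdm.min(), sdm.max(), np.array_equal(recovered, labels == 1))
```

Under this convention, thresholding a regressed map at zero gives back a discrete segmentation, while the continuous values near the boundary carry sub-voxel information that a hard label does not.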

Paper Structure

This paper contains 26 sections, 20 equations, 4 figures, 6 tables.

Figures (4)

  • Figure 1: Overview of the proposed LR-guided and distance-based representation framework. In addition to the standard segmentation loss $\mathcal{L}_{\text{seg}}$, we introduce a cross-resolution consistency term $\mathcal{L}_{\text{cons}}$ (see \ref{sec:naive} and \ref{eq:naive}), illustrated in the shaded region. We further regress signed distance maps ($\hat{\phi}_H$) to enable a geometry-aware representation (\ref{sec:distance} and \ref{loss:sdf_loss}), as depicted in the full workflow.
  • Figure 2: Illustration of the SCCS framework. At each training step, the model focuses on a single class, substantially reducing memory footprint and allowing flexible extension to new anatomical structures. For the selected class $c$, the segmentation loss $\mathcal{L}_{\text{seg}}^c$ and the cross-resolution consistency loss $\mathcal{L}_{\text{cons}}^c$ are combined into a total objective $\mathcal{L}_{\text{total}}^c$ (\ref{eq:condition_loss}), which supervises the training of the entire network.
  • Figure 3: Qualitative results. (a-c) show the input, LR guidance, and ground truth (GT). (d-g) show segmentations with different methods.
  • Figure 4: Qualitative results. (a-c) show the input, LR guidance, and ground truth (GT). (d-g) show segmentations with different methods.
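
The per-class objective described in the Figure 2 caption can be illustrated with a small NumPy sketch. The exact form of $\mathcal{L}_{\text{total}}^c$ is not reproduced in this excerpt, so everything below is an assumption made for illustration: a soft Dice loss stands in for $\mathcal{L}_{\text{seg}}^c$, a mean squared error between the average-pooled HR prediction and the LR guidance stands in for $\mathcal{L}_{\text{cons}}^c$, and the two are summed with a hypothetical `weight`. The function names are likewise hypothetical.

```python
import numpy as np


def average_pool3d(vol, factor):
    """Downsample a 3D volume by an integer factor using block averaging."""
    d, h, w = vol.shape
    assert d % factor == 0 and h % factor == 0 and w % factor == 0
    blocks = vol.reshape(d // factor, factor, h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3, 5))


def soft_dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss between a probability map and a binary target."""
    inter = float((pred * target).sum())
    denom = float(pred.sum() + target.sum())
    return 1.0 - (2.0 * inter + eps) / (denom + eps)


def class_conditional_loss(pred_hr, labels_hr, labels_lr, c, factor, weight=1.0):
    """Assumed stand-in for the total objective of one selected class c.

    pred_hr   : HR probability map predicted for class c only
    labels_hr : HR label volume (integer class ids)
    labels_lr : LR guidance label volume, `factor` times coarser per axis
    """
    target_hr = (labels_hr == c).astype(np.float64)
    target_lr = (labels_lr == c).astype(np.float64)
    seg = soft_dice_loss(pred_hr, target_hr)
    # Consistency: the HR prediction, once pooled, should agree with the LR guidance.
    cons = float(np.mean((average_pool3d(pred_hr, factor) - target_lr) ** 2))
    return seg + weight * cons


# Toy example: build HR labels by nearest-neighbour upsampling of LR labels.
rng = np.random.default_rng(0)
labels_lr = rng.integers(0, 3, size=(2, 2, 2))
labels_hr = labels_lr.repeat(2, axis=0).repeat(2, axis=1).repeat(2, axis=2)

# One training step looks at a single randomly selected class.
c = int(rng.choice(np.unique(labels_hr)))
perfect_pred = (labels_hr == c).astype(np.float64)
empty_pred = np.zeros_like(perfect_pred)
loss_perfect = class_conditional_loss(perfect_pred, labels_hr, labels_lr, c, factor=2)
loss_empty = class_conditional_loss(empty_pred, labels_hr, labels_lr, c, factor=2)
print(c, loss_perfect, loss_empty)
```

Because only the map for class `c` is materialized at each step, the memory cost of the loss does not grow with the number of classes, which is the property the caption attributes to SCCS.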