Comparative Analysis of 3D Convolutional and 2.5D Slice-Conditioned U-Net Architectures for MRI Super-Resolution via Elucidated Diffusion Models

Hendrik Chiche, Ludovic Corcos, Logan Rouge

Abstract

Magnetic resonance imaging (MRI) super-resolution (SR) methods that computationally enhance low-resolution acquisitions to approximate high-resolution quality offer a compelling alternative to expensive high-field scanners. In this work we investigate an elucidated diffusion model (EDM) framework for brain MRI SR and compare two U-Net backbone architectures: (i) a full 3D convolutional U-Net that processes volumetric patches with 3D convolutions and multi-head self-attention, and (ii) a 2.5D slice-conditioned U-Net that super-resolves each slice independently while conditioning on an adjacent slice for inter-slice context. Both models employ continuous-sigma noise conditioning following Karras et al. and are trained on the NKI cohort of the FOMO60K dataset. On a held-out test set of 5 subjects (6 volumes, 993 slices), the 3D model achieves 37.75 dB PSNR, 0.997 SSIM, and 0.020 LPIPS, improving on the off-the-shelf pretrained EDSR baseline (35.57 dB / 0.024 LPIPS) and the 2.5D variant (35.82 dB) across all three metrics under the same test data and degradation pipeline.
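The headline comparison above is stated in PSNR, so a short sketch of that metric may help in reading the numbers. It assumes intensities normalized to [0, 1] (hence `data_range=1.0`); the function name and the normalization are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB between two arrays of equal shape.

    data_range is the assumed peak intensity (1.0 for images scaled to [0, 1]).
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0.0:
        return float("inf")  # identical inputs: PSNR is unbounded
    return 10.0 * np.log10(data_range ** 2 / mse)

# A uniform error of 0.1 on a [0, 1] image gives an MSE of 0.01.
hr = np.zeros((4, 4))
sr = np.full((4, 4), 0.1)
value = psnr(hr, sr)
```

Because PSNR is logarithmic in the mean squared error, the 2.18 dB gap between 37.75 dB and 35.57 dB would correspond to roughly 40% lower MSE if both figures were computed on a single image with the same peak value. How the paper aggregates PSNR over slices or volumes is not stated here, so this is only a rough reading.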

Paper Structure (27 sections, 4 equations, 6 figures, 2 tables, 1 algorithm)

Figures (6)

  • Figure 1: Overview of the two super-resolution pipelines. (a) The 3D model (best-performing) extracts volumetric patches, processes them through a 3D U-Net with 20-step Euler sampling, and blends overlapping patches to reconstruct the full HR volume. (b) The 2.5D model extracts individual slices with one neighboring slice as context, processes each through a 2D U-Net with one-step Heun sampling, and stacks results.
  • Figure 2: PSNR and SSIM comparison across methods for $2\times$ MRI SR on the NKI test set. The 3D EDM model achieves the strongest PSNR and SSIM among the evaluated methods.
  • Figure 3: Visual comparison of all methods on a mid-sagittal slice (sub_2642). Top row: full slice. Bottom row: zoomed region of interest (red box) showing cortical folds and gray/white matter boundaries. The 3D EDM model recovers the sharpest anatomical detail, while EDSR and Swin2SR produce smoother outputs. Bicubic interpolation exhibits visible blurring.
  • Figure 4: Per-slice PSNR across the sagittal axis for subject sub_2642 (2.5D model).
  • Figure 5: 3D EDM model visual comparison across sagittal, axial, and coronal views. Each panel shows (left to right): ground truth HR, 3D model prediction, and trilinear baseline. The 3D model recovers fine cortical detail and inter-slice continuity lost in interpolation.
  • ...and 1 more figure
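
The 20-step Euler sampling named in the Figure 1 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the noise schedule uses the default values from Karras et al. (`sigma_min=0.002`, `sigma_max=80`, `rho=7`), which the paper may or may not use, and `toy_denoise` is a hypothetical stand-in for the trained, low-resolution-conditioned U-Net.

```python
import numpy as np

def karras_sigmas(num_steps=20, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Noise levels from sigma_max down to sigma_min, with a final 0 appended.

    The default values are the EDM defaults from Karras et al. and are
    assumed here for illustration.
    """
    ramp = np.linspace(0.0, 1.0, num_steps)
    min_inv = sigma_min ** (1.0 / rho)
    max_inv = sigma_max ** (1.0 / rho)
    sigmas = (max_inv + ramp * (min_inv - max_inv)) ** rho
    return np.append(sigmas, 0.0)

def euler_sample(denoise, cond, shape, num_steps=20, seed=0):
    """Deterministic Euler integration of the probability-flow ODE.

    denoise(x, sigma, cond) must return an estimate of the clean image.
    """
    rng = np.random.default_rng(seed)
    sigmas = karras_sigmas(num_steps)
    x = rng.standard_normal(shape) * sigmas[0]  # start from pure noise
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        d = (x - denoise(x, sigma, cond)) / sigma  # ODE slope dx/dsigma
        x = x + (sigma_next - sigma) * d           # Euler step to next level
    return x

def toy_denoise(x, sigma, cond):
    """Hypothetical denoiser: always predicts the conditioning array itself."""
    return cond

# With this toy denoiser the sampler must land on the conditioning array.
cond = np.linspace(0.0, 1.0, 16).reshape(4, 4)
sample = euler_sample(toy_denoise, cond, cond.shape)
```

In a real pipeline `denoise` would wrap the network and `cond` would carry the upsampled low-resolution patch (3D model) or the slice plus its neighbor (2.5D model). A Heun sampler would add a second denoiser evaluation per step to correct the slope.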