Multi-resolution Guided 3D GANs for Medical Image Translation

Juhyung Ha, Jong Sung Park, David Crandall, Eleftherios Garyfallidis, Xuhong Zhang

TL;DR

The paper addresses cross-modality 3D medical image translation with a multi-resolution guided GAN framework that pairs a 3D-mDAUNet generator with a multi-resolution discriminator, optimized with voxel-wise and 2.5D perceptual losses. The generator loss is $L_G = \lambda_1 L_{voxel} + \lambda_2 L_{perception} + \lambda_3 L_{adv}$ with $\lambda_1=\lambda_2=1$ and $\lambda_3=0.0001$, and the framework uses voxel-wise relativistic discrimination for stable training. Extensive experiments across MRI, CBCT, and CT datasets show state-of-the-art performance in both image quality and downstream segmentation tasks, often outperforming ResViT, PTNet3D, and Ea-GAN, with Dice scores reaching up to 0.880 and 0.836 in synthetic-to-real settings. The study highlights that traditional IQA metrics may not fully capture clinical utility, advocates multifaceted evaluation, and demonstrates practical potential for reducing additional imaging acquisitions. The authors also provide open-source code at github.com/juhha/3D-mADUNet.
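As a rough illustration of the weighted sum in the TL;DR, the sketch below combines three already-computed scalar loss terms using the stated weights. The names `LossWeights` and `generator_loss` are hypothetical and not taken from the authors' repository, and how each individual term is computed is outside the scope of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LossWeights:
    """Weights from the TL;DR; the field names are illustrative assumptions."""
    voxel: float = 1.0          # lambda_1
    perception: float = 1.0     # lambda_2
    adversarial: float = 1e-4   # lambda_3


def generator_loss(l_voxel: float, l_perception: float, l_adv: float,
                   w: LossWeights = LossWeights()) -> float:
    """Return L_G = lambda_1*L_voxel + lambda_2*L_perception + lambda_3*L_adv."""
    return w.voxel * l_voxel + w.perception * l_perception + w.adversarial * l_adv


if __name__ == "__main__":
    # Placeholder loss values, chosen only to show the arithmetic.
    print(generator_loss(0.5, 0.25, 2.0))
```

With these weights the adversarial term is scaled down by four orders of magnitude relative to the voxel-wise and perceptual terms, so the reconstruction terms dominate the total.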

Abstract

Medical image translation is the process of converting from one imaging modality to another, in order to reduce the need for multiple image acquisitions from the same patient. This can enhance the efficiency of treatment by reducing the time, equipment, and labor needed. In this paper, we introduce a multi-resolution guided Generative Adversarial Network (GAN)-based framework for 3D medical image translation. Our framework uses a 3D multi-resolution Dense-Attention UNet (3D-mDAUNet) as the generator and a 3D multi-resolution UNet as the discriminator, optimized with a unique combination of loss functions including voxel-wise GAN loss and 2.5D perception loss. Our approach yields promising results in volumetric image quality assessment (IQA) across a variety of imaging modalities, body regions, and age groups, demonstrating its robustness. Furthermore, we propose a synthetic-to-real applicability assessment as an additional evaluation to assess the effectiveness of synthetic data in downstream applications such as segmentation. This comprehensive evaluation shows that our method produces synthetic medical images that are not only of high quality but also potentially useful in clinical applications. Our code is available at github.com/juhha/3D-mADUNet.
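The downstream segmentation results are reported as Dice scores. For readers unfamiliar with the metric, here is a minimal sketch of the standard Dice coefficient on flattened binary masks. The function name `dice_score` and the convention of returning 1.0 when both masks are empty are assumptions made for illustration; the paper's evaluation code may handle edge cases differently.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice = 2*|P ∩ T| / (|P| + |T|) for flattened binary (0/1) masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    # Toy 4-voxel masks: one overlapping voxel, two foreground voxels each.
    print(dice_score([1, 1, 0, 0], [1, 0, 1, 0]))
```

In a synthetic-to-real setting, `pred` would come from a segmentation model trained on synthetic images and evaluated on real ones, while `target` would be the ground-truth annotation.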

Paper Structure

This paper contains 18 sections, 5 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Generator architecture in our GAN framework: 3D multi-resolution Dense-Attention UNet (3D-mDAUNet). The encoder of this network employs a Residual-Dense-Block (RDB) for effective feature extraction, while the decoder uses a Convolution Block Attention Module (CBAM) for enhanced feature integration. Specifically, CBAM merges features learned from lower-resolution levels with those from shallower levels in the network. Additionally, the generator is designed to accept and produce multi-resolution data, enabling more efficient and stable training.
  • Figure 2: Sample results from each method and dataset. Source is the input modality fed into the model, and Target is the ground truth. The other four columns are the synthetic images generated by different methods. We present both holistic and zoomed images from multiple views (sagittal, coronal, and axial). For the pelvis CT scans, we clip voxel intensities to between 0 and 250 to highlight the contrast between bones and organs.
  • Figure 3: Sample whole-tumor segmentation outputs from testing synthetic-to-real applicability on BraTS 2021. GT shows the ground-truth annotation. Real is the segmentation predicted by a model trained on real FLAIR MRI data. Ours is the segmentation predicted by a model trained on synthetic images generated by our model.
  • Figure 4: Sample segmentation results by TotalSegmentator on different images. Real is the predicted segmentation using a real CT image. Ours is predicted using a synthetic CT image generated by our model. Others are predicted using synthetic images generated by other translation models.
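The Figure 2 caption mentions clipping CT voxel intensities between 0 and 250 for display. A minimal sketch of that kind of intensity windowing follows. The helper names `clip_window` and `to_unit_range` are hypothetical, the rescaling to [0, 1] is an added assumption for visualization, and the caption does not state the intensity units.

```python
from typing import List, Sequence


def clip_window(values: Sequence[float], lo: float = 0.0, hi: float = 250.0) -> List[float]:
    """Clamp each intensity into [lo, hi]; the defaults follow the Figure 2 caption."""
    if lo >= hi:
        raise ValueError("lo must be smaller than hi")
    return [min(max(v, lo), hi) for v in values]


def to_unit_range(values: Sequence[float], lo: float = 0.0, hi: float = 250.0) -> List[float]:
    """Clip to the window, then rescale linearly to [0, 1] (assumed display step)."""
    return [(v - lo) / (hi - lo) for v in clip_window(values, lo, hi)]


if __name__ == "__main__":
    # Made-up intensities spanning values below, inside, and above the window.
    sample = [-1000.0, 0.0, 125.0, 250.0, 1200.0]
    print(clip_window(sample))
    print(to_unit_range(sample))
```

Narrowing the window this way spends the full display range on the intensities of interest, which is why the caption describes it as highlighting the contrast between bones and organs.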