Distributed Deep Learning for Medical Image Denoising with Data Obfuscation

Sulaimon Oyeniyi Adebayo, Ayaz H. Khan

TL;DR

This paper investigates scalable, privacy-conscious medical image denoising by applying distributed deep learning to Gaussian-noised chest X-ray data from NIH ChestX-ray14. It compares U-Net and U-Net++ architectures under single-GPU, DataParallel, and DistributedDataParallel with Automatic Mixed Precision, demonstrating substantial training-time reductions with the optimized setup while maintaining denoising quality. U-Net++ shows superior PSNR and SSIM at mid-to-high noise levels, reflecting better structural fidelity, though its LPIPS can be higher (that is, worse) at low noise. The study provides a practical framework, including a two-site obfuscation pipeline and publicly available code, to accelerate clinical imaging workflows without centralizing sensitive data.

Abstract

Medical image denoising is essential for improving image quality while minimizing the exposure of sensitive information, particularly when working with large-scale clinical datasets. This study explores distributed deep learning for denoising chest X-ray images from the NIH ChestX-ray14 dataset, using additive Gaussian noise as a lightweight obfuscation technique. We implement and evaluate U-Net and U-Net++ architectures under single-GPU, standard multi-GPU (DataParallel), and optimized multi-GPU training configurations using PyTorch's DistributedDataParallel (DDP) and Automatic Mixed Precision (AMP). Our results show that U-Net++ consistently delivers superior denoising performance, achieving competitive Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM) scores, though it scores worse than U-Net on Learned Perceptual Image Patch Similarity (LPIPS) under low and moderate noise levels. This indicates that U-Net++ offers stronger structural fidelity but weaker perceptual similarity at those noise levels. Meanwhile, our optimized training pipeline reduces training time by over 60% for both models compared to single-GPU training and outperforms standard DataParallel by over 40%, at the cost of only a minor accuracy drop for both models. These findings highlight the effectiveness of software-level optimization in distributed learning for medical imaging. This work demonstrates the practical viability of combining architectural design, lightweight obfuscation, and advanced distributed training strategies to accelerate and enhance medical image processing pipelines in real-world clinical and research environments. The full implementation is publicly available at: https://github.com/Suadey/medical-image-denoising-ddp.
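The obfuscation step and the PSNR metric named in the abstract can be illustrated with a minimal NumPy sketch. This is not taken from the authors' repository: the function names `obfuscate` and `psnr`, the convention that a noise level such as 10% means a Gaussian standard deviation of 0.10 on images scaled to [0, 1], and the synthetic stand-in image are all assumptions made for illustration.

```python
import numpy as np


def obfuscate(image, noise_level, rng):
    """Add zero-mean Gaussian noise to an image scaled to [0, 1].

    Assumption: noise_level is the noise standard deviation expressed as a
    fraction of the intensity range (0.10 for "10% noise").
    """
    noisy = image + rng.normal(0.0, noise_level, size=image.shape)
    # Clip so the obfuscated image stays in the valid intensity range.
    return np.clip(noisy, 0.0, 1.0)


def psnr(reference, estimate, data_range=1.0):
    """Peak Signal-to-Noise Ratio in dB; higher means closer to the reference."""
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(data_range ** 2 / mse)


rng = np.random.default_rng(0)
# Hypothetical stand-in for a normalized chest X-ray; not real data.
clean = rng.random((64, 64))
noisy_10 = obfuscate(clean, 0.10, rng)
noisy_30 = obfuscate(clean, 0.30, rng)

# Heavier obfuscation should yield a lower PSNR against the clean image.
print(f"PSNR at 10% noise: {psnr(clean, noisy_10):.2f} dB")
print(f"PSNR at 30% noise: {psnr(clean, noisy_30):.2f} dB")
```

In a denoising evaluation, the same `psnr` call would compare the model's output against the clean image instead of the noisy input. SSIM and LPIPS need dedicated implementations and are not sketched here.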

Paper Structure

This paper contains 18 sections, 2 equations, 7 figures, 6 tables.

Figures (7)

  • Figure 1: Research Workflow
  • Figure 2: Framework of the proposed distributed denoising pipeline using Gaussian noise-based data obfuscation. At Site One, clean chest X-ray images are acquired and locally obfuscated using additive Gaussian noise. The noisy images are securely transferred to Site Two, where they are processed by a U-Net/U-Net++ model. Model training is performed using various GPU configurations, including single GPU, multi-GPU (DataParallel), and DDP with AMP, to support scalability and diverse hardware environments. The denoised output is used for clinical interpretation by a radiologist.
  • Figure 3: Dataset Sample
  • Figure 4: Training time comparison for U-Net and U-Net++ across three configurations: 1 GPU, 2 GPUs with DataParallel, and 2 GPUs with DDP + AMP. The optimized configuration yields the shortest training time for both models.
  • Figure 5: Visual comparison of noisy input, U-Net and U-Net++ denoised outputs, and ground truth (10% noise level).
  • ...and 2 more figures