Unsupervised Training of a Dynamic Context-Aware Deep Denoising Framework for Low-Dose Fluoroscopic Imaging

Sun-Young Jeon, Sen Wang, Adam S. Wang, Garry E. Gold, Jang-Hwan Choi

TL;DR

The paper proposes an unsupervised framework for dynamic, context-aware denoising of fluoroscopy image sequences. Its first stage, the multi-scale recurrent attention U-Net (MSR2AU-Net), is trained without clean data to address the initial noise.

Abstract

Fluoroscopy is critical for real-time X-ray visualization in medical imaging. However, low-dose images are compromised by noise, which can affect diagnostic accuracy. Noise reduction is therefore crucial for maintaining image quality, and it is complicated by motion artifacts and by the limited availability of clean data in medical imaging. To address these issues, we propose an unsupervised training framework for dynamic context-aware denoising of fluoroscopy image sequences. First, we train the multi-scale recurrent attention U-Net (MSR2AU-Net) without requiring clean data to address the initial noise. Second, we incorporate a knowledge distillation-based uncorrelated noise suppression module and a recursive filtering-based correlated noise suppression module enhanced with motion compensation to further improve denoising performance. Finally, we combine these modules with a pixel-wise dynamic object motion cross-fusion matrix, designed to adapt to motion, and an edge-preserving loss for precise detail retention. To validate the proposed method, we conducted extensive numerical experiments on medical image datasets, including 3,500 fluoroscopy images from dynamic phantoms (2,400 images for training, 1,100 for testing) and 350 clinical images from a spinal surgery patient. Moreover, we demonstrated the robustness of our approach across imaging modalities by testing it on the publicly available 2016 Low Dose CT Grand Challenge dataset, using 4,800 images for training and 1,136 for testing. The results show that the proposed approach outperforms state-of-the-art unsupervised algorithms in both visual quality and quantitative evaluation, while achieving performance comparable to well-established supervised learning methods across low-dose fluoroscopy and CT imaging.
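To make the idea of recursive filtering with pixel-wise motion adaptation more concrete, the following is a minimal sketch of a generic first-order recursive (temporal) filter whose blending weight varies per pixel with the frame-to-frame difference. It is not the authors' correlated noise suppression module or their cross-fusion matrix: the function name `recursive_filter`, the parameters `alpha` and `motion_threshold`, and the use of an absolute frame difference as the motion cue are all assumptions made for illustration.

```python
import numpy as np


def recursive_filter(frames, alpha=0.2, motion_threshold=0.1):
    """Motion-adaptive first-order recursive (temporal) filter.

    Illustrative sketch only, not the method from the paper.

    frames: array of shape (T, H, W) with float intensities.
    alpha: weight of the current frame in static regions (hypothetical value).
    motion_threshold: frame difference treated as full motion (hypothetical).
    """
    frames = np.asarray(frames, dtype=np.float64)
    out = np.empty_like(frames)
    out[0] = frames[0]  # nothing to average with for the first frame
    for t in range(1, len(frames)):
        # Absolute difference to the previous filtered frame as a motion cue.
        diff = np.abs(frames[t] - out[t - 1])
        # Pixel-wise motion map in [0, 1]: 0 = static, 1 = strong motion.
        motion = np.clip(diff / motion_threshold, 0.0, 1.0)
        # Current-frame weight: alpha where static, 1 where moving,
        # so moving pixels are not blurred by the temporal history.
        weight = alpha + (1.0 - alpha) * motion
        out[t] = weight * frames[t] + (1.0 - weight) * out[t - 1]
    return out
```

In static regions the filter averages over past frames and suppresses noise; where the difference exceeds `motion_threshold` it passes the current frame through unchanged, which is the trade-off that a motion-aware fusion step is meant to manage per pixel.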

Paper Structure

This paper contains 28 sections, 11 equations, 10 figures, 3 tables.

Figures (10)

  • Figure 1: Typical fluoroscopic images obtained from (a) a static pelvis phantom and (b) a dynamic needle tip with a spherical lesion phantom.
  • Figure 2: Schematic diagram of the proposed two-step training framework.
  • Figure 3: Architectural overview of the multi-scale recurrent attention U-Net (MSR2AU-Net). This figure illustrates the MSR2AU-Net framework employed in the first training step. The network is designed to predict the central frame from noisy X-ray sequences by leveraging multi-scale feature extraction, recurrent layers, and attention mechanisms to enhance denoising performance.
  • Figure 4: Schematic of the second training step and the network architecture used in it, where $\hat{O}_{i}$ signifies the frozen pre-trained MSR2AU-Net. The outputs $\hat{x}_{i}$ and $\hat{x}_{i}^r$ are produced by the U-Net and the recursive filter, respectively. Avg represents the average, and $\hat{T}_{i}$ is the result of the element-wise multiplication of $\hat{x}_{i}$ and $\hat{x}_{i}^r$. The differences $\Delta\hat{x}_i$ and $\Delta\hat{x}_i^r$ are obtained by subtracting $x_i$ and $\hat{x}_i^r$ from $\hat{T}_i$, respectively. $\operatorname{var}$ is the variance operator. Finally, $\hat{x}_i^{hf}$ is the element-wise multiplication of the high-frequency components of $\hat{x}_i$, denoted as $H(\hat{x}_i)$, with $\operatorname{var}(\Delta\hat{x}_i) \otimes \hat{x}_{i}^r$, and $\hat{x}_i^{r,hf}$ is the element-wise multiplication of $H(\hat{x}_i^r)$ with $\operatorname{var}(\Delta\hat{x}_i^r) \otimes \hat{x}_{i}$.
  • Figure 5: Comparative denoising results on the dynamic anthropomorphic hand phantom dataset with various networks. The line profiles in the third row are plotted along the white line within the red region of interest.
  • ...and 5 more figures