BioAtt: Anatomical Prior Driven Low-Dose CT Denoising

Namhun Kim, UiHyun Cho

TL;DR

Low-Dose CT denoising often sacrifices anatomical detail when using purely data-driven methods. BioAtt addresses this by injecting organ-aware priors derived from BiomedCLIP into the spatial attention mechanism to preserve clinically relevant structures while suppressing noise. It demonstrates superior SSIM and competitive RMSE/PSNR against baselines and attention variants, supported by ablations and attention-map visualizations that confirm anatomy-guided improvements rather than increased model complexity. The approach establishes a new architectural paradigm for integrating semantic anatomical priors into LDCT denoising, with potential for further gains by combining segmentation priors and more diverse textual descriptors.

Abstract

Deep-learning-based denoising methods have significantly improved Low-Dose CT (LDCT) image quality. However, existing models often over-smooth important anatomical details because their attention mechanisms are purely data-driven. To address this challenge, we propose BioAtt, a novel LDCT denoising framework. Its key innovation is attending to anatomical prior distributions extracted from the pretrained vision-language model BiomedCLIP. These priors guide the denoising model to focus on anatomically relevant regions, suppressing noise while preserving clinically relevant structures. We highlight three main contributions. First, BioAtt outperforms baseline and attention-based models in SSIM, PSNR, and RMSE across multiple anatomical regions. Second, the framework introduces a new architectural paradigm by embedding anatomical priors directly into spatial attention. Finally, BioAtt attention maps provide visual confirmation that the improvements stem from anatomical guidance rather than increased model complexity.
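
As a rough illustration of how such a prior distribution could be formed, the sketch below follows the Figure 1 caption: similarity scores $S_i$ between one image embedding and $N$ descriptor embeddings are passed through a softmax to give $\mathbf{p}$. It does not call BiomedCLIP; the embeddings are toy vectors, and the function names, the cosine similarity, and the `scale` factor are assumptions made for illustration rather than details confirmed by the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def anatomical_prior(image_embedding, text_embeddings, scale=1.0):
    """Turn image-text similarities S_i into a probability vector p.

    `scale` is a hypothetical temperature-like factor (an assumption);
    the source only states that the scores are normalized via softmax.
    """
    scores = [scale * cosine_similarity(image_embedding, t)
              for t in text_embeddings]
    return softmax(scores)


# Toy stand-ins for encoder outputs (not real BiomedCLIP embeddings).
descriptors = ["liver", "lung", "kidney"]
image_embedding = [0.9, 0.1, 0.2]
text_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

prior = anatomical_prior(image_embedding, text_embeddings, scale=10.0)
for name, p in zip(descriptors, prior):
    print(f"{name}: {p:.3f}")
```

In this toy setup the image embedding lies closest to the first descriptor, so that descriptor receives the largest probability; a larger `scale` would make the distribution sharper.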

Paper Structure

This paper contains 19 sections, 6 equations, 15 figures, 1 table.

Figures (15)

  • Figure 1: Overview of anatomical prior extraction process. Given a low-dose CT image $\mathbf{I}_p$, a set of anatomical descriptors $\{t_i\}_{i=1}^{N}$ is tokenized and encoded using a pretrained text encoder. BiomedCLIP jointly embeds the image and text to compute similarity scores $S_i$, which are normalized via softmax to yield a probability distribution $\mathbf{p}$. Each $p_i$ reflects the estimated likelihood of a specific anatomical structure in the input image. These anatomical priors guide spatial attention in the denoising network.
  • Figure 2: Overview of the organ-aware spatial attention module. The input feature map undergoes both average and max pooling along the channel axis. The two pooled descriptors are then concatenated and passed through a convolutional layer, producing a multi-channel attention map with one channel for each of the $N$ anatomical structures. These attention maps are modulated by the anatomical prior probabilities and summed across organs to yield a unified spatial attention map, which is applied back to the original feature map to emphasize clinically relevant regions.
  • Figure 3: Overall architecture of BioAtt. The input low-dose CT image is first divided into patches. These patches are passed through an encoder composed of convolutional layers with two spatial attention modules guided by anatomical priors. The decoder then reconstructs the denoised image from the refined feature maps to restore the full-resolution CT image.
  • Figure 4: Comparison of $16$ quarter_1mm and full_1mm patches preprocessed from the Mayo-2016 dataset.
  • Figure 5: Performance comparison of four models: Base, Channel, Spatial, and BioAtt. The top row shows evaluation metrics (RMSE, PSNR, and SSIM) with mean and standard deviation across test samples. The bottom row illustrates the trend of each metric over training epochs (1, 5, 10, and 20). While all attention-augmented models outperform the baseline, BioAtt consistently achieves higher SSIM and demonstrates stable improvements throughout training.
  • ...and 10 more figures
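
The organ-aware spatial attention module described in the Figure 2 caption can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the 7x7 kernel size, the sigmoid applied to the per-organ maps, the random weights, and all function names are hypothetical choices, since the caption only specifies channel-wise average and max pooling, concatenation, a convolutional layer, prior-weighted summation, and application back to the feature map.

```python
import numpy as np


def conv2d_same(x, weight, bias):
    """Zero-padded 'same' cross-correlation for an odd kernel size.

    x: (C_in, H, W), weight: (C_out, C_in, k, k), bias: (C_out,)
    """
    c_out, _, k, _ = weight.shape
    pad = k // 2
    _, h, w = x.shape
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((c_out, h, w))
    for i in range(k):
        for j in range(k):
            # Mix input channels for kernel offset (i, j) over a shifted window.
            window = xp[:, i:i + h, j:j + w]
            out += np.einsum("oc,chw->ohw", weight[:, :, i, j], window)
    return out + bias[:, None, None]


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def organ_aware_spatial_attention(features, prior, weight, bias):
    """features: (C, H, W); prior: (N,) probabilities summing to 1."""
    # Average and max pooling along the channel axis.
    avg_desc = features.mean(axis=0, keepdims=True)
    max_desc = features.max(axis=0, keepdims=True)
    descriptors = np.concatenate([avg_desc, max_desc], axis=0)  # (2, H, W)

    # One attention map per anatomical structure (sigmoid is an assumption).
    organ_maps = sigmoid(conv2d_same(descriptors, weight, bias))  # (N, H, W)

    # Modulate by the prior probabilities and sum across organs.
    unified = np.tensordot(prior, organ_maps, axes=1)  # (H, W)

    # Apply the unified map back to the original feature map.
    return features * unified[None, :, :], unified


rng = np.random.default_rng(0)
features = rng.standard_normal((8, 16, 16))         # toy feature map
prior = np.array([0.7, 0.2, 0.1])                   # toy prior for N = 3
weight = 0.1 * rng.standard_normal((3, 2, 7, 7))    # untrained conv weights
bias = np.zeros(3)

refined, attention = organ_aware_spatial_attention(features, prior, weight, bias)
print(refined.shape, attention.shape)
```

Because the prior sums to one and each per-organ map lies between 0 and 1 in this sketch, the unified map is a convex combination of the organ maps, so structures with higher prior probability contribute more to where the features are emphasized.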