DenoMamba: A fused state-space model for low-dose CT denoising
Şaban Öztürk, Oğuz Can Duran, Tolga Çukur
TL;DR
This work tackles low-dose CT denoising by introducing DenoMamba, a fused state-space model that jointly captures spatial and channel context through novel FuseSSM blocks within an hourglass encoder–decoder. By integrating a spatial SSM with a channel SSM augmented by a gated convolution, along with an identity path and a convolutional fusion module, the method preserves fine spatial details while leveraging long-range dependencies. Across 25% and 10% dose LDCT datasets, DenoMamba outperforms CNN, GAN, diffusion, and transformer-based baselines in PSNR, SSIM, and RMSE, and demonstrates robust generalization to cross-domain and dose-shift scenarios. The results highlight the practical potential of purely SSM-based denoising for high-fidelity LDCT restoration, with ablations confirming the necessity of each architectural component and fusion strategy.
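For readers less familiar with two of the reported metrics, the following is a minimal sketch of how RMSE and PSNR are commonly computed between a reference image and a denoised estimate. The function names and the `data_range` parameter are illustrative assumptions, not taken from the paper, and the paper's exact intensity normalization is not stated here. SSIM is omitted because it requires a windowed computation that does not fit a short example.

```python
import numpy as np


def rmse(reference, estimate):
    """Root-mean-square error between two images of equal shape."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    return float(np.sqrt(np.mean((reference - estimate) ** 2)))


def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB.

    data_range is the assumed span of valid intensities (hypothetical
    default of 1.0, i.e. images normalized to [0, 1]).
    """
    err = rmse(reference, estimate)
    if err == 0.0:
        return float("inf")  # identical images
    return float(20.0 * np.log10(data_range / err))
```

Lower RMSE and higher PSNR both indicate that the denoised image is closer to the reference.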
Abstract
Low-dose computed tomography (LDCT) lowers potential risks linked to radiation exposure while relying on advanced denoising algorithms to maintain diagnostic quality in reconstructed images. The reigning paradigm in LDCT denoising is based on neural network models that learn data-driven image priors to separate noise evoked by dose reduction from underlying tissue signals. Naturally, the fidelity of these priors depends on the model's ability to capture the broad range of contextual features evident in CT images. Earlier convolutional neural networks (CNNs) are highly adept at efficiently capturing short-range spatial context, but their limited receptive fields reduce sensitivity to interactions over longer distances. Although transformers based on self-attention mechanisms have recently been proposed to increase sensitivity to long-range context, they can suffer from suboptimal performance and efficiency due to elevated model complexity, particularly for high-resolution CT images. For high-quality restoration of LDCT images, here we introduce DenoMamba, a novel denoising method based on state-space modeling (SSM) that efficiently captures short- and long-range context in medical images. Following an hourglass architecture with encoder-decoder stages, DenoMamba employs a spatial SSM module to encode spatial context and a novel channel SSM module equipped with a secondary gated convolution network to encode latent features of channel context at each stage. Feature maps from the two modules are then consolidated with low-level input features via a convolution fusion module (CFM). Comprehensive experiments on LDCT datasets with 25% and 10% dose reduction demonstrate that DenoMamba outperforms state-of-the-art denoisers with average improvements of 1.4 dB PSNR, 1.1% SSIM, and 1.6% RMSE in recovered image quality.
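To make the data flow of a single stage concrete, here is a toy NumPy sketch of the three-path idea described above: a spatial scan, a gated channel scan, and an identity path, merged by a 1x1 convolution. It is not the authors' implementation. The fixed scalar recurrence in `ssm_scan`, the sigmoid gate standing in for the gated convolution network, and all names (`ssm_scan`, `fuse_ssm_block`, `w_gate`, `w_fuse`) are assumptions made for illustration; the actual SSM layers are learned and presumably far more elaborate.

```python
import numpy as np


def ssm_scan(x, a=0.9, b=0.1, c=1.0):
    """Toy linear state-space recurrence along the last axis.

    h_t = a * h_{t-1} + b * x_t,  y_t = c * h_t
    (fixed scalar parameters; an illustrative stand-in for a learned SSM).
    """
    x = np.asarray(x, dtype=np.float64)
    h = np.zeros(x.shape[:-1])
    y = np.empty_like(x)
    for t in range(x.shape[-1]):
        h = a * h + b * x[..., t]
        y[..., t] = c * h
    return y


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fuse_ssm_block(x, w_gate, w_fuse):
    """Hypothetical fused block on a feature map x of shape (C, H, W).

    w_gate: (C, C) weights of a 1x1 convolution producing the gate.
    w_fuse: (C, 3C) weights of a 1x1 convolution fusing the three paths.
    """
    x = np.asarray(x, dtype=np.float64)
    C, H, W = x.shape

    # Spatial path: scan over the flattened pixel sequence of each channel.
    spatial = ssm_scan(x.reshape(C, H * W)).reshape(C, H, W)

    # Channel path: scan over the channel sequence at each pixel, then gate.
    per_pixel = x.reshape(C, H * W).T              # (H*W, C)
    channel = ssm_scan(per_pixel).T.reshape(C, H, W)
    gate = sigmoid(np.einsum("oc,chw->ohw", w_gate, x))
    channel = channel * gate

    # Fusion: concatenate both paths with the identity path, mix with a 1x1 conv.
    stacked = np.concatenate([spatial, channel, x], axis=0)  # (3C, H, W)
    return np.einsum("oc,chw->ohw", w_fuse, stacked)
```

With `w_fuse` set so that only the identity path is passed through, the block returns its input unchanged, which illustrates how the identity path can carry low-level input features past the two SSM branches.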
