I2I-Mamba: Multi-modal medical image synthesis via selective state space modeling
Omer F. Atli, Bilal Kabas, Fuat Arslan, Arda C. Demirtas, Mahmut Yurt, Onat Dalmaz, Tolga Çukur
TL;DR
I2I-Mamba introduces a dual-domain state-space model for cross-modal medical image synthesis, combining image- and Fourier-domain SSM branches with spiral-scan tokenization and channel mixing to capture both short- and long-range context while preserving spatial detail. The architecture, featuring a high-resolution bottleneck with ddMamba blocks and residual CNNs, outperforms CNN, transformer, and prior SSM baselines on multi-contrast MRI and MRI-CT translation tasks using the IXI, BraTS, and MRI-CT datasets. Ablation studies validate the contribution of each component, including the spiral-scan SSM and dual-domain processing, to improvements in PSNR and SSIM. The method offers a scalable, efficient solution for missing-modality imputation, with potential clinical impact in reducing scan times, enabling safer imaging, and harmonizing large-scale datasets.
Abstract
Multi-modal medical image synthesis involves nonlinear transformation of tissue signals between source and target modalities, where tissues exhibit contextual interactions across diverse spatial distances. As such, the utility of a network architecture in synthesis depends on its ability to express the broad set of contextual features in medical images. Convolutional neural networks (CNNs) offer high local precision at the expense of poor sensitivity to long-range context. While transformers promise to alleviate this issue, they suffer from an unfavorable trade-off between sensitivity to long- versus short-range context due to the intrinsic complexity of attention filters. To effectively capture contextual features while avoiding the complexity-driven trade-offs, here we introduce a novel multi-modal synthesis method, I2I-Mamba, based on the state space modeling (SSM) framework. Focusing on high-level representations across a hybrid residual architecture, I2I-Mamba leverages novel dual-domain Mamba (ddMamba) blocks for complementary contextual modeling in image and Fourier domains, while maintaining spatial precision with convolutional layers. Departing from conventional raster-scan trajectories, ddMamba leverages novel SSM operators based on a spiral-scan trajectory to learn context with enhanced angular isotropy and radial coverage, and a channel-mixing layer to aggregate context across the channel dimension. Comprehensive demonstrations on multi-contrast MRI and MRI-CT protocols indicate that I2I-Mamba outperforms state-of-the-art CNNs, transformers and SSMs.
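To make the contrast with raster scanning concrete, the following is a minimal sketch of how a 2D feature map could be flattened into a 1D token sequence along a spiral path before being passed to a sequence model. It is an illustration only, not the authors' implementation: the function names `spiral_indices` and `spiral_flatten`, the clockwise direction, and the choice of a rectangular spiral are all assumptions, and the abstract does not specify the exact trajectory used in ddMamba.

```python
def spiral_indices(height, width):
    """Return (row, col) pairs that visit an H x W grid once each,
    following a clockwise spiral from the border toward the center.
    Hypothetical helper for illustration."""
    top, bottom, left, right = 0, height - 1, 0, width - 1
    order = []
    while top <= bottom and left <= right:
        # Top edge, left to right.
        for c in range(left, right + 1):
            order.append((top, c))
        # Right edge, top to bottom (corner already visited).
        for r in range(top + 1, bottom + 1):
            order.append((r, right))
        # Bottom edge, right to left, only if it is a distinct row.
        if top < bottom:
            for c in range(right - 1, left - 1, -1):
                order.append((bottom, c))
        # Left edge, bottom to top, only if it is a distinct column.
        if left < right:
            for r in range(bottom - 1, top, -1):
                order.append((r, left))
        top, bottom, left, right = top + 1, bottom - 1, left + 1, right - 1
    return order


def spiral_flatten(feature_map, outward=True):
    """Flatten a 2D list into a 1D sequence along a spiral.
    outward=True starts at the center and ends at the border
    (an assumed convention); outward=False reverses the direction."""
    order = spiral_indices(len(feature_map), len(feature_map[0]))
    if outward:
        order = order[::-1]
    return [feature_map[r][c] for r, c in order]


grid = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
print(spiral_flatten(grid, outward=False))  # border to center
print(spiral_flatten(grid, outward=True))   # center to border
```

In a raster scan, vertically adjacent pixels end up a full row width apart in the sequence, whereas a spiral keeps pixels at a similar distance from the center close together, which is one intuitive reading of the "angular isotropy and radial coverage" mentioned above.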
