Mamba? Catch The Hype Or Rethink What Really Helps for Image Registration
Bailiang Jian, Jiazhen Pan, Morteza Ghahremani, Daniel Rueckert, Christian Wachinger, Benedikt Wiestler
TL;DR
The paper investigates whether adopting "advanced" low-level blocks (e.g., Vision Transformers, Mamba, large-kernel CNNs) genuinely improves deformable registration of brain MRI. Using a modular component analysis built on a VoxelMorph baseline, it contrasts these low-level replacements with high-level, registration-specific designs such as coarse-to-fine motion pyramids, dual-stream encoders, correlation layers, and iterative optimization. The results show that advanced blocks offer little or no improvement, while high-level designs yield modest gains (about $1.5\%$ Dice, up to $5\%$ in zero-shot LPBA), with VoxelMorph remaining highly competitive. The study argues for simpler, registration-aware architectures and novel evaluation metrics, and releases code to enable broader, fair comparisons across datasets and modalities.
Abstract
Our findings indicate that adopting "advanced" computational elements fails to significantly improve registration accuracy. Instead, well-established registration-specific designs offer modest improvements, enhancing results by about $1.5\%$ over the baseline. Our findings emphasize the importance of rigorous, unbiased evaluation and of disentangling the contributions of all low- and high-level registration components, rather than simply following computer vision trends with "more advanced" computational blocks. We advocate for simpler yet effective solutions and novel evaluation metrics that go beyond conventional registration accuracy, warranting further research across diverse organs and modalities. The code is available at https://github.com/BailiangJ/rethink-reg.
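The accuracy gains quoted in the TL;DR are expressed in Dice overlap between anatomical label maps. As a reading aid, the following is a minimal sketch of how such a score is commonly computed between a warped moving segmentation and a fixed segmentation. It is not taken from the released code: the function names `dice_per_label` and `mean_dice` are hypothetical, and the choice of which labels to average over is an assumption.

```python
import numpy as np


def dice_per_label(warped_seg, fixed_seg, labels):
    """Dice overlap per label between two integer label maps of equal shape.

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    scores = {}
    for label in labels:
        a = warped_seg == label
        b = fixed_seg == label
        denom = a.sum() + b.sum()
        # Dice is undefined when a label is absent from both maps; skip it.
        if denom == 0:
            continue
        scores[label] = float(2.0 * np.logical_and(a, b).sum() / denom)
    return scores


def mean_dice(warped_seg, fixed_seg, labels):
    """Average Dice over the labels present in at least one of the maps."""
    scores = dice_per_label(warped_seg, fixed_seg, labels)
    return float(np.mean(list(scores.values()))) if scores else float("nan")


# Toy 2D label maps standing in for 3D brain segmentations (assumed data).
fixed = np.array([[1, 1, 0], [2, 2, 0]])
warped = np.array([[1, 0, 0], [2, 2, 2]])
print(dice_per_label(warped, fixed, labels=[1, 2]))
print(mean_dice(warped, fixed, labels=[1, 2]))
```

A difference of about $1.5\%$ in this averaged score is small relative to typical variation across subjects and structures, which is one reason the abstract calls for evaluation metrics beyond conventional registration accuracy.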
