Disentangling Progress in Medical Image Registration: Beyond Trend-Driven Architectures towards Domain-Specific Strategies

Bailiang Jian, Jiazhen Pan, Rohit Jena, Morteza Ghahremani, Hongwei Bran Li, Daniel Rueckert, Christian Wachinger, Benedikt Wiestler

TL;DR

This work investigates what truly drives progress in learning-based medical image registration by disentangling trend-driven, general-purpose architectural blocks from domain-specific registration priors. Through a modular, open benchmark spanning brain, lung, cardiac, and abdominal tasks, the authors systematically assess the contributions of trend-driven blocks (e.g., Transformers, large kernels, Mamba) versus registration-specific designs (dual-stream encoders, motion pyramids, correlation layers, iterative refinement). Across multiple datasets and domains, domain-specific priors consistently yield superior accuracy, deformation smoothness, and generalization, while trend-driven blocks offer only marginal or inconsistent gains and come with higher computational cost. The study culminates in an open, plug-and-play benchmark (rethink-reg) that enables fair, reproducible comparisons and encourages the community to emphasize domain priors over architectural trends. Overall, the results advocate shifting research focus toward domain-specific design principles to achieve robust, generalizable progress in medical image registration.

Abstract

Medical image registration drives quantitative analysis across organs, modalities, and patient populations. Recent deep learning methods often combine low-level "trend-driven" computational blocks from computer vision, such as large-kernel CNNs, Transformers, and state-space models, with high-level registration-specific designs like motion pyramids, correlation layers, and iterative refinement. Yet, their relative contributions remain unclear and entangled. This raises a central question: should future advances in registration focus on importing generic architectural trends or on refining domain-specific design principles? Through a modular framework spanning brain, lung, cardiac, and abdominal registration, we systematically disentangle the influence of these two paradigms. Our evaluation reveals that low-level "trend-driven" computational blocks offer only marginal or inconsistent gains, while high-level registration-specific designs consistently deliver more accurate, smoother, and more robust deformations. These domain priors significantly elevate the performance of a standard U-Net baseline, far more than variants incorporating "trend-driven" blocks, achieving an average relative improvement of $\sim3\%$. All models and experiments are released within a transparent, modular benchmark that enables plug-and-play comparison for new architectures and registration tasks (https://github.com/BailiangJ/rethink-reg). This dynamic and extensible platform establishes a common ground for reproducible and fair evaluation, inviting the community to isolate genuine methodological contributions from domain priors. Our findings advocate a shift in research emphasis: from following architectural trends to embracing domain-specific design principles as the true drivers of progress in learning-based medical image registration.

Paper Structure

This paper contains 68 sections, 15 equations, 10 figures, 11 tables.

Figures (10)

  • Figure 1: Registration-specific designs substantially enhance registration performance (up to 45.7%) over the VoxelMorph (VXM) baseline across a wide range of registration benchmarks covering multiple modalities and anatomies.
  • Figure 2: Overview of the baseline and the modular registration components. Upper Left: the Baseline architecture concatenates image pairs and predicts the deformation field directly. Middle Left: Dual-stream encoder extracts multi-resolution feature pyramids for the source and target images separately. Bottom Left: Motion Pyramid refines the deformation field progressively from coarse to fine using hierarchical features. Right: workflow of the deformation decoder at level $\ell$. Level $\ell$ corresponds to $2^{-\ell}$ resolution. Registration-specific designs are cumulatively integrated: Top: Pyramid and Warping; Middle: Correlation; Bottom: Iteration. The cube widths are illustrative and do not represent the exact channel dimensions.
  • Figure 3: Qualitative registration results on the LUMIR25 in-distribution (ID) dataset. The axial views of the source, target, and registered images are shown alongside their corresponding segmentation labels. The deformation field is illustrated as deformed grid lines. The Dice similarity coefficient (DSC, %) and the ratio of non-positive determinant voxels (NDV) across the volume are indicated in the bottom right corner.
  • Figure 4: Qualitative registration results on the NLST (in-domain) and Lung250M-4B (zero-shot) datasets. The coronal overlays of the registered moving and target images are shown alongside their corresponding deformation fields. The target and source images are visualized in blue and orange, and a perfectly aligned lung appears white in the overlay. The deformation field is visualized by a colored quiver plot where the color and arrow direction encode the displacement orientation, and arrow length represents displacement magnitude. The target registration error (TRE) for each case is indicated at the bottom of the image.
  • Figure 5: Cumulative distribution of target registration error (TRE) on the NLST (in-domain) and Lung250M-4B (zero-shot) datasets. The dotted vertical lines indicate the 75th percentile of TRE in each distribution. A larger area under the curve corresponds to better keypoint/landmark alignment.
  • ...and 5 more figures
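
The captions above report three metrics: the Dice similarity coefficient (DSC), the ratio of non-positive determinant voxels (NDV), and the target registration error (TRE) with its 75th percentile. The sketch below illustrates how such metrics are commonly computed. It is not the benchmark's implementation. The function names, the 2D forward-difference Jacobian (the paper works on 3D volumes), the linear-interpolation percentile, and the assumption that landmark coordinates are already in physical units are all illustrative choices made here.

```python
import math


def dice(seg_a, seg_b, label):
    """Dice similarity coefficient for one label, given two flat label lists."""
    in_a = [v == label for v in seg_a]
    in_b = [v == label for v in seg_b]
    overlap = sum(1 for a, b in zip(in_a, in_b) if a and b)
    total = sum(in_a) + sum(in_b)
    # Convention assumed here: two empty masks count as a perfect match.
    return 2.0 * overlap / total if total else 1.0


def tre(warped_points, target_points):
    """Per-landmark Euclidean distance (points assumed to be in physical units)."""
    return [math.dist(p, q) for p, q in zip(warped_points, target_points)]


def percentile(values, q):
    """Percentile with linear interpolation between order statistics, q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def ndv_ratio_2d(disp):
    """Fraction of grid points whose Jacobian determinant is non-positive.

    disp[y][x] = (dy, dx) is a displacement in pixels; the transform is
    identity + displacement. Forward differences are used, so the last row
    and column are skipped. This is a 2D simplification for illustration.
    """
    height, width = len(disp), len(disp[0])
    folded = 0
    counted = 0
    for y in range(height - 1):
        for x in range(width - 1):
            ddy_dy = disp[y + 1][x][0] - disp[y][x][0]
            ddy_dx = disp[y][x + 1][0] - disp[y][x][0]
            ddx_dy = disp[y + 1][x][1] - disp[y][x][1]
            ddx_dx = disp[y][x + 1][1] - disp[y][x][1]
            det = (1.0 + ddy_dy) * (1.0 + ddx_dx) - ddy_dx * ddx_dy
            folded += det <= 0.0
            counted += 1
    return folded / counted if counted else 0.0


# Toy inputs, invented for illustration only.
identity_field = [[(0.0, 0.0) for _ in range(4)] for _ in range(4)]
folding_field = [[(0.0, -2.0 * x) for x in range(4)] for _ in range(4)]

dsc_example = dice([0, 1, 1, 0], [0, 1, 0, 0], label=1)
tre_example = tre([(0.0, 0.0, 0.0)], [(3.0, 4.0, 0.0)])
p75_example = percentile([1.0, 2.0, 3.0, 4.0], 75)
ndv_identity = ndv_ratio_2d(identity_field)
ndv_folding = ndv_ratio_2d(folding_field)
```

A higher DSC and a lower TRE indicate better alignment, while an NDV close to zero indicates a deformation with few or no folds. An identity field therefore gives an NDV of 0, and the toy folding field, which mirrors the x-axis, gives an NDV of 1.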