Beyond the LUMIR challenge: The pathway to foundational registration models

Junyu Chen, Shuwen Wei, Joel Honkamaa, Pekka Marttinen, Hang Zhang, Min Liu, Yichao Zhou, Zuopeng Tan, Zhuoyuan Wang, Yi Wang, Hongchao Zhou, Shunbo Hu, Yi Zhang, Qian Tao, Lukas Förner, Thomas Wendler, Bailiang Jian, Benedikt Wiestler, Tim Hable, Jin Kim, Dan Ruan, Frederic Madesta, Thilo Sentker, Wiebke Heyer, Lianrui Zuo, Yuwei Dai, Jing Wu, Jerry L. Prince, Harrison Bai, Yong Du, Yihao Liu, Alessa Hering, Reuben Dorent, Lasse Hansen, Mattias P. Heinrich, Aaron Carass

TL;DR

This work introduces the Large-scale Unsupervised Brain MRI Image Registration (LUMIR) challenge as a path toward foundational registration models. By supplying 4,014 unlabeled training T1-weighted MRIs and a 590-subject test set, along with extensive zero-shot datasets spanning disease, acquisition protocol, and species variation, the study rigorously benchmarks inter-subject and atlas-to-subject registration. Across 21 methods, deep learning approaches achieve state-of-the-art accuracy and efficiency, typically producing smooth, near-diffeomorphic deformations without instance-specific optimization, and often outperforming optimization-based baselines. The results demonstrate strong zero-shot generalization when preprocessing is consistent, highlight architectural patterns (dual-stream encoders, coarse-to-fine and progressive registration), and advocate for using such large-scale, label-free training to drive the next generation of robust, general-purpose registration models with potential as foundational models in medical imaging. Overall, LUMIR provides empirical evidence for the maturity of DL-based brain MRI registration and outlines concrete directions for building scalable, robust foundation registration systems that can adapt across protocols, pathologies, and species.

Abstract

Medical image challenges have played a transformative role in advancing the field, catalyzing innovation and establishing new performance benchmarks. Image registration, a foundational task in neuroimaging, has similarly advanced through the Learn2Reg initiative. Building on this, we introduce the Large-scale Unsupervised Brain MRI Image Registration (LUMIR) challenge, a next-generation benchmark for unsupervised brain MRI registration. Whereas previous challenges relied on anatomical label maps, LUMIR provides 4,014 unlabeled T1-weighted MRIs for training, encouraging biologically plausible deformation modeling through self-supervision. Evaluation includes 590 in-domain test subjects and extensive zero-shot tasks across disease populations, imaging protocols, and species. Deep learning methods consistently achieved state-of-the-art performance and produced anatomically plausible, diffeomorphic deformation fields. They outperformed several leading optimization-based methods and remained robust to most domain shifts. These findings highlight the growing maturity of deep learning in neuroimaging registration and its potential to serve as a foundation model for general-purpose medical image registration.

Paper Structure

This paper contains 45 sections, 10 figures, 26 tables.

Figures (10)

  • Figure 1: The top panel provides a breakdown of the training, validation, and testing data used in the LUMIR Challenge, with each source listed and the quantity of data in parentheses. The bottom panel shows, from left to right, a tri-planar view of a typical T1-weighted MRI after our preprocessing, the manually identified landmarks overlaid on the same T1-weighted MRI, and the corresponding SLANT labels of the T1-weighted MRI. See Sec. \ref{sec:datasets} for complete details.
  • Figure 2: Representative registration results from the challenge evaluation task. Shown are the moving and fixed images, the deformation fields and deformed moving images for the top three methods (selected from Fig. \ref{fig:rank_all_heatmaps} and shown left to right in rank order), and the corresponding anatomical landmarks.
  • Figure 3: Representative registration results from the zero-shot evaluation tasks. From top to bottom, the three panels show the Intersubject, Atlas to Subject (Atlas2Subject), and Subject to Atlas (Subject2Atlas) results. Each panel shows an example moving and fixed image, with the deformation fields from the top three methods (selected from Fig. \ref{fig:rank_all_heatmaps} and shown left to right in rank order). Also shown are the corresponding anatomical label maps for the moving and fixed images, as well as the propagated anatomical label maps for the top three methods.
  • Figure 4: Shown for each subtask is a ranking for registration accuracy (ACC), 30th percentile of Dice Similarity Coefficient (DSC30), non-diffeomorphic volume (NDV), and 30th percentile of target registration error (TRE30), across all 21 reported methods and the baseline of no registration (ZeroDisplacement). The reported rankings aggregate all zero-shot sub-tasks within each category (inter-subject, atlas-to-subject, and subject-to-atlas), including unseen MRI contrasts and intra-species macaque registration used to evaluate cross-species generalization of human-trained models. Detailed per-dataset results are reported in the Appendix. For brief descriptions of the evaluated methods see Sec. \ref{s:methods}, with more detailed descriptions in \ref{a:methods}. A description of the ranking scheme is given in Sec. \ref{ss:comparisons}, with a detailed example of the ranking in \ref{a:ranking-example}.
  • Figure C.1: Significance matrix for pairwise method comparisons on the ADHD dataset. The upper row shows results for DSC, and the lower row shows results for HD95. Columns correspond to three registration tasks: Intersubject, Atlas to Subject (Atlas2Subject), and Subject to Atlas (Subject2Atlas) registration. Each cell indicates the statistical significance of performance differences between a pair of methods, assessed using Wilcoxon signed-rank post-hoc tests with Bonferroni correction. Color encodes the significance level, ranging from non-significant (NS) to $p < 0.001$. Only the upper triangular matrix is shown to avoid redundancy.
  • ...and 5 more figures
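
The Figure C.1 caption describes a significance matrix built from pairwise tests with Bonferroni correction, shown only in its upper triangle. The sketch below illustrates that bookkeeping only; it is not the authors' code. It assumes the uncorrected p-values are already available (for example from a paired Wilcoxon signed-rank test), and the function name `bonferroni_levels`, the method names, the p-values, and the intermediate thresholds 0.01 and 0.05 are all hypothetical. The caption itself states only the endpoints NS and $p < 0.001$.

```python
from itertools import combinations


def bonferroni_levels(raw_p, methods, thresholds=(0.001, 0.01, 0.05)):
    """Label each method pair with a Bonferroni-corrected significance level.

    raw_p maps (method_a, method_b) -> uncorrected p-value for that pair.
    thresholds must be in ascending order; the 0.01 and 0.05 bins are
    assumptions, not taken from the paper.
    """
    # Each unordered pair appears once, i.e. the upper triangle of the matrix.
    pairs = list(combinations(methods, 2))
    n_tests = len(pairs)

    levels = {}
    for a, b in pairs:
        # Bonferroni: scale the p-value by the number of tests, cap at 1.
        p_adj = min(1.0, raw_p[(a, b)] * n_tests)
        label = "NS"
        for t in thresholds:
            if p_adj < t:
                label = f"p < {t}"
                break
        levels[(a, b)] = label
    return levels


# Hypothetical methods and uncorrected p-values, for illustration only.
methods = ["A", "B", "C"]
raw_p = {("A", "B"): 0.0001, ("A", "C"): 0.004, ("B", "C"): 0.03}
levels = bonferroni_levels(raw_p, methods)
for pair, label in levels.items():
    print(pair, label)
```

With three methods there are three pairwise tests, so each p-value is multiplied by 3 before binning. A pair that looks significant before correction (here B versus C at 0.03) can fall back to NS afterwards, which is why the correction matters when many methods are compared at once.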