Interpretable deformable image registration: A geometric deep learning perspective
Vasiliki Sideri-Lampretsa, Nil Stolt-Ansó, Huaqi Qiu, Julian McGinnis, Wenke Karbole, Martin Menten, Daniel Rueckert
TL;DR
This work addresses the interpretability and efficiency of deformable image registration (DIR) by reframing it as a geometry-aware, cross-domain problem. It introduces GeoReg, a dual-stream, geometric deep learning framework that separates feature extraction from deformation modeling, models deformations on continuous domains with cross-attention-based refinement, and employs a coarse-to-fine, multi-resolution strategy to expand receptive fields without repeated resampling. The approach yields improved robustness and accuracy on challenging brain MRI and longitudinal retinal OCT tasks, while offering interpretable deformation dynamics and parameter efficiency. By grounding the method in distributive convolution properties and cross-domain separation, the paper provides both practical gains and theoretical foundations for geometry-informed, data-driven registration, with potential applicability to anisotropic spacing and other multi-domain imaging tasks.
Abstract
Deformable image registration poses a challenging problem where, unlike most deep learning tasks, a complex relationship between multiple coordinate systems has to be considered. Although data-driven methods have shown promising capabilities to model complex non-linear transformations, existing works employ standard deep learning architectures, assuming they are general black-box solvers. We argue that understanding how learned operations perform pattern-matching between the features in the source and target domains is the key to building robust, data-efficient, and interpretable architectures. We present a theoretical foundation for designing an interpretable registration framework: separated feature extraction and deformation modeling, dynamic receptive fields, and a data-driven deformation function's awareness of the relationship between both spatial domains. Based on this foundation, we formulate an end-to-end process that refines transformations in a coarse-to-fine fashion. Our architecture employs spatially continuous deformation modeling functions that use geometric deep-learning principles, thereby avoiding the problematic approach of resampling to a regular grid between successive refinements of the transformation. We perform a qualitative investigation to highlight interesting interpretability properties of our architecture. We conclude by showing significant improvement in performance metrics over state-of-the-art approaches for both mono- and multi-modal inter-subject brain registration, as well as the challenging task of longitudinal retinal intra-subject registration. We make our code publicly available.
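To make the idea of coarse-to-fine refinement on a continuous domain concrete, the following is a minimal NumPy sketch and not the authors' implementation. It accumulates residual displacements directly on continuous point coordinates and interpolates the image only at the final deformed locations, so no intermediate resampling to a regular grid is needed. The names `bilinear_sample` and `coarse_to_fine`, and the constant residual functions in the usage part, are hypothetical stand-ins; in the paper the residuals would come from learned, cross-attention-based deformation modeling.

```python
import numpy as np


def bilinear_sample(image, points):
    """Sample a 2D array at continuous (row, col) locations.

    Coordinates outside the image are clamped to the border.
    """
    h, w = image.shape
    r = np.clip(points[:, 0], 0, h - 1)
    c = np.clip(points[:, 1], 0, w - 1)
    r0 = np.floor(r).astype(int)
    c0 = np.floor(c).astype(int)
    r1 = np.minimum(r0 + 1, h - 1)
    c1 = np.minimum(c0 + 1, w - 1)
    fr = r - r0  # fractional offset along rows
    fc = c - c0  # fractional offset along columns
    top = (1 - fc) * image[r0, c0] + fc * image[r0, c1]
    bottom = (1 - fc) * image[r1, c0] + fc * image[r1, c1]
    return (1 - fr) * top + fr * bottom


def coarse_to_fine(points, residual_fns):
    """Accumulate residual displacements level by level.

    Each function in `residual_fns` (hypothetical; ordered coarse to fine)
    receives the currently deformed continuous coordinates and returns a
    residual displacement of the same shape. The points are never snapped
    back to a regular grid between levels.
    """
    displacement = np.zeros_like(points, dtype=float)
    for residual_fn in residual_fns:
        displacement = displacement + residual_fn(points + displacement)
    return points + displacement


# Toy usage: two refinement levels with constant residuals (assumed values).
image = np.arange(16, dtype=float).reshape(4, 4)  # image[r, c] = 4 * r + c
points = np.array([[0.0, 0.0], [1.0, 1.0]])
levels = [
    lambda p: np.full_like(p, 0.5),   # coarse level: larger shift
    lambda p: np.full_like(p, 0.25),  # fine level: smaller correction
]
warped = coarse_to_fine(points, levels)
values = bilinear_sample(image, warped)
```

In this toy setup the image is interpolated once, at the end. A pipeline that resamples the moving image onto a regular grid after every level would instead interpolate at each refinement step.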
