Interpretable deformable image registration: A geometric deep learning perspective

Vasiliki Sideri-Lampretsa, Nil Stolt-Ansó, Huaqi Qiu, Julian McGinnis, Wenke Karbole, Martin Menten, Daniel Rueckert

TL;DR

This work addresses the interpretability and efficiency of deformable image registration by reframing DIR as a geometry-aware, cross-domain problem. It introduces GeoReg, a dual-stream, geometric deep learning framework that separates feature extraction from deformation modeling, models deformations on continuous domains with cross-attention-based refinement, and employs a coarse-to-fine, multi-resolution strategy to expand receptive fields without repeated resampling. The approach yields improved robustness and accuracy on challenging brain MRI and longitudinal retinal OCT tasks, while offering interpretable deformation dynamics and parameter efficiency. By grounding the method in distributive convolution properties and cross-domain separation, the paper provides both practical gains and theoretical foundations for geometry-informed, data-driven registration with potential applicability to anisotropic spacing and other multi-domain imaging tasks.

Abstract

Deformable image registration poses a challenging problem where, unlike most deep learning tasks, a complex relationship between multiple coordinate systems has to be considered. Although data-driven methods have shown promising capabilities to model complex non-linear transformations, existing works employ standard deep learning architectures assuming they are general black-box solvers. We argue that understanding how learned operations perform pattern-matching between the features in the source and target domains is the key to building robust, data-efficient, and interpretable architectures. We present a theoretical foundation for designing an interpretable registration framework: separated feature extraction and deformation modeling, dynamic receptive fields, and a data-driven deformation function's awareness of the relationship between both spatial domains. Based on this foundation, we formulate an end-to-end process that refines transformations in a coarse-to-fine fashion. Our architecture employs spatially continuous deformation modeling functions that use geometric deep-learning principles, thereby avoiding the problematic approach of resampling to a regular grid between successive refinements of the transformation. We perform a qualitative investigation to highlight interesting interpretability properties of our architecture. We conclude by showing significant improvement in performance metrics over state-of-the-art approaches for both mono- and multi-modal inter-subject brain registration, as well as the challenging task of longitudinal retinal intra-subject registration. We make our code publicly available.
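
To make the idea of avoiding resampling concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes features live on explicit point coordinates, so applying a displacement only moves the coordinates while the feature vectors stay untouched. The function `relative_offsets`, the neighbourhood size `k`, and the random point sets are all hypothetical choices made for illustration.

```python
import numpy as np

def relative_offsets(src_xy, disp, tgt_xy, k=4):
    """Offsets from each displaced source point to its k nearest target points.

    Hypothetical helper: illustrates relative coordinates between the two
    spatial domains, which a learned deformation function could consume.
    """
    moved = src_xy + disp                              # (S, D) displaced source coordinates
    diff = tgt_xy[None, :, :] - moved[:, None, :]      # (S, T, D) target minus moved source
    dist = np.linalg.norm(diff, axis=-1)               # (S, T)
    order = np.argsort(dist, axis=1)[:, :k]            # (S, k) nearest target indices
    offsets = np.take_along_axis(diff, order[:, :, None], axis=1)  # (S, k, D)
    return offsets, order

rng = np.random.default_rng(0)
src_xy = rng.uniform(size=(32, 2))     # assumed source feature locations
tgt_xy = rng.uniform(size=(40, 2))     # assumed target feature locations
src_feat = rng.normal(size=(32, 8))    # features stay attached to points, never interpolated

# Relative coordinates before and after an (arbitrary) displacement update.
offsets0, nbrs0 = relative_offsets(src_xy, np.zeros_like(src_xy), tgt_xy)
disp = np.full_like(src_xy, 0.1)
offsets1, nbrs1 = relative_offsets(src_xy, disp, tgt_xy)
```

Only the coordinates change between the two calls; `src_feat` is reused as-is, which is the property that a grid-based feature-warping pipeline does not have.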

Paper Structure

This paper contains 31 sections, 17 equations, 16 figures, 3 tables, 2 algorithms.

Figures (16)

  • Figure 1: Our multi-resolution architecture begins by extracting features at increasingly coarse resolutions. In a coarse-to-fine fashion, the deformation function $\tau_\theta$ refines the predicted deformation over $N$ steps within the current resolution, while the learned interpolation function $\delta_\theta$ carries deformations onto the subsequent resolution. The architecture is supervised such that the majority of the deformation is modeled at the coarser (earliest) resolutions. Supervision of the finest (latest) resolutions provides a learning signal to all deformations at the coarser levels.
  • Figure 2: Different approaches to sequential deformation modeling. Warping is indicated by ⓦ. a) Cascading: Previous transformation warps original image intensities. Modeling the next deformation requires feature re-extraction, leading to high computational costs. b) Feature warping: Previous transformation warps extracted source features. Computationally cheap, but warping high-dimensional features introduces interpolation errors (curse of dimensionality). c) Geometric deep learning: Coordinates of features are modeled explicitly at slight memory cost. No warping is required. Deformation function $\tau_\theta$ is aware of deformations via relative coordinates.
  • Figure 3: Visualization of the registration process of an MNIST image pair over 3 resolutions. The multi-resolution approach naturally gives rise to deformation structures and magnitudes depending on the scale.
  • Figure 4: Visualization of the registration process of an MNIST image pair under various source image augmentations. Early in the multi-resolution process, the model appears to remove most variation across augmented instances.
  • Figure 5: Qualitative results of all compared methods for the CamCAN T1w-T1w inter-subject deformable registration experiment.
  • ...and 11 more figures
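
The control flow described in the Figure 1 caption can be sketched as a short loop. This is only an illustrative skeleton under stated assumptions: `tau` and `delta` below are hand-written, hypothetical stand-ins for the learned functions $\tau_\theta$ and $\delta_\theta$ (a half-step toward the nearest target point, and nearest-neighbour copying of displacements), and the toy target is simply the source grid translated by a fixed `shift`. None of these choices come from the paper.

```python
import numpy as np

def tau(moved, tgt_pts):
    """Hypothetical stand-in for tau_theta: half-step toward the nearest target point."""
    diff = tgt_pts[None, :, :] - moved[:, None, :]       # (S, T, D)
    nearest = np.argmin((diff ** 2).sum(-1), axis=1)     # (S,)
    return 0.5 * diff[np.arange(len(moved)), nearest]    # (S, D)

def delta(coarse_pts, coarse_disp, fine_pts):
    """Hypothetical stand-in for delta_theta: copy the displacement of the nearest coarse point."""
    d2 = ((fine_pts[:, None, :] - coarse_pts[None, :, :]) ** 2).sum(-1)  # (F, C)
    return coarse_disp[np.argmin(d2, axis=1)]            # (F, D)

def grid(n):
    """Cell-centre coordinates of an n x n grid on the unit square."""
    axis = (np.arange(n) + 0.5) / n
    return np.stack(np.meshgrid(axis, axis, indexing="ij"), -1).reshape(-1, 2)

def register(resolutions=(4, 8, 16), n_steps=3, shift=(0.1, 0.0)):
    """Coarse-to-fine refinement: N tau steps per level, delta between levels."""
    disp, prev_pts = None, None
    for n in resolutions:                                # coarsest level first
        src = grid(n)
        tgt = src + np.asarray(shift)                    # toy target: translated source points
        if disp is None:
            disp = np.zeros_like(src)                    # start from the identity
        else:
            disp = delta(prev_pts, disp, src)            # carry deformation to the finer level
        for _ in range(n_steps):                         # refine within the current level
            disp = disp + tau(src + disp, tgt)
        prev_pts = src
    return src, disp

src, disp = register()
```

The displacement is stored per point and passed between levels directly, so no intermediate image or feature map is warped at any stage of the loop.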