CARL: A Framework for Equivariant Image Registration

Hastings Greer; Lin Tian; Francois-Xavier Vialard; Roland Kwitt; Raul San Jose Estepar; Marc Niethammer

TL;DR

CARL introduces coordinate attention to construct a multi-step, diffeomorphism-aware image registration framework with built-in $[W,U]$ equivariance to independent input deformations. By replacing the first-step displacement predictor with the coordinate-attention network $\Xi_\theta$ and composing it with translation-equivariant refinements through a TwoStep/Downsample architecture, CARL achieves strong translation (and rotation, with augmentation) equivariance and competitive 3D medical registration accuracy. The method is supported by theoretical analyses, ablations, and extensive experiments across the Abdomen1k, DirLab, and HCP datasets, including per-structure analyses and rotation robustness. Key innovations include coordinate attention, the CARL architecture, and a rigorous treatment of equivariance feasibility and guarantees for practical input domains. The results indicate that CARL can outperform prior unsupervised approaches, particularly in challenging abdomen registrations with differing fields of view, while maintaining efficient forward passes and allowing optional instance optimization for improved accuracy.

Abstract

Image registration estimates spatial correspondences between a pair of images. These estimates are typically obtained via numerical optimization or regression by a deep network. A desirable property of such estimators is that a correspondence estimate (e.g., the true oracle correspondence) for an image pair is maintained under deformations of the input images. Formally, the estimator should be equivariant to a desired class of image transformations. In this work, we present careful analyses of the desired equivariance properties in the context of multi-step deep registration networks. Based on these analyses we 1) introduce the notions of $[U,U]$ equivariance (network equivariance to the same deformations of the input images) and $[W,U]$ equivariance (where input images can undergo different deformations); we 2) show that in a suitable multi-step registration setup it is sufficient for overall $[W,U]$ equivariance if the first step has $[W,U]$ equivariance and all others have $[U,U]$ equivariance; we 3) show that common displacement-predicting networks only exhibit $[U,U]$ equivariance to translations instead of the more powerful $[W,U]$ equivariance; and we 4) show how to achieve multi-step $[W,U]$ equivariance via a coordinate-attention mechanism combined with displacement-predicting refinement layers (CARL). Overall, our approach obtains excellent practical registration performance on several 3D medical image registration tasks and outperforms existing unsupervised approaches for the challenging problem of abdomen registration.
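
Claim 2) can be checked numerically on a toy problem. The sketch below is not CARL's implementation; it assumes the convention that a registration map $\varphi = \Phi[I^A, I^B]$ satisfies $I^A \circ \varphi \approx I^B$, so that $[W,U]$ equivariance reads $\Phi[I^A \circ W, I^B \circ U] = W^{-1} \circ \Phi[I^A, I^B] \circ U$ and $[U,U]$ equivariance is the special case $W = U$. It also assumes the two-step form $\Phi[I^A, I^B] \circ \Psi[I^A \circ \Phi[I^A, I^B], I^B]$. All names are hypothetical: `match_step` is a hard nearest-intensity matcher standing in for coordinate attention, `refine_step` is a local displacement rule standing in for a convolutional refinement network, and the domain is a periodic 1D grid with cyclic translations.

```python
import numpy as np

N = 16                      # size of the periodic 1D grid (assumption)
grid = np.arange(N)
rng = np.random.default_rng(0)

def warp(image, phi):
    """Image composed with a map: result[x] = image[phi[x]]."""
    return image[phi]

def compose(phi, psi):
    """Map composition: result[x] = phi[psi[x]]."""
    return phi[psi]

def shift(s):
    """Cyclic translation x -> (x + s) mod N, stored as an index map."""
    return (grid + s) % N

def inverse(perm):
    """Inverse of a bijective index map."""
    inv = np.empty_like(perm)
    inv[perm] = grid
    return inv

def match_step(img_a, img_b):
    """Toy first step: send x to the pixel of img_a whose value is nearest img_b[x]."""
    cost = np.abs(img_a[None, :] - img_b[:, None])   # cost[x, i]
    return np.argmin(cost, axis=1)

def refine_step(img_a, img_b):
    """Toy refinement: pick a displacement in {-1, 0, 1} from a local neighbourhood."""
    offsets = np.array([-1, 0, 1])
    # neigh[x, k] = img_a[(x + offsets[k]) mod N]
    neigh = np.stack([np.roll(img_a, -o) for o in offsets], axis=1)
    d = offsets[np.argmin(np.abs(neigh - img_b[:, None]), axis=1)]
    return (grid + d) % N

def two_step(img_a, img_b):
    """First step, then refinement on the warped moving image, then composition."""
    phi = match_step(img_a, img_b)
    psi = refine_step(warp(img_a, phi), img_b)
    return compose(phi, psi)

def is_equivariant(step, img_a, img_b, W, U):
    """Check step[A o W, B o U] == W^{-1} o step[A, B] o U on the grid."""
    lhs = step(warp(img_a, W), warp(img_b, U))
    rhs = compose(inverse(W), compose(step(img_a, img_b), U))
    return bool(np.array_equal(lhs, rhs))

img_a = rng.random(N)       # random intensities, so nearest matches are unique
img_b = rng.random(N)
W, U = shift(3), shift(5)

print("first step  [W,U]:", is_equivariant(match_step, img_a, img_b, W, U))
print("refinement  [U,U]:", is_equivariant(refine_step, img_a, img_b, U, U))
print("refinement  [W,U]:", is_equivariant(refine_step, img_a, img_b, W, U))
print("two-step    [W,U]:", is_equivariant(two_step, img_a, img_b, W, U))
```

The composition inherits $[W,U]$ equivariance because the first step turns the independent shifts $W$ and $U$ into a warped image pair that differs from the unshifted pair only by the common shift $U$, which the refinement step handles through its $[U,U]$ equivariance.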

Paper Structure

This paper contains 30 sections, 34 equations, 14 figures, 2 tables.

Introduction
Related work
Registering vector-valued images that are restricted to be diffeomorphisms
Coordinate attention
Equivariance of Coordinate Attention with convolutional encoders ($\Xi_\theta$)
$[U,U]$ equivariance
Two-step registration
Experiments
Deformed retina images
Performance comparison to other methods
Rotation equivariance
Limitations
Conclusion
Per Structure DICE Box Plot
Resolution, Downsampling & Coordinates
...and 15 more sections

Figures (14)

Figure 1: Results on a pair from the Abdomen1k dataset, and a synthetically rotated example from the HCP dataset. CARL tackles large displacements and arbitrary rotations via equivariance.
Figure 2: Architecture of $\Xi_\theta$. The specific arrangement of pads and crops allows voxels to be mapped to points outside of $\Omega$ -- this is necessary to represent translation.
Figure 3: Equivariance allows CARL to generalize out-of-distribution. The left four images show the performance of CARL on a test set with the same distribution as the training set, where images are aligned in scale and translation. The middle four images show generalization to a test set Scale Shift where images are misaligned in scale and translation. The right-most figure shows the mean Dice distribution on the test set, over multiple training runs.
Figure 4: HCP evaluation while translating or rotating one image. CARL and EasyReg are unaffected by translation due to $[W,U]$ equivariance, and CARL{ROT} is additionally unaffected by rotation. GradICON DICE drops significantly when images are transformed differently due to its $[U,U]$ equivariance.
Figure 5: Per structure DICE scores on the HCP dataset. CARL ranks well on most structures.
...and 9 more figures
