SE(3)-Equivariant and Noise-Invariant 3D Rigid Motion Tracking in Brain MRI

Benjamin Billot; Neel Dey; Daniel Moyer; Malte Hoffmann; Esra Abaci Turk; Borjan Gagoski; Ellen Grant; Polina Golland

TL;DR

This paper presents EquiTrack, a hybrid framework for 3D rigid motion tracking in brain MRI that pairs a denoising network with an SE(3)-equivariant steerable CNN to obtain noise-invariant, pose-consistent features. The denoiser $\Psi$ maps noisy inputs into a common intensity space, and the equivariant network $\Phi$ extracts SE(3)-equivariant features from its outputs; the method then builds two corresponding point clouds and estimates the rigid transform $\hat{T}$ with a differentiable, closed-form weighted Procrustes solution. On adult and challenging fetal MRI time series, EquiTrack outperforms state-of-the-art learning and optimisation baselines, often by large margins, and runs in under a second, which makes real-time clinical deployment plausible. The approach is robust to large motions and intensity variations, and its modular design suggests future extensions to broader registration tasks and to affine transformations.
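
The closed-form step mentioned above can be illustrated with a minimal NumPy sketch. It assumes the two point clouds are already in correspondence and solves the weighted least-squares rigid alignment with the SVD-based (Kabsch) solution. This is a stand-in for illustration and not necessarily the exact solver used in EquiTrack; the function name `weighted_procrustes`, its arguments, and the toy data are hypothetical.

```python
import numpy as np

def weighted_procrustes(moving_pts, fixed_pts, weights=None):
    """Return (R, t) minimising sum_k w_k * ||R @ moving_k + t - fixed_k||^2.

    Hypothetical helper: SVD-based (Kabsch) solution, assumed here as a
    stand-in for the closed-form solver described in the text.
    """
    moving_pts = np.asarray(moving_pts, dtype=float)  # shape (K, 3)
    fixed_pts = np.asarray(fixed_pts, dtype=float)    # shape (K, 3)
    n_points = moving_pts.shape[0]
    w = np.ones(n_points) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalise so the weights sum to one

    # Weighted centroids of both clouds.
    mu_m = w @ moving_pts
    mu_f = w @ fixed_pts
    m0 = moving_pts - mu_m
    f0 = fixed_pts - mu_f

    # Weighted 3x3 cross-covariance: sum_k w_k * m0_k f0_k^T.
    H = (m0 * w[:, None]).T @ f0

    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_f - R @ mu_m
    return R, t

# Toy check on synthetic data (assumed values, not taken from the paper).
rng = np.random.default_rng(0)
theta = np.deg2rad(40.0)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([5.0, -3.0, 2.0])
moving = rng.normal(size=(8, 3))
fixed = moving @ R_true.T + t_true
R_est, t_est = weighted_procrustes(moving, fixed)
```

With noise-free correspondences the estimate should match the simulated transform up to floating-point error. Every operation here is differentiable almost everywhere, which is what allows such a solver to sit at the end of a trainable pipeline.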

Abstract

Rigid motion tracking is paramount in many medical imaging applications where movements need to be detected, corrected, or accounted for. Modern strategies rely on convolutional neural networks (CNNs) and pose this problem as rigid registration. Yet, CNNs do not exploit natural symmetries in this task, as they are equivariant to translations (their outputs shift with their inputs) but not to rotations. Here we propose EquiTrack, the first method that uses recent steerable SE(3)-equivariant CNNs (E-CNN) for motion tracking. While steerable E-CNNs can extract corresponding features across different poses, testing them on noisy medical images reveals that they do not have enough learning capacity to learn noise invariance. Thus, we introduce a hybrid architecture that pairs a denoiser with an E-CNN to decouple the processing of anatomically irrelevant intensity features from the extraction of equivariant spatial features. Rigid transforms are then estimated in closed form. EquiTrack outperforms state-of-the-art learning and optimisation methods for motion tracking in adult brain MRI and fetal MRI time series. Our code is available at https://github.com/BBillot/EquiTrack.
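
The claim that ordinary convolutions are equivariant to translations but not to rotations can be checked numerically. The sketch below is a 2D toy under stated assumptions: periodic boundaries, a 90-degree rotation, and an arbitrary asymmetric kernel. The helper `circular_correlate` and all values are hypothetical and not part of EquiTrack.

```python
import numpy as np

def circular_correlate(image, kernel):
    """Cross-correlate a 2D image with a small kernel using periodic boundaries."""
    out = np.zeros_like(image, dtype=float)
    kh, kw = kernel.shape
    for i in range(kh):
        for j in range(kw):
            # Adds kernel[i, j] * image[y + i, x + j] (indices wrap around).
            out += kernel[i, j] * np.roll(image, shift=(-i, -j), axis=(0, 1))
    return out

def shift(a):
    """Translate an array by (3, -2) pixels with wrap-around."""
    return np.roll(a, shift=(3, -2), axis=(0, 1))

rng = np.random.default_rng(0)
image = rng.normal(size=(16, 16))
# Deliberately asymmetric kernel (illustrative values).
kernel = np.array([[1.0, 2.0, 0.0],
                   [0.0, -1.0, 3.0],
                   [0.0, 0.0, -2.0]])

# Translation: filtering a shifted image equals shifting the filtered image.
translation_equivariant = np.allclose(
    circular_correlate(shift(image), kernel),
    shift(circular_correlate(image, kernel)),
)

# Rotation: filtering a rotated image does NOT equal rotating the filtered image.
rotation_equivariant = np.allclose(
    circular_correlate(np.rot90(image), kernel),
    np.rot90(circular_correlate(image, kernel)),
)
```

Here `translation_equivariant` holds while `rotation_equivariant` does not, because a fixed kernel keeps its orientation when the input rotates. Steerable E-CNNs constrain their filters so that the second property also holds for 3D rotations.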

Paper Structure (42 sections, 12 equations, 9 figures, 4 tables)

This paper contains 42 sections, 12 equations, 9 figures, 4 tables.

Introduction
Motivation
Contributions
Related work
Optimisation-based motion tracking
Optimisation-based rigid registration
Landmark registration
Learning-based registration
Equivariant networks
Disentangling methods
Methods
Problem formulation
Denoising CNN
SE(3)-equivariant feature extraction
Non-scalar fields and representations
...and 27 more sections

Figures (9)

Figure 1: Overview of EquiTrack. The fixed and moving volumes are first processed with a denoising CNN that removes anatomically irrelevant intensity features (noise, histogram shifts, etc.), so that its outputs only differ by the unknown rigid transform. Crucially, we then use a steerable SE(3)-equivariant E-CNN to extract $K$ matching anatomical features across images. A rigid transform $\hat{T}$ is estimated by computing summary statistics (centres of mass), providing us with two corresponding point clouds that are registered with a differentiable closed-form algorithm (Horn, 1987).
Figure 2: Box plots for rotation errors, translation errors, and Dice scores on simulated pairs from the Adult and Fetal-I datasets. EquiTrack is significantly better than all other methods at the 5% level (Bonferroni-corrected two-sided non-parametric Wilcoxon signed-rank test), except for RPM and ANTs in terms of rotation error and Dice in adults (no statistically significant difference).
Figure 3: Example of the intermediate representations of EquiTrack. A reference scan (left) is either augmented with a large intensity transform (top row), a large spatial deformation (middle), or both (bottom). The denoiser $\Psi$ then removes noisy intensity features when appropriate (note how the middle example is left intact as no noise was added to it). After having removed intensity discrepancies across scans, the E-CNN $\Phi$ can now process volumes that only differ by their pose, which ensures the extraction of matching features across poses due to the SE(3)-equivariance of $\Phi$. These features are then used to accurately estimate rigid transforms back to the original volume whose brain mask is outlined in red for reference.
Figure 4: Sample registrations for representative methods on 3D pairs simulated from Fetal-I with small and large movements. The brain mask of the fixed image is shown in red for reference. Smoothness in registered images is due to interpolation. For small movements (top), all methods yield accurate or adequate results. In the case of large motion (bottom), only EquiTrack and ANTs (although less accurate) produce correct alignments.
Figure 5: Example of features produced by KeyMorph-SVD and EquiTrack on a simulated pair from Fetal-I. The regular CNN of KeyMorph-SVD can lead to inconsistent representations across poses (red arrows). In contrast, the E-CNN used in EquiTrack guarantees the extraction of SE(3)-equivariant features, which enable accurate rigid registration.
...and 4 more figures
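
The caption of Figure 1 describes collapsing each of the $K$ feature maps into a centre of mass to obtain a point cloud. A minimal NumPy sketch of that summary statistic follows, assuming non-negative feature channels with positive total mass and coordinates in voxel units. The function `centres_of_mass` and the toy volumes are hypothetical and are not taken from the EquiTrack code.

```python
import numpy as np

def centres_of_mass(features):
    """Map K feature volumes of shape (K, D, H, W) to K points of shape (K, 3).

    Assumes every channel is non-negative with a positive sum (hypothetical
    requirement of this sketch); coordinates are returned in voxel units.
    """
    features = np.asarray(features, dtype=float)
    n_channels = features.shape[0]
    spatial_shape = features.shape[1:]

    # Voxel coordinates of every grid location, flattened to shape (N, 3).
    grids = np.meshgrid(*[np.arange(n) for n in spatial_shape], indexing="ij")
    coords = np.stack([g.ravel() for g in grids], axis=1).astype(float)

    mass = features.reshape(n_channels, -1)       # (K, N)
    total = mass.sum(axis=1, keepdims=True)       # (K, 1)
    return (mass @ coords) / total                # intensity-weighted mean position

# Toy feature maps: one cubic blob and one single active voxel (assumed data).
features = np.zeros((2, 12, 12, 12))
features[0, 2:5, 3:6, 4:7] = 1.0
features[1, 6, 2, 1] = 2.0
points = centres_of_mass(features)

# Translating the volumes by (2, 1, 3) voxels moves every point by the same offset.
shifted = np.roll(features, shift=(2, 1, 3), axis=(1, 2, 3))
shifted_points = centres_of_mass(shifted)
```

Because each point moves with its feature map, computing centres of mass on the fixed and moving volumes yields two clouds with one-to-one correspondences, which is the input that a closed-form rigid solver expects.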
