SpaER: Learning Spatio-temporal Equivariant Representations for Fetal Brain Motion Tracking
Jian Wang, Razieh Faghihpirayesh, Polina Golland, Ali Gholipour
TL;DR
SpaER addresses fetal MRI motion artifacts by learning spatio-temporal equivariant representations that track rigid fetal brain motion over time without data augmentation. It combines rotation-equivariant 3D steerable CNNs that extract spatial means, a temporal encoding that injects time information, and a self-attention module that learns cross-time correspondences. A diffeomorphic deformation correction module handles local distortions, and a joint loss that combines image similarity with geometric regularization enables stable training and fast inference. On 4D EPIs from 15 subjects (240 sequences), SpaER achieves a translation error of $3.81$ mm and an angular error of $2.76$ degrees, with per-pair and per-sequence runtimes of $0.501$ s and $9.960$ s, outperforming baselines such as DeepPose, KeyMorph, and Equivariant Filters. These results support real-time fetal head motion tracking and prospective motion correction in fetal MRI.
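The TL;DR says that the steerable CNNs extract spatial means and that accuracy is reported as a translation error (mm) and an angular error (degrees). The NumPy sketch below shows one common way to turn per-channel spatial means into a rigid transform and to score that transform. It rests on several assumptions that are not taken from the paper. Each feature channel's center of mass is treated as a landmark. The rigid fit is a closed-form Kabsch/Procrustes solve. Angular error is taken to be the geodesic angle of the residual rotation. All function names, the voxel spacing, and the synthetic landmarks are hypothetical.

```python
import numpy as np

def spatial_means(feature_maps, spacing_mm=(1.0, 1.0, 1.0)):
    """Center of mass of each non-negative channel, in mm (assumed landmark definition).

    feature_maps: array of shape (C, X, Y, Z); returns a (C, 3) array of landmarks.
    """
    C = feature_maps.shape[0]
    axes = [np.arange(n) * s for n, s in zip(feature_maps.shape[1:], spacing_mm)]
    coords = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)
    weights = feature_maps.reshape(C, -1)
    weights = weights / weights.sum(axis=1, keepdims=True)  # normalize each channel
    return weights @ coords

def kabsch(P, Q):
    """Least-squares rotation R and translation t such that Q ~= P @ R.T + t."""
    p0, q0 = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p0).T @ (Q - q0)                   # 3x3 cross-covariance of centered points
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # flip the last axis to avoid a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, q0 - R @ p0

def rotation_matrix(axis, angle_deg):
    """Rotation about `axis` by `angle_deg` degrees (Rodrigues' formula)."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    theta = np.deg2rad(angle_deg)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def translation_error_mm(t_est, t_gt):
    """Euclidean distance between translation vectors (assumed to be in mm)."""
    return float(np.linalg.norm(np.asarray(t_est, float) - np.asarray(t_gt, float)))

def angular_error_deg(R_est, R_gt):
    """Geodesic angle of the residual rotation R_est^T R_gt, in degrees."""
    cos_angle = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))  # clip guards round-off

# Hypothetical two-channel volume with 3 mm isotropic voxels.
vol = np.zeros((2, 8, 8, 8))
vol[0, 1, 2, 3] = 1.0
vol[1, 4, 4, 4] = 2.0
print(spatial_means(vol, spacing_mm=(3.0, 3.0, 3.0)))

# Synthetic landmarks in frame 0, moved rigidly to frame 1.
rng = np.random.default_rng(0)
P = rng.normal(size=(6, 3)) * 20.0
R_true = rotation_matrix([1.0, 2.0, 0.5], 20.0)
t_true = np.array([1.0, -2.0, 3.0])
Q = P @ R_true.T + t_true
R_est, t_est = kabsch(P, Q)
print(translation_error_mm(t_est, t_true), angular_error_deg(R_est, R_true))
```

With noise-free landmarks both errors come out at numerical precision. In practice the landmarks would come from learned feature maps, so the residual errors reflect how consistently the network localizes them across frames.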
Abstract
In this paper, we introduce SpaER, a new method for fetal motion tracking that leverages equivariant filters and self-attention mechanisms to learn spatio-temporal representations. Unlike conventional approaches that statically estimate fetal brain motion from pairs of images, our method dynamically tracks the rigid movement patterns of the fetal head across temporal and spatial dimensions. Specifically, we first develop an equivariant neural network that efficiently learns rigid motion sequences through low-dimensional spatial representations of images. We then learn spatio-temporal representations by incorporating time encoding and self-attention layers. This approach captures long-term dependencies in fetal brain motion and addresses alignment errors caused by contrast changes and severe motion artifacts. Our model also provides a geometric deformation estimate that accounts for image distortions across all time frames. To the best of our knowledge, our approach is the first to learn spatio-temporal representations via deep neural networks for fetal motion tracking without data augmentation. We validated our model on real fetal echo-planar images with simulated and real motion. Our method has significant potential for accurately measuring, tracking, and correcting fetal motion in fetal MRI sequences.
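The abstract describes injecting time through an encoding and then using self-attention to capture long-term dependencies across frames. The NumPy sketch below shows that generic pattern. It rests on three assumptions that do not come from the paper: the time encoding is Transformer-style and sinusoidal, it is added to per-frame feature vectors, and attention uses a single head with random weights. The feature sizes, weight initialization, and function names are hypothetical. The block illustrates the mechanism only and does not reproduce SpaER's actual layers or training.

```python
import numpy as np

def time_encoding(num_frames, dim):
    """Sinusoidal encoding of frame indices (Transformer-style; assumed). `dim` must be even."""
    positions = np.arange(num_frames)[:, None]               # (T, 1)
    freqs = 1.0 / (10000 ** (np.arange(0, dim, 2) / dim))    # (dim / 2,)
    enc = np.zeros((num_frames, dim))
    enc[:, 0::2] = np.sin(positions * freqs)
    enc[:, 1::2] = np.cos(positions * freqs)
    return enc

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(features, W_q, W_k, W_v):
    """Single-head scaled dot-product attention across time frames."""
    Q, K, V = features @ W_q, features @ W_k, features @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # (T, T) frame-to-frame affinities
    weights = softmax(scores, axis=-1)         # each row sums to one
    return weights @ V, weights

rng = np.random.default_rng(0)
T, D = 16, 32                                  # hypothetical: 16 frames, 32-dim features
spatial_feats = rng.standard_normal((T, D))    # stand-in for per-frame spatial features
x = spatial_feats + time_encoding(T, D)        # inject time by addition
W_q, W_k, W_v = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))
out, attn = self_attention(x, W_q, W_k, W_v)
print(out.shape, attn.shape)
```

Each row of `attn` sums to one, so every frame's output is a weighted mix of all frames in the sequence. This lets attention relate distant time points directly, whereas pairwise registration only compares two frames at a time.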
