SpaER: Learning Spatio-temporal Equivariant Representations for Fetal Brain Motion Tracking

Jian Wang, Razieh Faghihpirayesh, Polina Golland, Ali Gholipour

TL;DR

SpaER addresses fetal MRI motion artifacts by learning spatio-temporal equivariant representations that track rigid brain motion across time without data augmentation. It combines rotation-equivariant 3D steerable CNNs that extract low-dimensional spatial representations, a temporal encoding that injects time information, and a self-attention module that learns cross-time correspondences. A diffeomorphic deformation correction module handles local distortions, and a joint loss that combines image similarity with geometric regularization enables stable training and fast inference. On 4D EPIs from 15 subjects (240 sequences), SpaER achieves a translation error of 3.81 mm and an angular error of 2.76 degrees, with runtimes of 0.501 s per pair and 9.960 s per sequence, outperforming the baselines DeepPose, KeyMorph, and Equivariant Filters. These results support real-time fetal head motion tracking and prospective motion correction in fetal MRI.
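The TL;DR describes recovering rigid motion from spatial features produced by equivariant CNNs. A standard way to turn two sets of corresponding 3D points (for example, feature-map centroids or keypoints, as in equivariant-filter and keypoint-based registration) into a rigid transform is the closed-form Kabsch (SVD) solution. The sketch below assumes such correspondences are already available; it illustrates that standard solver and is not SpaER's confirmed implementation. The names `kabsch`, `P`, and `Q` are hypothetical.

```python
import numpy as np

def kabsch(P, Q):
    """Closed-form least-squares rigid transform (R, t) such that R @ p_i + t ≈ q_i.

    P, Q: (N, 3) arrays of corresponding points (hypothetical per-frame keypoints).
    """
    P, Q = np.asarray(P, dtype=float), np.asarray(Q, dtype=float)
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p_mean).T @ (Q - q_mean)                      # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that det(R) = +1 (a rotation, not a reflection)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_mean - R @ p_mean
    return R, t

# Usage: move hypothetical keypoints by a known rigid motion and recover it
rng = np.random.default_rng(1)
P = rng.standard_normal((10, 3)) * 20.0                    # keypoints in mm (illustrative)
a = np.radians(15.0)
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([5.0, -2.0, 3.0])
Q = P @ R_true.T + t_true                                  # row-wise q_i = R_true @ p_i + t_true
R_est, t_est = kabsch(P, Q)
print(np.allclose(R_est, R_true), np.allclose(t_est, t_true))  # True True
```

With noise-free correspondences the recovery is exact; with noisy learned points the same formula gives the least-squares rigid fit.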

Abstract

In this paper, we introduce SpaER, a novel method for fetal motion tracking that leverages equivariant filters and self-attention mechanisms to learn spatio-temporal representations. Unlike conventional approaches that statically estimate fetal brain motion from pairs of images, our method dynamically tracks the rigid movement patterns of the fetal head across temporal and spatial dimensions. Specifically, we first develop an equivariant neural network that efficiently learns rigid motion sequences through low-dimensional spatial representations of images. We then learn spatio-temporal representations by incorporating time encoding and self-attention layers. This approach captures long-term dependencies in fetal brain motion and addresses alignment errors caused by contrast changes and severe motion artifacts. Our model also provides a geometric deformation estimate that corrects image distortions across all time frames. To the best of our knowledge, our approach is the first to learn spatio-temporal representations via deep neural networks for fetal motion tracking without data augmentation. We validated our model on real fetal echo-planar images with simulated and real motions. Our method has significant potential for accurately measuring, tracking, and correcting fetal motion in fetal MRI sequences.
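The temporal stage described in the abstract adds a time encoding to per-frame spatial representations and lets self-attention relate every frame to every other frame. The sketch below assumes a Transformer-style sinusoidal encoding and a single attention head in NumPy; the paper's actual encoding, feature dimensions, and layer configuration are not given here, and all sizes (`T = 20`, `d = 16`) and random weights are illustrative.

```python
import numpy as np

def sinusoidal_time_encoding(T, d):
    """Transformer-style sinusoidal encoding of frame indices 0..T-1 (d must be even)."""
    pos = np.arange(T)[:, None]                      # (T, 1)
    i = np.arange(d // 2)[None, :]                   # (1, d/2)
    angles = pos / (10000.0 ** (2 * i / d))          # (T, d/2)
    enc = np.zeros((T, d))
    enc[:, 0::2] = np.sin(angles)
    enc[:, 1::2] = np.cos(angles)
    return enc

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over the time axis of X (T, d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])           # (T, T) frame-to-frame affinities
    scores -= scores.max(axis=1, keepdims=True)      # numerical stability before exp
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)    # softmax: each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
T, d = 20, 16                                        # 20 frames, 16-dim per-frame features
spatial_feats = rng.standard_normal((T, d))          # stand-in for per-frame spatial features
X = spatial_feats + sinusoidal_time_encoding(T, d)   # inject time information
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
out, attn = self_attention(X, Wq, Wk, Wv)
print(out.shape, attn.shape)                         # (20, 16) (20, 20)
```

Because each output row is a weighted mix of all frames, a frame corrupted by contrast change or severe artifacts can borrow evidence from the rest of the sequence, which is the intuition behind capturing long-term dependencies.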

Paper Structure (9 sections, 7 equations, 3 figures, 1 table)

Figures (3)

  • Figure 1: An illustration of the network architecture of our proposed spatio-temporal learning framework, SpaER. Left to right: input data, networks that encode both the temporal and spatial features to learn rigid motions, and the output aligned sequence with a joint geometric distortion correction module.
  • Figure 2: Two case studies (top and bottom halves) visualizing motion tracking results, with the "target" fetal brains outlined by red contours for all methods. For each case, rows from top to bottom show the target and the motion-corrected results of our method, Equivariant filter moyer2021equivariant, KeyMorph evan2022keymorph, and DeepPose salehi2018real.
  • Figure 3: Motion tracking performance on real fMRI across varying motion magnitudes and sequence lengths ($T$). Small ($\mathcal{T}_{max} = 10\,\text{mm}$, $\mathcal{R}_{max} = 5^{\circ}$) and large ($\mathcal{T}_{max} = 30\,\text{mm}$, $\mathcal{R}_{max} = 20^{\circ}$) motions were evaluated. Average runtime: 0.501 s per pair and 9.960 s per sequence when $T=20$.
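Figure 3 evaluates small and large simulated motions bounded by $\mathcal{T}_{max}$ and $\mathcal{R}_{max}$ and reports translation (mm) and angular (degree) errors. The caption does not say how motions are drawn or exactly how errors are defined, so the sketch below assumes independent uniform per-axis translations and Euler angles within those bounds (the combined rotation angle can therefore exceed $\mathcal{R}_{max}$), the Euclidean norm for translation error, and the geodesic angle of the relative rotation for angular error. `sample_motion_sequence` and the other helpers are hypothetical.

```python
import numpy as np

def rot_xyz(deg_xyz):
    """Rotation matrix Rz @ Ry @ Rx from three Euler angles given in degrees."""
    ax, ay, az = np.radians(deg_xyz)
    Rx = np.array([[1, 0, 0], [0, np.cos(ax), -np.sin(ax)], [0, np.sin(ax), np.cos(ax)]])
    Ry = np.array([[np.cos(ay), 0, np.sin(ay)], [0, 1, 0], [-np.sin(ay), 0, np.cos(ay)]])
    Rz = np.array([[np.cos(az), -np.sin(az), 0], [np.sin(az), np.cos(az), 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def sample_motion_sequence(T, t_max_mm, r_max_deg, rng):
    """Hypothetical simulator: T rigid transforms, each with per-axis translation in
    [-t_max_mm, t_max_mm] and per-axis Euler angles in [-r_max_deg, r_max_deg]."""
    return [(rot_xyz(rng.uniform(-r_max_deg, r_max_deg, 3)),
             rng.uniform(-t_max_mm, t_max_mm, 3)) for _ in range(T)]

def translation_error_mm(t_est, t_true):
    """Euclidean distance between estimated and true translations (mm)."""
    return float(np.linalg.norm(t_est - t_true))

def angular_error_deg(R_est, R_true):
    """Geodesic angle (degrees) of the relative rotation R_est^T @ R_true."""
    cos_theta = (np.trace(R_est.T @ R_true) - 1.0) / 2.0
    # Clip guards against round-off pushing cos_theta slightly outside [-1, 1]
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))

rng = np.random.default_rng(0)
small = sample_motion_sequence(20, t_max_mm=10.0, r_max_deg=5.0, rng=rng)
large = sample_motion_sequence(20, t_max_mm=30.0, r_max_deg=20.0, rng=rng)

# Score a deliberately imperfect "estimate": true motion plus a 1 mm x-shift and a 2-degree z-rotation
R_true, t_true = large[0]
R_est = rot_xyz([0.0, 0.0, 2.0]) @ R_true
t_est = t_true + np.array([1.0, 0.0, 0.0])
print(round(translation_error_mm(t_est, t_true), 6), round(angular_error_deg(R_est, R_true), 6))  # 1.0 2.0
```

The angular error uses the trace identity $\operatorname{tr}(R) = 1 + 2\cos\theta$, so it is independent of how the rotation was parameterized.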