A survey on deep learning in medical image registration: new technologies, uncertainty, evaluation metrics, and beyond

Junyu Chen; Yihao Liu; Shuwen Wei; Zhangxing Bian; Shalini Subramanian; Aaron Carass; Jerry L. Prince; Yong Du

TL;DR

This paper surveys the most recent advances in deep learning-based image registration, covering innovative network architectures, registration-specific loss functions, and methods for estimating registration uncertainty.

Abstract

Deep learning technologies have dramatically reshaped the field of medical image registration over the past decade. The initial developments, such as regression-based and U-Net-based networks, established the foundation for deep learning in image registration. Subsequent progress has been made in various aspects of deep learning-based registration, including similarity measures, deformation regularizations, network architectures, and uncertainty estimation. These advancements have not only enriched the field of image registration but have also facilitated its application in a wide range of tasks, including atlas construction, multi-atlas segmentation, motion estimation, and 2D-3D registration. In this paper, we present a comprehensive overview of the most recent advancements in deep learning-based image registration. We begin with a concise introduction to the core concepts of deep learning-based image registration. Then, we delve into innovative network architectures, loss functions specific to registration, and methods for estimating registration uncertainty. Additionally, this paper explores appropriate evaluation metrics for assessing the performance of deep learning models in registration tasks. Finally, we highlight the practical applications of these novel techniques in medical imaging and discuss the future prospects of deep learning-based image registration.

Paper Structure (50 sections, 25 equations, 12 figures, 3 tables)

Introduction
Fundamentals of Learning-based Image Registration
Supervised vs. Unsupervised Learning
Paradigm for Learning-based Registration
Diffeomorphic Image Registration
Loss Functions
Supervised Learning
Unsupervised & Semi-supervised Learning
Similarity Measure
Deformation Regularizer
Auxiliary Anatomical Information
Network Architectures
Adversarial Learning
Contrastive Learning
Transformers
...and 35 more sections

Figures (12)

Figure 1: Statistics of the articles investigated in this survey paper. The left panel displays a histogram of the number of papers by year; the vast majority of the surveyed papers were published within the last five years. The right panel illustrates the sources of the investigated articles, demonstrating that our survey draws from sources associated with the field of medical image analysis.
Figure 2: Overview of learning-based image registration. The top panel depicts the common pipeline for supervised learning in medical image registration, which necessitates ground truth transformations. The bottom panel demonstrates the unsupervised learning pipeline, wherein the network learns to perform registration using only input images. The left panel presents the learning-based DIR pipeline, typically employing an encoder-decoder-style network architecture. The right panel exhibits the learning-based rigid/affine registration, which usually involves only an encoder.
Figure 3: Visual representation of adversarial learning in medical image registration, with $\pmb{m}$ and $\pmb{f}$ denoting the moving and fixed images, respectively. Here, $\pmb{\phi}$ represents the deformation field, $\pmb{p}$ indicates a pseudo-probability generated by the discriminator, and $\pmb{A}$ and $\pmb{B}$ correspond to two different modalities. Panel (a) demonstrates how adversarial learning serves as a metric for image similarity, as similarly employed in fan2018adversarial, yan2018adversarial, mahapatra2018deformable, mahapatra2018joint, and mahapatra2020training. Panel (b) shows the application of adversarial learning to multi-modal image registration, synthesizing different image modalities into the same modality for registration, as seen in xu2020adversarial, wei2019synthesis, zhang2021learning, and han2022deformable.
Figure 4: Visual representation of contrastive learning in medical image registration, with $\pmb{m}$ and $\pmb{f}$, respectively, denoting the moving and fixed images, $\pmb{\phi}$ denoting the deformation field, and $\pmb{A}$ and $\pmb{B}$ corresponding to two different modalities. Panel (a) illustrates the application of contrastive learning as a similarity metric for comparing the deformed moving image to the fixed image, as seen in hu2019towards and dey2022contrareg. Panel (b) depicts the use of contrastive learning to transform images from different modalities into a unified feature representation, upon which the registration model operates, as similarly employed in pielawski2020comir, wetzer2023can, and casamitjana2021synth.
Figure 5: Graphical representation of Transformers used in medical image registration, with $\pmb{m}$ and $\pmb{f}$ indicating the moving and fixed images, respectively, and $\pmb{\phi}$ representing the deformation field. Panel (a) displays a Transformer-ConvNet hybrid architecture, where Transformers function as encoders for extracting features, coupled with a ConvNet decoder for generating the deformation field. A similar framework is adopted in chen2021vitvnet, chen2022transmorph, chen2023spr, and zhang2021learning. Panel (b) shows a design where a Transformer is sandwiched between ConvNets, with the initial ConvNet acting as a feature extractor, the Transformer applying self- or cross-attention between image features, and the subsequent ConvNet decoder producing the deformation field. This configuration is adopted in zhang2021learning, chen2022deformer, and song2022cross. Panel (c) illustrates the architecture where Transformers serve as both the encoder and decoder, a design adopted in shi2022xmorpher.
...and 7 more figures
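
To make the unsupervised pipeline of Figure 2 more concrete, the following is a minimal sketch of the kind of objective such a pipeline might minimize: a similarity measure between the fixed image and the warped moving image, plus a deformation regularizer. The specific choices here (mean squared error as the similarity term, a diffusion-style smoothness penalty, 2D images, bilinear warping in NumPy, and the names `warp_2d`, `diffusion_regularizer`, `unsupervised_loss`, and `lam`) are illustrative assumptions, not the formulation of the survey or of any particular method it covers.

```python
import numpy as np

def warp_2d(moving, disp):
    """Warp a 2D image with a dense displacement field (bilinear interpolation).

    moving: (H, W) array.
    disp:   (2, H, W) array of row/column displacements in pixels (assumed layout).
    """
    h, w = moving.shape
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Sampling locations phi(x) = x + u(x), clamped to the image domain.
    r = np.clip(rows + disp[0], 0, h - 1)
    c = np.clip(cols + disp[1], 0, w - 1)
    r0 = np.floor(r).astype(int)
    c0 = np.floor(c).astype(int)
    r1 = np.minimum(r0 + 1, h - 1)
    c1 = np.minimum(c0 + 1, w - 1)
    wr = r - r0  # fractional offset along rows
    wc = c - c0  # fractional offset along columns
    top = (1 - wc) * moving[r0, c0] + wc * moving[r0, c1]
    bottom = (1 - wc) * moving[r1, c0] + wc * moving[r1, c1]
    return (1 - wr) * top + wr * bottom

def diffusion_regularizer(disp):
    """Mean squared forward difference of the displacement field (smoothness penalty)."""
    d_rows = np.diff(disp, axis=1)
    d_cols = np.diff(disp, axis=2)
    return np.mean(d_rows ** 2) + np.mean(d_cols ** 2)

def unsupervised_loss(fixed, moving, disp, lam=1.0):
    """Similarity term (MSE, an assumed choice) plus lam times the regularizer."""
    warped = warp_2d(moving, disp)
    similarity = np.mean((fixed - warped) ** 2)
    return similarity + lam * diffusion_regularizer(disp)
```

In a learning-based setting, `disp` would be predicted by the network from the image pair and the loss would be minimized with an automatic-differentiation framework; plain NumPy is used here only to keep the sketch self-contained.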
