Multi-modal deformable image registration using untrained neural networks

Quang Luong Nhat Nguyen, Ruiming Cao, Laura Waller

TL;DR

The paper tackles versatile image registration that works across rigid/deformable and single-/multi-modal data without task-specific models or ground-truth training data. It proposes untrained coordinate-based networks as implicit priors: a motion network generates a dense displacement field and an image network stores image content, so that the transformed image is $I_{trans} = f_{im}((x,y) + f_{mo}(x,y;\theta_{mo}); \theta_{im})$, fitted with an $L_2$ objective. Model capacity is controlled via hash-encoded multi-resolution features and a coarse-to-fine strategy that first prioritizes motion alignment before refining image content, enabling robust performance across 2D and 3D datasets. On the Zurich 2D and Abdomen MR-CT tasks, the method outperforms baselines on single- and multi-modal registration, including deformable cases, illustrating its potential as a general, data-agnostic registration framework with practical implications for multimodal imaging and analysis.
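To make the formula concrete, here is a minimal NumPy sketch of the forward model. It assumes plain ReLU MLPs for both $f_{mo}$ and $f_{im}$ (the hash encoding is omitted) and a 2D coordinate grid in $[-1, 1]^2$; the helper names `init_mlp`, `mlp`, `f_trans` and `l2_loss` are hypothetical. Zero-initializing the last layer of the motion network, so the warp starts at the identity, is a choice made here for illustration and not necessarily the authors'.

```python
import numpy as np


def init_mlp(sizes, rng, zero_last=False):
    """Random (untrained) weights for a small ReLU MLP; optionally zero the last layer."""
    params = []
    for i, (m, n) in enumerate(zip(sizes[:-1], sizes[1:])):
        w = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))
        if zero_last and i == len(sizes) - 2:
            w = np.zeros((m, n))  # last layer outputs zeros -> identity warp at start
        params.append((w, np.zeros(n)))
    return params


def mlp(params, x):
    """Apply the MLP row-wise to x of shape (N, d_in); ReLU on all hidden layers."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x


def f_trans(theta_im, theta_mo, coords):
    """I_trans = f_im(coords + f_mo(coords; theta_mo); theta_im)."""
    displacement = mlp(theta_mo, coords)         # dense displacement field, shape (N, 2)
    return mlp(theta_im, coords + displacement)  # image network queried at warped coordinates


def l2_loss(pred, target):
    """Mean squared (L2) error between network output and observed intensities."""
    return float(np.mean((pred - target) ** 2))


# Usage: a 16 x 16 grid of (x, y) coordinates in [-1, 1]^2
rng = np.random.default_rng(0)
xs = np.linspace(-1.0, 1.0, 16)
coords = np.stack(np.meshgrid(xs, xs, indexing="xy"), axis=-1).reshape(-1, 2)
theta_im = init_mlp([2, 64, 64, 1], rng)
theta_mo = init_mlp([2, 32, 2], rng, zero_last=True)
moving = rng.random((coords.shape[0], 1))  # stand-in for moving-image intensities

pred = f_trans(theta_im, theta_mo, coords)
loss = l2_loss(pred, moving)
```

Only the forward pass and the loss are shown. Fitting $\theta_{mo}$ and $\theta_{im}$ to real data would require gradients, which in practice would come from an autodiff framework rather than hand-written NumPy.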

Abstract

Image registration techniques usually assume that the images to be registered are of a certain type (e.g. single- vs. multi-modal, 2D vs. 3D, rigid vs. deformable), and no general method exists that works for data under all of these conditions. We propose a registration method that uses neural networks for image representation. Our method uses untrained networks with limited representation capacity as an implicit prior to guide the registration. Unlike previous approaches that are specialized for specific data types, our method handles both rigid and non-rigid, as well as single- and multi-modal registration, without requiring changes to the model or objective function. We performed a comprehensive evaluation on a variety of datasets and demonstrated promising performance.
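The claim that single- and multi-modal registration share one model and objective can be illustrated with a small loss function. This is one plausible reading of the Figure 1 caption, not the authors' code. It assumes that the image network outputs two channels (channel 0 for the fixed image's modality, channel 1 for the other modality), that the fixed image is compared against the output at the fixed time point and the moving image against the output at the warped coordinates, and `registration_loss` is a hypothetical name.

```python
import numpy as np


def registration_loss(out_fixed, out_trans, img_fixed, img_moving, multimodal):
    """L2 loss over a two-channel image-network output (hypothetical sketch).

    out_fixed: network output at the fixed time point, shape (N, 2)
    out_trans: network output at the warped coordinates, shape (N, 2)
    Channel 0 models the fixed image's modality; channel 1 models the other modality.
    """
    ch_fixed = 0
    ch_moving = 1 if multimodal else 0  # the only difference between the two cases
    loss_fixed = np.mean((out_fixed[:, ch_fixed] - img_fixed) ** 2)
    loss_moving = np.mean((out_trans[:, ch_moving] - img_moving) ** 2)
    return float(loss_fixed + loss_moving)


# Usage: two pixels; channel 0 matches the fixed image, channel 1 matches the moving image
out_fixed = np.array([[0.2, 0.9], [0.4, 0.1]])
out_trans = np.array([[0.2, 0.9], [0.4, 0.1]])
img_fixed = np.array([0.2, 0.4])
img_moving = np.array([0.9, 0.1])

loss_mm = registration_loss(out_fixed, out_trans, img_fixed, img_moving, multimodal=True)   # 0.0
loss_sm = registration_loss(out_fixed, out_trans, img_fixed, img_moving, multimodal=False)  # approximately 0.29
```

Only the channel index used for the moving image changes between the two cases; the networks and the $L_2$ form of the loss stay the same. A plausible intuition is that separate channels let the image network absorb intensity differences between modalities, while the single shared displacement field has to explain the geometry.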

Paper Structure

This paper contains 11 sections, 3 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Untrained neural network for image registration. (a) The model outputs two channels, each representing a different modality, and two time points, denoting the fixed and transformed input images. The L2 loss between the reconstructed images and their corresponding ground-truth images is then computed and used to update the networks' weights. For single-modal registration, the L2 loss is applied to one output channel, while in the multi-modal case, the loss is computed using the different output channels corresponding to the image modalities. The hash embedding is always applied to the input coordinate, so we consider it part of the network and omit it for clarity. (b) The model can handle rigid and deformable registration by tuning the granularity value. (c) An example of 3D multi-modal registration of MRI-CT data. The green contours are the liver and spleen segmentations annotated on the displayed image, and the red contour shows the segmentation annotated on the other image modality and warped using our method.
  • Figure 2: Network reconstruction of the fixed image after applying the motion kernel (green channel), overlaid with the transformed image (red), in the multi-modal case. The top row does not use the coarse-to-fine process, and the two images are not properly registered. The bottom row, with coarse-to-fine, produces a coarser reconstruction at the beginning but eventually aligns the multi-modal images.
  • Figure 3: Network reconstruction of the fixed image after applying the motion kernel (green channel), overlaid with the transformed image (red). The left image is registered using an image network with full representation capacity, while the right image is registered with an image MLP with reduced capacity.
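The capacity controls that the figures rely on (hash-embedded input coordinates, the granularity value, and coarse-to-fine optimization) can be sketched with an Instant-NGP-style multi-resolution hash encoding whose finer levels are switched on over time. This is an illustration under stated assumptions, not the paper's implementation: `HashEncoding2D` and `active_levels_at` are hypothetical names, all hyperparameters are arbitrary, the schedule is invented, and treating the granularity value as the number of active levels is an interpretation. For simplicity every level is hashed, including coarse levels where a direct index would suffice.

```python
import numpy as np

# Spatial-hash primes for 2D, as in the Instant-NGP hash encoding
PRIMES = (1, 2654435761)


class HashEncoding2D:
    """Sketch of a multi-resolution hash encoding for (x, y) in [0, 1]^2."""

    def __init__(self, n_levels=8, table_size=2**14, n_features=2,
                 base_res=4, growth=1.5, seed=0):
        rng = np.random.default_rng(seed)
        self.table_size = table_size
        self.resolutions = [int(np.floor(base_res * growth**lvl)) for lvl in range(n_levels)]
        # One trainable feature table per level, initialized with small random values
        self.tables = rng.uniform(-1e-4, 1e-4, size=(n_levels, table_size, n_features))

    def _hash(self, ix, iy):
        """Map integer grid-vertex indices to table slots (uint64 array arithmetic wraps on overflow)."""
        h = (ix.astype(np.uint64) * np.uint64(PRIMES[0])) ^ (iy.astype(np.uint64) * np.uint64(PRIMES[1]))
        return (h % np.uint64(self.table_size)).astype(np.int64)

    def __call__(self, coords, active_levels=None):
        """Encode coords of shape (N, 2); levels >= active_levels are zeroed (coarse-to-fine)."""
        n_levels, _, n_features = self.tables.shape
        k = n_levels if active_levels is None else active_levels
        feats = []
        for level, res in enumerate(self.resolutions):
            if level >= k:
                feats.append(np.zeros((coords.shape[0], n_features)))  # masked fine level
                continue
            pos = coords * res
            base = np.floor(pos).astype(np.int64)
            frac = pos - base  # position inside the grid cell, used for bilinear weights
            out = np.zeros((coords.shape[0], n_features))
            for dx in (0, 1):
                for dy in (0, 1):
                    idx = self._hash(base[:, 0] + dx, base[:, 1] + dy)
                    wx = frac[:, 0] if dx else 1.0 - frac[:, 0]
                    wy = frac[:, 1] if dy else 1.0 - frac[:, 1]
                    out += (wx * wy)[:, None] * self.tables[level, idx]
            feats.append(out)
        return np.concatenate(feats, axis=1)


def active_levels_at(step, n_levels, start=2, steps_per_level=100):
    """Hypothetical schedule: begin with `start` coarse levels, unlock one more every `steps_per_level` steps."""
    return min(n_levels, start + step // steps_per_level)


# Usage: encode five random points with only the two coarsest levels, then with all eight
enc = HashEncoding2D()
coords = np.random.default_rng(1).random((5, 2))
f_coarse = enc(coords, active_levels=active_levels_at(0, 8))
f_full = enc(coords)
```

With only the coarse levels active, the encoded features vary slowly across the image, which limits how detailed the fitted motion and image content can be; unlocking finer levels later restores capacity once a coarse alignment is in place.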