Disentangle-then-Align: Non-Iterative Hybrid Multimodal Image Registration via Cross-Scale Feature Disentanglement

Chunlei Zhang, Jiahao Xia, Yun Xiao, Bo Jiang, Jian Zhang

Abstract

Multimodal image registration is a fundamental task and a prerequisite for downstream cross-modal analysis. Despite recent progress in shared feature extraction and multi-scale architectures, two key limitations remain. First, some methods use disentanglement to learn shared features but mainly regularize the shared part, allowing modality-private cues to leak into the shared space. Second, most multi-scale frameworks support only a single transformation type, limiting their applicability when global misalignment and local deformation coexist. To address these issues, we formulate hybrid multimodal registration as jointly learning a stable shared feature space and a unified hybrid transformation. Based on this view, we propose HRNet, a Hybrid Registration Network that couples representation disentanglement with hybrid parameter prediction. A shared backbone with Modality-Specific Batch Normalization (MSBN) extracts multi-scale features, while a Cross-scale Disentanglement and Adaptive Projection (CDAP) module suppresses modality-private cues and projects shared features into a stable subspace for matching. Built on this shared space, a Hybrid Parameter Prediction Module (HPPM) performs non-iterative coarse-to-fine estimation of global rigid parameters and deformation fields, which are fused into a coherent deformation field. Extensive experiments on four multimodal datasets demonstrate state-of-the-art performance on rigid and non-rigid registration tasks. The code is available at the project website.
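The abstract states that the global rigid parameters and the deformation fields are fused into a single coherent deformation field, but it does not say how. As a rough illustration only, the sketch below assumes a 2-D setting, a rotation about the image centre, and simple additive fusion of the rigid-induced displacement with a local residual field. The function names `rigid_displacement` and `fuse_hybrid` and the parameterization `(theta, tx, ty)` are hypothetical and not taken from the paper.

```python
import numpy as np


def rigid_displacement(theta, tx, ty, height, width):
    """Dense displacement field of shape (H, W, 2) induced by a 2-D rigid transform.

    Assumption: rotation by `theta` (radians) about the image centre,
    followed by a translation (tx, ty) in pixels. Channel 0 is x, channel 1 is y.
    """
    ys, xs = np.meshgrid(
        np.arange(height, dtype=np.float64),
        np.arange(width, dtype=np.float64),
        indexing="ij",
    )
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    x0, y0 = xs - cx, ys - cy  # coordinates relative to the centre
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    x_new = cos_t * x0 - sin_t * y0 + cx + tx
    y_new = sin_t * x0 + cos_t * y0 + cy + ty
    # Displacement = transformed position minus original position.
    return np.stack([x_new - xs, y_new - ys], axis=-1)


def fuse_hybrid(theta, tx, ty, residual):
    """Combine rigid parameters with a local residual field of shape (H, W, 2).

    Assumption: additive fusion. The paper's actual fusion rule may differ.
    """
    height, width, _ = residual.shape
    return rigid_displacement(theta, tx, ty, height, width) + residual


if __name__ == "__main__":
    # Pure translation plus a zero residual gives a constant field.
    field = fuse_hybrid(0.0, 2.0, -1.0, np.zeros((4, 6, 2)))
    print(field.shape, field[0, 0])
```

Expressing both components as one dense field means the moving image needs to be resampled only once, which avoids the repeated interpolation of a serial rigid-then-non-rigid pipeline.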

Paper Structure (18 sections, 16 equations, 7 figures, 6 tables)

Figures (7)

  • Figure 1: Comparison of model structures. (a) Non-rigid only; (b) rigid only; (c) our hybrid structure.
  • Figure 2: (a) Rigid registration fails to handle local deformations (first row). Non-rigid registration may distort structural integrity under large global offsets (second row). (b) Serial hybrid registration.
  • Figure 3: Schematic diagram and detailed architecture of the Hybrid Registration Network (HRNet), which consists of three main components: a shared backbone with Modality-Specific Batch Normalization (MSBN), a Cross-scale Disentanglement and Adaptive Projection (CDAP) module for feature disentanglement, and a Hybrid Parameter Prediction Module (HPPM) for transformation parameter estimation.
  • Figure 4: Qualitative comparison of rigid registration. F: fixed image, M: moving image. From top to bottom: RGB-SAR, RGB-IR, RGB-NIR, and RGB-TIR.
  • Figure 5: Qualitative comparison of non-rigid registration. F: fixed image, M: moving image. From top to bottom: RGB-SAR, RGB-IR, RGB-NIR, and RGB-TIR.
  • ...and 2 more figures