GDROS: A Geometry-Guided Dense Registration Framework for Optical-SAR Images under Large Geometric Transformations

Zixuan Sun, Shuaifeng Zhi, Ruize Li, Jingyuan Xia, Yongxiang Liu, Weidong Jiang

TL;DR

GDROS tackles cross-modal optical-SAR image registration under large geometric transformations by predicting dense pixel-wise flow with a cross-modal CNN-Transformer fusion, followed by a differentiable least-squares refinement that enforces a 6-DoF affine constraint. The cross-attention-only fusion mitigates nonlinear radiometric differences (NRD) between modalities, while the least squares regression (LSR) module solves for the affine parameters $\boldsymbol{\Phi}$ via the least-squares formulation $\boldsymbol{\Phi} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{F} + \mathbf{I}$ and refines the flow accordingly, with training driven by the losses $L_{seq}$ and $L_{geo}$. Evaluations on WHU-OPT-SAR, OS, and UBCv2 show that GDROS achieves superior registration accuracy and robustness across resolutions, outperforming state-of-the-art baselines and demonstrating practical potential for multimodal fusion and navigation tasks. The framework is end-to-end trainable, efficient, and scalable to large images, and its code will be released to enable broader adoption in cross-modal geospatial analysis.
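
The least-squares step can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes that $\mathbf{X}$ stacks homogeneous pixel coordinates $[x, y, 1]$ row by row, that $\mathbf{F}$ stacks the corresponding flow vectors, and that $\mathbf{I}$ is an identity padded with a zero row so that the fitted displacement model becomes a coordinate map. The function name `fit_affine_from_flow` is hypothetical.

```python
import numpy as np


def fit_affine_from_flow(flow):
    """Fit a 6-DoF affine map to a dense flow field of shape (H, W, 2).

    Returns (phi, refined_flow): phi has shape (3, 2) and maps [x, y, 1]
    to target coordinates; refined_flow is the flow implied by phi.
    """
    h, w, _ = flow.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # X: one row [x, y, 1] per pixel (assumed layout, row-major order).
    X = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)], axis=1).astype(np.float64)
    # F: one row [u, v] per pixel, in the same order as X.
    F = flow.reshape(-1, 2).astype(np.float64)
    # Least-squares solution of X @ A = F, i.e. A = (X^T X)^{-1} X^T F.
    A, *_ = np.linalg.lstsq(X, F, rcond=None)
    # Assumed form of I: identity on (x, y), zero row for the translation.
    I = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
    phi = A + I
    # Flow re-generated from the affine model: target coords minus source coords.
    refined_flow = (X @ phi - X[:, :2]).reshape(h, w, 2)
    return phi, refined_flow


if __name__ == "__main__":
    # A pure translation by (2, -1) pixels should give an identity linear part.
    demo_flow = np.tile(np.array([2.0, -1.0]), (4, 5, 1))
    phi, _ = fit_affine_from_flow(demo_flow)
    print(phi)
```

Because the fit uses every pixel, isolated flow errors are averaged out, which is one way to read the summary's claim that the affine constraint corrects the dense prediction. The sketch uses `np.linalg.lstsq` rather than forming $(\mathbf{X}^T\mathbf{X})^{-1}$ explicitly, purely for numerical convenience.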

Abstract

Registration of optical and synthetic aperture radar (SAR) remote sensing images serves as a critical foundation for image fusion and visual navigation tasks. This task is particularly challenging because of the discrepancy between the two modalities, primarily manifested as severe nonlinear radiometric differences (NRD), geometric distortions, and noise variations. Under large geometric transformations, existing classical template-based and sparse keypoint-based strategies struggle to achieve reliable registration results for optical-SAR image pairs. To address these limitations, we propose GDROS, a geometry-guided dense registration framework leveraging global cross-modal image interactions. First, we extract cross-modal deep features from optical and SAR images through a CNN-Transformer hybrid feature extraction module, upon which a multi-scale 4D correlation volume is constructed and iteratively refined to establish pixel-wise dense correspondences. Subsequently, we implement a least squares regression (LSR) module to geometrically constrain the predicted dense optical flow field. Such geometry guidance mitigates prediction divergence by directly imposing an estimated affine transformation on the final flow predictions. Extensive experiments on three representative datasets with different spatial resolutions (WHU-OPT-SAR, OS, and UBCv2) demonstrate the robust performance of our proposed method across imaging resolutions. Qualitative and quantitative results show that GDROS significantly outperforms current state-of-the-art methods on all metrics. Our source code will be released at: https://github.com/Zi-Xuan-Sun/GDROS.
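
To make the notion of a multi-scale 4D correlation volume concrete, the following NumPy sketch builds one from two feature maps. It is an illustration under assumptions not stated in the abstract: all-pairs dot products scaled by the square root of the channel count, and coarser levels obtained by 2x2 average pooling over the target dimensions only, as is common in optical-flow networks. The function name `correlation_pyramid` and the parameter `num_levels` are hypothetical.

```python
import numpy as np


def correlation_pyramid(feat_a, feat_b, num_levels=3):
    """Build a multi-scale 4D correlation volume from two (C, H, W) feature maps.

    Level 0 has shape (H, W, H, W); each further level halves the last two
    (target) dimensions by 2x2 average pooling.
    """
    c, h, w = feat_a.shape
    # All-pairs dot products between every source and target location.
    corr = np.einsum("cij,ckl->ijkl", feat_a, feat_b) / np.sqrt(c)
    pyramid = [corr]
    for _ in range(num_levels - 1):
        hh, ww = corr.shape[2] // 2, corr.shape[3] // 2
        # Crop to an even size, then average each 2x2 block of target pixels.
        cropped = corr[:, :, : hh * 2, : ww * 2]
        corr = cropped.reshape(h, w, hh, 2, ww, 2).mean(axis=(3, 5))
        pyramid.append(corr)
    return pyramid


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    optical_feat = rng.standard_normal((4, 8, 8))  # stand-in for optical features
    sar_feat = rng.standard_normal((4, 8, 8))      # stand-in for SAR features
    for level in correlation_pyramid(optical_feat, sar_feat):
        print(level.shape)
```

Pooling only the target dimensions keeps one correlation map per source pixel at every level, so an iterative refinement step can look up both fine and coarse matching evidence around its current flow estimate.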

Paper Structure

This paper contains 16 sections, 17 equations, 9 figures, 6 tables.

Figures (9)

  • Figure 1: Overview of learning-based optical-SAR image registration (OSIR) frameworks. (a) Predicting motion offsets of four fixed reference points to solve for a homography/affine matrix, typically employing an encoder-only network architecture. (b) Describing non-rigid transformations via dense optical flow, typically employing an encoder-decoder network architecture. (c) Predicting sparse (semi-dense) keypoint correspondences, filtering mismatches, and finally estimating a homography/affine matrix via geometric rectification. (d) Our proposed solution GDROS: integrating cross-modal dense optical flow with geometric constraints to achieve geometry-guided dense registration.
  • Figure 2: Framework of our method GDROS. The input optical-SAR image pairs undergo attention mechanism-enabled feature extraction to obtain two distinct deep feature spaces, $F''_{opt\leftarrow SAR}$ and $F''_{SAR\leftarrow opt}$, with enhanced inter-modal information interaction, as depicted in the green-highlighted region. By leveraging these deep feature spaces, we construct a multi-scale 4D feature pyramid that enables GRU-based iterative refinement to generate dense optical-to-SAR flow fields. Subsequently, in the LSR-based geometric consistency enforcement module (yellow-highlighted region), geometric consistency constraints are systematically applied to correct mismatches in the initial flow field, ultimately yielding an accurate radiometric transformation model.
  • Figure 3: Example of generating network input image pairs.
  • Figure 4: Registration results on the WHU-OPT-SAR dataset. The yellow line represents the ground truth registration result, and the red line represents the experimental registration result.
  • Figure 5: Registration results on the OS dataset. The yellow line represents the ground truth registration result, and the red line represents the experimental registration result.
  • ...and 4 more figures