GDROS: A Geometry-Guided Dense Registration Framework for Optical-SAR Images under Large Geometric Transformations
Zixuan Sun, Shuaifeng Zhi, Ruize Li, Jingyuan Xia, Yongxiang Liu, Weidong Jiang
TL;DR
GDROS tackles cross-modal optical-SAR image registration under large geometric transformations. It predicts a dense pixel-wise flow field with a cross-modal CNN-Transformer fusion, then applies a differentiable least-squares refinement that enforces a 6-DoF affine constraint. The cross-attention-only fusion mitigates the nonlinear radiometric differences (NRD) between modalities. The least-squares regression (LSR) module solves for the affine parameters $\boldsymbol{\Phi}$ via $\boldsymbol{\Phi} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{F} + \mathbf{I}$ and refines the flow accordingly. Training is driven by the losses $L_{seq}$ and $L_{geo}$. Evaluations on WHU-OPT-SAR, OS, and UBCv2 show that GDROS achieves higher registration accuracy and robustness across resolutions than state-of-the-art baselines, indicating practical potential for multimodal fusion and navigation tasks. The framework is end-to-end trainable, efficient, and scalable to large images, and its code will be released to support broader adoption in cross-modal geospatial analysis.
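The least-squares step can be illustrated with a small NumPy sketch. The summary does not define the symbols, so this rests on stated assumptions. $\mathbf{X}$ is taken to stack homogeneous pixel coordinates $[x, y, 1]$ as an $N \times 3$ matrix. $\mathbf{F}$ is taken to stack the predicted per-pixel displacements as an $N \times 2$ matrix. $\mathbf{I}$ is read as the $3 \times 2$ matrix $[\mathbf{I}_2; \mathbf{0}]$, which turns a fit of displacements into a mapping of coordinates. The helpers `homogeneous_grid`, `fit_affine_from_flow`, and `affine_to_flow` are hypothetical. GDROS performs this step differentiably inside its network, so this non-differentiable version only illustrates the algebra.

```python
import numpy as np

# Assumed (3, 2) layout of Phi; its identity part is [I_2; 0].
I_3x2 = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])


def homogeneous_grid(h: int, w: int) -> np.ndarray:
    """Assumed design matrix X: one row [x, y, 1] per pixel, shape (h*w, 3)."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)], axis=1).astype(np.float64)


def fit_affine_from_flow(flow: np.ndarray) -> np.ndarray:
    """Hypothetical LSR helper: fit Phi (3, 2) to a dense flow field of shape (h, w, 2)."""
    h, w, _ = flow.shape
    X = homogeneous_grid(h, w)
    F = flow.reshape(-1, 2).astype(np.float64)  # per-pixel displacements (u, v)
    # lstsq returns the same minimiser as (X^T X)^{-1} X^T F, but more stably.
    B, *_ = np.linalg.lstsq(X, F, rcond=None)
    return B + I_3x2  # add identity so Phi maps coordinates, not displacements


def affine_to_flow(phi: np.ndarray, h: int, w: int) -> np.ndarray:
    """Hypothetical helper: convert Phi back into a dense displacement field."""
    X = homogeneous_grid(h, w)
    warped = X @ phi  # warped coordinates [x', y'], shape (h*w, 2)
    return (warped - X[:, :2]).reshape(h, w, 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h, w = 32, 48
    # Synthetic ground truth: x' = 1.05x - 0.08y + 3, y' = 0.1x + 0.97y - 2.
    true_phi = np.array([[1.05, 0.10], [-0.08, 0.97], [3.0, -2.0]])
    noisy_flow = affine_to_flow(true_phi, h, w) + rng.normal(scale=0.5, size=(h, w, 2))
    phi_hat = fit_affine_from_flow(noisy_flow)
    refined_flow = affine_to_flow(phi_hat, h, w)  # affine-consistent flow field
    print(np.round(phi_hat, 3))
```

Replacing a raw flow field with `affine_to_flow(phi_hat, h, w)` is one way to read "refines the flow accordingly". The actual LSR module may weight pixels or combine the raw and affine-consistent fields differently.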
Abstract
Registration of optical and synthetic aperture radar (SAR) remote sensing images is a critical foundation for image fusion and visual navigation tasks. The task is particularly challenging because of the modal discrepancy between the two, which manifests primarily as severe nonlinear radiometric differences (NRD), geometric distortions, and noise variations. Under large geometric transformations, existing classical template-based and sparse keypoint-based strategies struggle to register optical-SAR image pairs reliably. To address these limitations, we propose GDROS, a geometry-guided dense registration framework that leverages global cross-modal image interactions. First, we extract cross-modal deep features from optical and SAR images with a CNN-Transformer hybrid feature extraction module. From these features, a multi-scale 4D correlation volume is constructed and iteratively refined to establish pixel-wise dense correspondences. Next, a least-squares regression (LSR) module geometrically constrains the predicted dense optical flow field. This geometry guidance mitigates prediction divergence by directly imposing an estimated affine transformation on the final flow predictions. Extensive experiments on three representative datasets with different spatial resolutions, WHU-OPT-SAR, OS, and UBCv2, demonstrate that the proposed method performs robustly across imaging resolutions. Qualitative and quantitative results show that GDROS significantly outperforms current state-of-the-art methods on all metrics. Our source code will be released at: https://github.com/Zi-Xuan-Sun/GDROS.
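The multi-scale 4D correlation volume described above follows a pattern popularised by RAFT-style optical flow. The sketch below illustrates that pattern under assumptions and is not GDROS's code. It builds an all-pairs correlation volume from two feature maps, scaled by $1/\sqrt{D}$. It then forms a pyramid by average-pooling the target dimensions by 2 at each level. The following are all assumptions: the function names `correlation_volume`, `avg_pool_target`, and `correlation_pyramid`, the dot-product similarity, the scaling, the pooling factor, and the default of four levels. The sketch omits GDROS's hybrid CNN-Transformer features and its iterative refinement. In RAFT-style methods, that refinement repeatedly samples this pyramid around the current flow estimate.

```python
import numpy as np


def correlation_volume(f1: np.ndarray, f2: np.ndarray) -> np.ndarray:
    """All-pairs 4D correlation C[i, j, k, l] = <f1[i, j], f2[k, l]> / sqrt(D).

    f1, f2: feature maps of shape (H, W, D), e.g. optical and SAR features.
    Returns an array of shape (H, W, H, W).
    """
    d = f1.shape[-1]
    return np.einsum("ijd,kld->ijkl", f1, f2) / np.sqrt(d)


def avg_pool_target(corr: np.ndarray, k: int = 2) -> np.ndarray:
    """Average-pool the last two (target-image) dimensions by a factor k."""
    h1, w1, h2, w2 = corr.shape
    h2c, w2c = (h2 // k) * k, (w2 // k) * k  # crop so both sizes divide by k
    c = corr[:, :, :h2c, :w2c]
    return c.reshape(h1, w1, h2c // k, k, w2c // k, k).mean(axis=(3, 5))


def correlation_pyramid(f1: np.ndarray, f2: np.ndarray, levels: int = 4) -> list:
    """Assumed multi-scale pyramid: full-resolution volume, then pooled copies."""
    pyramid = [correlation_volume(f1, f2)]
    for _ in range(levels - 1):
        pyramid.append(avg_pool_target(pyramid[-1]))
    return pyramid


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f1 = rng.normal(size=(16, 24, 128))
    f1 /= np.linalg.norm(f1, axis=-1, keepdims=True)  # unit-norm features
    f2 = np.roll(f1, shift=(2, 3), axis=(0, 1))  # target = source shifted by (2, 3)
    pyr = correlation_pyramid(f1, f2)
    print([c.shape for c in pyr])
    # With unit-norm features, the peak for source pixel (5, 5) is target pixel (7, 8).
    peak = np.unravel_index(pyr[0][5, 5].argmax(), pyr[0].shape[2:])
    print(tuple(int(v) for v in peak))
```

Pooling only the target dimensions keeps one correlation vector per source pixel at every level. In RAFT-style designs, the coarse levels let a lookup cover large displacements while the fine levels preserve precision, which is relevant to the large geometric transformations GDROS addresses.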
