SOMA: Feature Gradient Enhanced Affine-Flow Matching for SAR-Optical Registration

Haodong Wang, Tao Zhuo, Xiuwei Zhang, Hanlin Yin, Wencong Wu, Yanning Zhang

TL;DR

This work targets pixel-level registration between SAR and optical imagery, a challenging cross-modal task due to fundamental imaging differences. It introduces SOMA, a dense registration framework that fuses a Feature Gradient Enhancer for gradient-informed features with a Global-Local Affine-Flow Matcher for coarse-to-fine alignment, aided by a frozen DINOv2 backbone. The key contributions are the gradient-based feature enhancement (FGE), the coupled affine-flow matcher (GLAM), and extensive ablations showing significant gains in CMR@1px across diverse datasets, along with solid generalization and efficient runtime. The approach enables robust, precise SAR-Optical registration suitable for multi-source data fusion under varying scenes and resolutions, advancing practical cross-modal registration tasks.

Abstract

Achieving pixel-level registration between SAR and optical images remains a challenging task due to their fundamentally different imaging mechanisms and visual characteristics. Although deep learning has achieved great success in many cross-modal tasks, its performance on SAR-Optical registration tasks is still unsatisfactory. Gradient-based information has traditionally played a crucial role in handcrafted descriptors by highlighting structural differences. However, such gradient cues have not been effectively leveraged in deep learning frameworks for SAR-Optical image matching. To address this gap, we propose SOMA, a dense registration framework that integrates structural gradient priors into deep features and refines alignment through a hybrid matching strategy. Specifically, we introduce the Feature Gradient Enhancer (FGE), which embeds multi-scale, multi-directional gradient filters into the feature space using attention and reconstruction mechanisms to boost feature distinctiveness. Furthermore, we propose the Global-Local Affine-Flow Matcher (GLAM), which combines affine transformation and flow-based refinement within a coarse-to-fine architecture to ensure both structural consistency and local accuracy. Experimental results demonstrate that SOMA significantly improves registration precision, increasing the CMR@1px by 12.29% on the SEN1-2 dataset and 18.50% on the GFGE_SO dataset. In addition, SOMA exhibits strong robustness and generalizes well across diverse scenes and resolutions.
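The coupling of a global affine transform with flow-based refinement can be illustrated with a small NumPy sketch. This is not the authors' GLAM implementation: the function names `affine_grid` and `affine_plus_flow`, the 2x3 matrix layout, and the (x, y) pixel-coordinate convention are assumptions made for illustration only. The sketch shows the general idea that an affine field supplies a globally consistent coordinate map, and a dense residual flow adds per-pixel corrections on top of it.

```python
import numpy as np

def affine_grid(theta, height, width):
    """Map every pixel (x, y) through a 2x3 affine matrix `theta`.

    Returns an array of shape (height, width, 2) holding the target
    (x, y) coordinate of each source pixel. Hypothetical helper.
    """
    ys, xs = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    ones = np.ones_like(xs)
    # Homogeneous pixel coordinates, shape (height, width, 3).
    coords = np.stack([xs, ys, ones], axis=-1).astype(np.float64)
    # (height, width, 3) @ (3, 2) -> (height, width, 2)
    return coords @ theta.T

def affine_plus_flow(theta, residual_flow):
    """Combine a global affine map with a dense residual flow.

    `residual_flow` has shape (height, width, 2) and stores per-pixel
    (dx, dy) corrections added to the affine prediction.
    """
    height, width, _ = residual_flow.shape
    return affine_grid(theta, height, width) + residual_flow

# Usage: a pure translation by (+2, -1) plus one local correction.
theta = np.array([[1.0, 0.0, 2.0],
                  [0.0, 1.0, -1.0]])
flow = np.zeros((3, 4, 2))
flow[1, 2] = [0.5, 0.25]  # local refinement at pixel x=2, y=1
warped = affine_plus_flow(theta, flow)
```

In this toy setup every pixel follows the global translation, and only the pixel at (x=2, y=1) receives an extra local shift, which mirrors the split between structural consistency and local accuracy described in the abstract.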

Paper Structure

This paper contains 31 sections, 13 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Comparison of CMR@1px among methods that use different types of feature representations. The proposed SOMA, which incorporates feature gradient enhancement, significantly outperforms both the conventional CNN baseline and CFOG, which leverages image gradients.
  • Figure 2: (a) Flow fields prioritize local warping but leave distortion, whereas affine fields maintain structural consistency via global transforms but lose local precision. (b) Violin plots further reveal distinct error profiles.
  • Figure 3: SOMA framework with FGE and GLAM. Given a SAR-Optical image pair, SOMA extracts multi-scale features with two separate ResNet50 encoders and a frozen DINOv2 branch, then enhances the representations with the Feature Gradient Enhancer (FGE), and performs hierarchical alignment using the Global-Local Affine-Flow Matcher (GLAM). All convolution layers use a kernel size of $3\times3$ unless stated otherwise.
  • Figure 4: