Unpaired Object-Level SAR-to-Optical Image Translation for Aircraft with Keypoints-Guided Diffusion Models

Ruixi You, Hecheng Jia, Feng Xu

TL;DR

The paper tackles the challenging problem of object-level SAR-to-optical translation for aircraft using unpaired data. It introduces KeypointDiff, a diffusion-based framework built on classifier-free guidance, augmented with a Class-Angle Guidance Module (CAGM) and a keypoint-guided training/testing strategy to preserve contours and textures while enabling automated inference. The approach leverages pseudo-pairing, keypoint supervision, and specialized losses (color, perceptual, and adversarial) to achieve high-fidelity optical reconstructions and accurate target attributes, with strong zero-shot generalization. Empirical results show clear improvements over baselines in both image quality (FID) and aircraft-specific metrics (OA, Angle Error), indicating practical potential for interpretable SAR analysis and downstream tasks across untrained aircraft categories.

Abstract

Synthetic Aperture Radar (SAR) imagery provides all-weather, all-day, and high-resolution imaging capabilities but its unique imaging mechanism makes interpretation heavily reliant on expert knowledge, limiting interpretability, especially in complex target tasks. Translating SAR images into optical images is a promising solution to enhance interpretation and support downstream tasks. Most existing research focuses on scene-level translation, with limited work on object-level translation due to the scarcity of paired data and the challenge of accurately preserving contour and texture details. To address these issues, this study proposes a keypoint-guided diffusion model (KeypointDiff) for SAR-to-optical image translation of unpaired aircraft targets. This framework introduces supervision on target class and azimuth angle via keypoints, along with a training strategy for unpaired data. Based on the classifier-free guidance diffusion architecture, a class-angle guidance module (CAGM) is designed to integrate class and angle information into the diffusion generation process. Furthermore, adversarial loss and consistency loss are employed to improve image fidelity and detail quality, tailored for aircraft targets. During sampling, aided by a pre-trained keypoint detector, the model eliminates the requirement for manually labeled class and azimuth information, enabling automated SAR-to-optical translation. Experimental results demonstrate that the proposed method outperforms existing approaches across multiple metrics, providing an efficient and effective solution for object-level SAR-to-optical translation and downstream tasks. Moreover, the method exhibits strong zero-shot generalization to untrained aircraft types with the assistance of the keypoint detector.
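The abstract states that the model builds on a classifier-free guidance diffusion architecture. As a rough illustration of that general technique (not the authors' implementation), the sketch below combines a conditional and an unconditional noise prediction at sampling time. The function name `cfg_combine`, the `scale` parameter, and the specific convention `eps_uncond + scale * (eps_cond - eps_uncond)` are assumptions made for this example; the paper may use a different parameterization, and its conditioning on class and angle through CAGM is not modeled here.

```python
from typing import List, Sequence


def cfg_combine(
    eps_cond: Sequence[float],
    eps_uncond: Sequence[float],
    scale: float,
) -> List[float]:
    """Blend two noise predictions using classifier-free guidance.

    eps_cond:   noise predicted with the condition (e.g. class and angle).
    eps_uncond: noise predicted with the condition replaced by a null token.
    scale:      hypothetical guidance strength; 0 gives the unconditional
                prediction, 1 gives the conditional one, and values above 1
                extrapolate further toward the condition.
    """
    if len(eps_cond) != len(eps_uncond):
        raise ValueError("predictions must have the same length")
    return [u + scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]


if __name__ == "__main__":
    # Toy two-element "noise maps" standing in for network outputs.
    print(cfg_combine([1.0, 2.0], [0.0, 1.0], scale=2.0))
```

In a real sampler the two predictions would come from the same denoising network evaluated twice per step, once with and once without the condition.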

Paper Structure

This paper contains 37 sections, 27 equations, 11 figures, 5 tables.

Figures (11)

  • Figure 1: Illustrating the comparison between region-level and object-level SAR-to-optical image translation. Object-level translation faces challenges including unpaired training data and the demand for accurate recovery of contours and texture details.
  • Figure 2: Overview of the KeypointDiff framework, illustrating the core denoising U-Net architecture, the integration of customized loss functions, and the keypoints-based supervision strategy for both training and testing phases.
  • Figure 3: Illustration of the forward diffusion process and the reverse sampling process. During training, angle alignment for the unpaired data is followed by the forward diffusion process, in which the clear image $x_0$ is gradually corrupted by adding Gaussian noise. In the reverse sampling process, the clear image $x_0$ is recovered by iteratively denoising a sample drawn from an isotropic Gaussian distribution.
  • Figure 4: Structure of the aircraft feature detector predicting aircraft keypoints and the category.
  • Figure 5: Structure of the CAGM module and its sub-blocks, forming the core component of the denoising U-Net in the KeypointDiff framework.
  • ...and 6 more figures
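
The forward diffusion process described in the Figure 3 caption can be sketched with the standard DDPM closed form, $x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$. This is a minimal sketch under stated assumptions: the linear noise schedule, the endpoint values `1e-4` and `0.02`, and all function names are taken from common DDPM practice for illustration and are not confirmed by this summary. The angle alignment step for unpaired data is omitted.

```python
import math
import random
from typing import List, Sequence, Tuple


def linear_betas(num_steps: int, beta_start: float = 1e-4,
                 beta_end: float = 0.02) -> List[float]:
    """Linearly spaced noise variances (an assumed schedule)."""
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas: Sequence[float]) -> List[float]:
    """Cumulative products of (1 - beta_s) for s up to each step."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def q_sample(x0: Sequence[float], t: int, abar: Sequence[float],
             rng: random.Random) -> Tuple[List[float], List[float]]:
    """Draw x_t from q(x_t | x_0) in one shot; t is a 0-based step index.

    Returns the noised sample and the Gaussian noise that was added, which
    is the usual regression target for the denoising network.
    """
    signal = math.sqrt(abar[t])
    noise_scale = math.sqrt(1.0 - abar[t])
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [signal * x + noise_scale * n for x, n in zip(x0, noise)]
    return x_t, noise


if __name__ == "__main__":
    abar = alpha_bars(linear_betas(1000))
    rng = random.Random(0)
    x0 = [1.0, -1.0, 0.5]  # toy "image" flattened to three pixels
    for t in (0, 499, 999):
        x_t, _ = q_sample(x0, t, abar, rng)
        print(t, [round(v, 3) for v in x_t])
```

As `t` grows, the signal coefficient shrinks toward zero and the sample approaches isotropic Gaussian noise, which is the starting point of the reverse sampling process.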