Accelerating Diffusion for SAR-to-Optical Image Translation via Adversarial Consistency Distillation

Xinyu Bai, Feng Xu

TL;DR

This work tackles the slow sampling of diffusion models for SAR-to-optical image translation by applying adversarial consistency distillation in a teacher-student diffusion framework. A consistency-based training objective, combined with a discriminator, lets the student sample in one to a few steps with high fidelity. The method is reported to outperform GAN-based baselines and the diffusion teacher in PSNR, SSIM, and FID while delivering up to $131\times$ faster inference. Experiments on the SEN12 and GF3 datasets show qualitative and quantitative gains and a flexible speed–quality trade-off, suggesting a practical path to real-time SAR-to-optical translation without sacrificing realism or detail.

Abstract

Synthetic Aperture Radar (SAR) provides all-weather, high-resolution imaging capabilities, but its unique imaging mechanism often requires expert interpretation, limiting its widespread applicability. Translating SAR images into more easily recognizable optical images using diffusion models helps address this challenge. However, diffusion models suffer from high latency because they require many iterative inference steps, while Generative Adversarial Networks (GANs) can translate an image in a single step but often at the cost of image quality. To overcome these issues, we propose a new training framework for SAR-to-optical image translation that combines the strengths of both approaches. Our method employs consistency distillation to reduce the number of iterative inference steps and integrates adversarial learning to ensure image clarity and minimize color shifts. Additionally, our approach allows for a trade-off between quality and speed, providing flexibility based on application requirements. We conducted experiments on the SEN12 and GF3 datasets, performing quantitative evaluations using Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), and Fréchet Inception Distance (FID), and measuring inference latency. The results demonstrate that our approach speeds up inference by a factor of 131 while maintaining the visual quality of the generated images, thus offering a robust and efficient solution for SAR-to-optical image translation.
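
For readers unfamiliar with the evaluation metrics, the sketch below shows how PSNR, the first metric named in the abstract, is commonly computed. It uses the standard definition, 10 · log10(peak² / MSE), and is not the authors' evaluation code; the function name `psnr`, the flat-list image representation, and the 8-bit peak value of 255 are assumptions made for illustration.

```python
import math


def psnr(reference, generated, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equally sized images.

    Both images are flat sequences of pixel intensities (an assumption
    made to keep the sketch dependency-free).
    """
    if len(reference) != len(generated):
        raise ValueError("images must have the same number of pixels")
    if len(reference) == 0:
        raise ValueError("images must not be empty")

    # Mean squared error over all pixels.
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)

    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded

    return 10.0 * math.log10(peak ** 2 / mse)
```

Higher PSNR means the generated optical image is closer, pixel by pixel, to the ground truth. SSIM and FID capture structural and distributional similarity respectively and are usually taken from an established library rather than reimplemented.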

Paper Structure

This paper contains 26 sections, 11 equations, 8 figures, 2 tables, 1 algorithm.

Figures (8)

  • Figure 1: Comparison of different sampling methods for SAR-to-optical image translation. Columns from left to right: (a) SAR input images, (b) ground truth (GT) optical images, (c) results using the DDIM sampler, (d) results using the DPM++ sampler, and (e) results using the original sampler. The original method exhibits color shift, while the accelerated DDIM and DPM++ samplers exacerbate this issue, producing images that are often blurry and exhibit noticeable artifacts. This highlights the need for improved sampling methods to enhance image quality.
  • Figure 2: Illustration of the diffusion process: the original optical image is progressively corrupted with Gaussian noise in the forward process and then denoised in the reverse process to reconstruct the original image. The arrows indicate the forward (right) and backward (left) processes.
  • Figure 3: The architecture of the teacher and student models, both based on a U-Net for noise prediction, used in the forward and backward diffusion processes for SAR-to-optical image translation.
  • Figure 4: The proposed training process involves a student model, a teacher model with fixed weights, and a trainable discriminator. The integration of adversarial learning significantly enhances the quality of SAR-to-optical image translation.
  • Figure 5: The relationship between the number of inference iterations and image quality, measured using (a) PSNR and (b) SSIM. The plots show that while low iteration counts result in poor image quality, increasing the iterations improves quality to a point beyond which additional iterations offer minimal benefits.
  • ...and 3 more figures
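
The forward process described in the Figure 2 caption, in which an optical image is progressively corrupted with Gaussian noise, can be sketched as follows. This is a minimal illustration of the standard DDPM-style closed form, x_t = sqrt(ᾱ_t) · x_0 + sqrt(1 − ᾱ_t) · ε, and not the authors' implementation. The listing above does not state the paper's noise schedule, so the linear schedule, its endpoints, and the helper names `linear_beta_schedule`, `alpha_bars_from`, and `q_sample` are all assumptions.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars_from(betas):
    """Cumulative products of (1 - beta_t): the fraction of signal kept at step t."""
    result, product = [], 1.0
    for beta in betas:
        product *= 1.0 - beta
        result.append(product)
    return result


def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t from q(x_t | x_0) in one shot, without simulating every step."""
    signal = math.sqrt(alpha_bars[t])
    noise = math.sqrt(1.0 - alpha_bars[t])
    return [signal * x + noise * rng.gauss(0.0, 1.0) for x in x0]


# Usage: a 3-pixel "image" scaled to [-1, 1], noised at the final step.
betas = linear_beta_schedule(1000)
alpha_bars = alpha_bars_from(betas)
x_t = q_sample([0.5, -0.5, 0.0], 999, alpha_bars, random.Random(0))
```

Under this assumed schedule almost no signal remains at the last step, which is why the reverse process has to rebuild the image over many denoising iterations; that iteration count is the cost a distilled few-step student is meant to avoid.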