Accelerating Diffusion for SAR-to-Optical Image Translation via Adversarial Consistency Distillation
Xinyu Bai, Feng Xu
TL;DR
This work addresses the slow sampling of diffusion models for SAR-to-optical image translation by introducing adversarial consistency distillation in a teacher-student diffusion framework. By combining a consistency-based training objective with a discriminator, the method generates high-fidelity images in one to a few sampling steps, outperforming GAN-based baselines and the original diffusion teacher in PSNR, SSIM, and FID while delivering inference speedups of up to $131\times$. Experiments on the SEN12 and GF3 datasets show consistent qualitative and quantitative gains and a flexible speed-quality trade-off suited to real-time remote sensing applications, suggesting a practical path to real-time SAR-to-optical translation without sacrificing realism or detail.
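A consistency-distillation objective with an added adversarial term can be written generically as follows. This is a sketch in the notation of the original consistency-models formulation (Song et al., 2023), not the exact loss of this work. The following symbols are assumptions:

- $x$ is a clean optical image and $c$ is its paired SAR image.
- $x_{t_{n+1}} = x + t_{n+1} z$ with $z \sim \mathcal{N}(0, I)$.
- $f_\theta$ is the student consistency function and $f_{\theta^-}$ is its exponential-moving-average target copy.
- $\Phi_\phi$ is the probability-flow ODE derivative given by the pretrained diffusion teacher, applied here as one Euler step.
- $d(\cdot,\cdot)$ is a distance such as $\ell_2$ or LPIPS.
- $D_\psi$ is a conditional discriminator.

The uniform time weighting, the non-saturating GAN form of the adversarial term, and the weight $\lambda_{\text{adv}}$ are also assumptions.

$$
\hat{x}_{t_n} = x_{t_{n+1}} + (t_n - t_{n+1})\,\Phi_\phi(x_{t_{n+1}}, t_{n+1}, c)
$$

$$
\mathcal{L}_{\text{CD}} = \mathbb{E}\Big[\, d\big(f_\theta(x_{t_{n+1}}, t_{n+1}, c),\; f_{\theta^-}(\hat{x}_{t_n}, t_n, c)\big) \Big]
$$

$$
\mathcal{L}_{\text{adv}} = -\,\mathbb{E}\Big[\log D_\psi\big(f_\theta(x_{t_{n+1}}, t_{n+1}, c),\; c\big)\Big]
$$

$$
\mathcal{L} = \mathcal{L}_{\text{CD}} + \lambda_{\text{adv}}\,\mathcal{L}_{\text{adv}}
$$

The first term makes the student map adjacent points on the same teacher trajectory to the same clean image, which is what lets a single network call stand in for many teacher steps. The second term pushes those few-step outputs toward the distribution of real optical images. That is the role the text assigns to the discriminator: sharper results and fewer color shifts.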
Abstract
Synthetic Aperture Radar (SAR) provides all-weather, high-resolution imaging, but its unique imaging mechanism often requires expert interpretation, limiting its broader applicability. Translating SAR images into more easily interpretable optical images with diffusion models helps address this challenge. However, diffusion models suffer from high inference latency because they require many iterative denoising steps. Generative Adversarial Networks (GANs), by contrast, translate an image in a single forward pass, but often at the cost of image quality. To overcome these issues, we propose a new training framework for SAR-to-optical image translation that combines the strengths of both approaches. Our method employs consistency distillation to reduce the number of inference steps and integrates adversarial learning to preserve image clarity and minimize color shifts. Additionally, our approach allows a trade-off between quality and speed, providing flexibility according to application requirements. We conducted experiments on the SEN12 and GF3 datasets. We evaluated results quantitatively with Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), and Fréchet Inception Distance (FID), and we also measured inference latency. The results demonstrate that our approach accelerates inference by up to 131 times while maintaining the visual quality of the generated images, offering a robust and efficient solution for SAR-to-optical image translation.
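The speed-quality trade-off can be illustrated with the multistep sampling procedure from the original consistency-models work (Song et al., 2023). The abstract does not say which sampler is used, so treating it as this one is an assumption. Each noise level in `timesteps` costs exactly one network evaluation. One entry gives one-step generation. Extra entries re-noise the current estimate to a lower level and denoise it again, trading latency for refinement.

The following names and values are hypothetical: `toy_consistency_fn`, `multistep_consistency_sample`, `sigma_data`, `eps`, and the noise levels in the usage lines. The toy "network" is a placeholder, not a trained SAR-to-optical model.

```python
import math
import random
from typing import Callable, List, Sequence

Image = List[float]  # flattened pixel values (toy stand-in for a tensor)
# Hypothetical signature: (noisy image x_t, noise level t, SAR condition) -> clean estimate
ConsistencyFn = Callable[[Image, float, Image], Image]


def toy_consistency_fn(x: Image, t: float, sar: Image,
                       eps: float = 0.002, sigma_data: float = 0.5) -> Image:
    """Toy consistency function using the skip/out parameterization of consistency models.

    c_skip(eps) = 1 and c_out(eps) = 0, so f(x, eps) = x (the boundary condition).
    The placeholder 'network' output simply returns the SAR input.
    """
    c_skip = sigma_data ** 2 / ((t - eps) ** 2 + sigma_data ** 2)
    c_out = sigma_data * (t - eps) / math.sqrt(sigma_data ** 2 + t ** 2)
    return [c_skip * xi + c_out * si for xi, si in zip(x, sar)]


def multistep_consistency_sample(f: ConsistencyFn, sar: Image,
                                 timesteps: Sequence[float],
                                 eps: float = 0.002, seed: int = 0) -> Image:
    """Multistep consistency sampling: one call to f per entry in `timesteps`.

    `timesteps` must be in descending order, starting at the maximum noise level T.
    """
    if any(t < eps for t in timesteps):
        raise ValueError("all timesteps must be >= eps")
    rng = random.Random(seed)
    T = timesteps[0]
    # Start from pure noise x_T ~ N(0, T^2 I) and map it to a clean estimate in one call.
    x = [T * rng.gauss(0.0, 1.0) for _ in sar]
    x = f(x, T, sar)
    # Each extra step re-noises the estimate to level t, then denoises it again.
    for t in timesteps[1:]:
        noise_scale = math.sqrt(t * t - eps * eps)
        x_t = [xi + noise_scale * rng.gauss(0.0, 1.0) for xi in x]
        x = f(x_t, t, sar)
    return x


# Usage: one-step (fastest) versus four-step (one network call per step) sampling.
sar = [0.1, 0.4, 0.7]
one_step = multistep_consistency_sample(toy_consistency_fn, sar, [80.0])
four_step = multistep_consistency_sample(toy_consistency_fn, sar, [80.0, 40.0, 10.0, 2.0])
```

Because latency grows linearly with the number of entries in `timesteps`, an application can choose the step count to fit its budget without retraining the model.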
