Single-Step Latent Consistency Model for Remote Sensing Image Super-Resolution
Xiaohui Sun, Jiangwei Mo, Hanlin Wu, Jie Ma
TL;DR
LCMSR addresses the slow inference of diffusion-based remote sensing image super-resolution (RSISR) with a two-stage latent framework. It encodes the HR–LR difference into a compact latent code and enforces trajectory consistency in that latent space. In the first stage, a residual autoencoder captures high-frequency details as $\bm z = \mathcal{E}(I_{\mathrm{HR}}, I_{\mathrm{LR}}^{\uparrow})$, where $I_{\mathrm{LR}}^{\uparrow}$ is the upsampled LR image, and produces the SR result as $I_{\mathrm{SR}} = \mathcal{D}(I_{\mathrm{LR}}, \bm z)$. The second stage applies a diffusion forward process to these latents and trains a consistency model that maps a noisy latent back to the start of its trajectory in a single step. The model is conditioned on LR features extracted by a CondNet and is trained with a KD loss and a consistency-training (CT) loss whose targets come from an exponential moving average (EMA) of the model weights. This reduces sampling from 50–1000 or more diffusion steps to a single step while achieving competitive PSNR and better FID and LPIPS than state-of-the-art methods on the AID and DIOR datasets, giving inference times comparable to non-diffusion models without iterative diffusion sampling.
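The single-step sampling path can be sketched as follows. This is a minimal toy sketch under stated assumptions, not the LCMSR implementation: it borrows the skip/output weighting $f_\theta(\bm z_t, t, c) = c_{\mathrm{skip}}(t)\,\bm z_t + c_{\mathrm{out}}(t)\,F_\theta(\bm z_t, t, c)$ from the generic consistency-model formulation of Song et al. (2023), and `toy_net`, `toy_decoder`, `upsample`, and the constants `SIGMA_DATA`, `EPS`, `T_MAX` are hypothetical stand-ins for the learned networks, CondNet features, and noise schedule.

```python
import numpy as np

# Assumed constants (generic consistency-model defaults, not values from LCMSR).
SIGMA_DATA = 0.5  # assumed standard deviation of the latent data
EPS = 0.002       # smallest timestep, where f must be the identity
T_MAX = 80.0      # largest timestep, where z_T is essentially pure noise


def c_skip(t):
    # Equals 1 at t = EPS, which enforces the boundary condition f(z, EPS) = z.
    return SIGMA_DATA**2 / ((t - EPS) ** 2 + SIGMA_DATA**2)


def c_out(t):
    # Equals 0 at t = EPS, so the network output is ignored there.
    return SIGMA_DATA * (t - EPS) / np.sqrt(SIGMA_DATA**2 + t**2)


def toy_net(z_t, t, cond):
    # Hypothetical stand-in for the learned denoiser F_theta; `cond` plays the
    # role of the LR features that LCMSR obtains from its CondNet.
    return np.tanh(z_t / (1.0 + t) + cond)


def consistency_fn(z_t, t, cond):
    # f_theta(z_t, t, c) = c_skip(t) * z_t + c_out(t) * F_theta(z_t, t, c)
    return c_skip(t) * z_t + c_out(t) * toy_net(z_t, t, cond)


def upsample(img, scale):
    # Nearest-neighbour upsampling (an assumption; any upsampler would do).
    return np.repeat(np.repeat(img, scale, axis=0), scale, axis=1)


def toy_decoder(lr, z, scale):
    # Hypothetical decoder D(I_LR, z): the compact latent has the LR spatial
    # size here and is added to the LR image before upsampling.
    return upsample(lr + z, scale)


def single_step_sr(lr, scale, rng):
    cond = lr  # hypothetical conditioning: the LR image itself
    z_T = T_MAX * rng.standard_normal(lr.shape)  # sample noise at t = T_MAX
    z_0 = consistency_fn(z_T, T_MAX, cond)       # ONE network evaluation
    return toy_decoder(lr, z_0, scale)


rng = np.random.default_rng(0)
lr = rng.random((8, 8))
sr = single_step_sr(lr, scale=4, rng=rng)
print(sr.shape)  # (32, 32)
```

The point of the design is that `consistency_fn` is called once, whereas an iterative sampler would call the denoiser tens to thousands of times; the skip/output weighting makes the boundary condition hold by construction rather than having to be learned.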
Abstract
Recent advances in diffusion models (DMs) have substantially improved remote sensing image super-resolution (RSISR). However, their iterative sampling processes often result in slow inference, limiting their use in real-time applications. To address this challenge, we propose the latent consistency model for super-resolution (LCMSR), a novel single-step diffusion approach designed to improve both efficiency and visual quality in RSISR tasks. LCMSR consists of two stages. In the first stage, we pretrain a residual autoencoder to encode the differential information between high-resolution (HR) and low-resolution (LR) images, which moves the diffusion process into a latent space and reduces computational cost. The second stage performs consistency diffusion learning: it learns the distribution of the residual encodings in the latent space, conditioned on the LR images. The consistency constraint requires that predictions at any two timesteps along the same reverse diffusion trajectory agree, enabling a direct mapping from noise to data. As a result, LCMSR reduces the 50–1000 or more iterative steps of traditional diffusion models to a single step, significantly improving efficiency. Experimental results demonstrate that LCMSR effectively balances efficiency and performance, achieving inference times comparable to non-diffusion models while maintaining high-quality output.
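The consistency constraint can be illustrated with a toy consistency-training step. This sketch assumes the generic consistency-training recipe of Song et al. (2023), not LCMSR's exact objective: two points on the same noising trajectory (same clean latent, same noise, adjacent timesteps) are compared, the target is computed with exponential-moving-average (EMA) weights, and the distance is a plain squared error. The linear `consistency_fn`, the forward process $\bm z_t = \bm z_0 + t\,\bm\epsilon$, and the constants are hypothetical, and the paper's KD loss is not modeled.

```python
import numpy as np

# Assumed constants, following the generic consistency-training recipe.
SIGMA_DATA, EPS = 0.5, 0.002


def consistency_fn(params, z_t, t, cond):
    # Hypothetical linear denoiser F(z, c) = w * z + b * c with params = (w, b),
    # wrapped in the skip/output weighting so that f(z, EPS) = z.
    w, b = params
    c_skip = SIGMA_DATA**2 / ((t - EPS) ** 2 + SIGMA_DATA**2)
    c_out = SIGMA_DATA * (t - EPS) / np.sqrt(SIGMA_DATA**2 + t**2)
    return c_skip * z_t + c_out * (w * z_t + b * cond)


def ct_loss(params, ema_params, z0, cond, t_n, t_next, noise):
    # Two points on the SAME trajectory: same clean latent z0, same noise,
    # adjacent timesteps t_n < t_next (assumed forward process z_t = z0 + t * noise).
    z_next = z0 + t_next * noise
    z_n = z0 + t_n * noise
    # Online prediction; in real training only this branch receives gradients.
    online = consistency_fn(params, z_next, t_next, cond)
    # Target from the EMA weights, treated as a constant (stop-gradient).
    target = consistency_fn(ema_params, z_n, t_n, cond)
    return float(np.mean((online - target) ** 2))


def ema_update(ema_params, params, mu=0.999):
    # theta_minus <- mu * theta_minus + (1 - mu) * theta
    return tuple(mu * e + (1.0 - mu) * p for e, p in zip(ema_params, params))


rng = np.random.default_rng(0)
z0, cond, noise = (rng.standard_normal((4, 4)) for _ in range(3))
params = (0.3, 0.1)
ema_params = params
loss = ct_loss(params, ema_params, z0, cond, t_n=1.0, t_next=1.5, noise=noise)
ema_params = ema_update(ema_params, (0.4, 0.2), mu=0.9)
print(loss, ema_params)
```

Minimizing this loss over many timestep pairs pushes every point on a trajectory toward the same prediction, which is what allows a single evaluation from pure noise at sampling time.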
