DifIISR: A Diffusion Model with Gradient Guidance for Infrared Image Super-Resolution
Xingyuan Li, Zirui Wang, Yang Zou, Zhixin Chen, Jun Ma, Zhiying Jiang, Long Ma, Jinyuan Liu
TL;DR
DifIISR addresses infrared image super-resolution by integrating gradient-based guidance into diffusion sampling, aligning the reconstruction with both the frequency characteristics of infrared images and the needs of machine perception. The method introduces a spectral distribution regulation based on FFT magnitude matching, together with perceptual guidance from VGG feature losses and SAM-based segmentation losses. These terms are combined into a single loss whose gradient is injected into the predicted noise at each reverse step. Empirical results on infrared datasets show superior perceptual metrics and improved downstream detection and segmentation performance, and ablations confirm the contribution of both guidance terms and of the gradient-based optimization. The resulting reconstructions are sharper and better suited to perception tasks, with practical benefits for autonomous driving, robotics, and related systems; code is released for reproducibility.
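As a rough illustration of FFT-based magnitude matching, the sketch below compares the 2-D FFT magnitude spectra of a reconstruction and a reference. The function name `spectral_magnitude_loss`, the L1 distance, and the random test images are assumptions made for this example and are not taken from the paper or its released code.

```python
import numpy as np

def spectral_magnitude_loss(sr, hr):
    """Mean absolute difference between 2-D FFT magnitude spectra.

    Hypothetical helper: the L1 distance is an assumption of this sketch.
    """
    sr_mag = np.abs(np.fft.fft2(sr))
    hr_mag = np.abs(np.fft.fft2(hr))
    return float(np.mean(np.abs(sr_mag - hr_mag)))

# Stand-in data: a random "high-resolution" image and a blurred copy of it.
rng = np.random.default_rng(0)
hr = rng.random((16, 16))

# 2x2 circular box blur, which attenuates high-frequency components.
blurred = (
    hr
    + np.roll(hr, 1, axis=0)
    + np.roll(hr, 1, axis=1)
    + np.roll(hr, (1, 1), axis=(0, 1))
) / 4.0

loss_identical = spectral_magnitude_loss(hr, hr)       # spectra match exactly
loss_blurred = spectral_magnitude_loss(blurred, hr)    # lost high frequencies are penalized
```

A magnitude-only comparison ignores phase, so a circularly shifted copy of the reference scores essentially zero. That is one reason such a term would normally be paired with other losses rather than used alone.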
Abstract
Infrared imaging is essential for autonomous driving and robotic operations as a supportive modality due to its reliable performance in challenging environments. Despite its popularity, the limitations of infrared cameras, such as low spatial resolution and complex degradations, consistently challenge imaging quality and subsequent visual tasks. Infrared image super-resolution (IISR) has been developed to address this challenge. While recent developments in diffusion models have greatly advanced this field, current methods either ignore the unique modal characteristics of infrared imaging or overlook the requirements of machine perception. To bridge these gaps, we propose DifIISR, an infrared image super-resolution diffusion model optimized for both visual quality and perceptual performance. Our approach achieves task-based guidance for diffusion by injecting gradients derived from visual and perceptual priors into the noise during the reverse process. Specifically, we introduce an infrared thermal spectrum distribution regulation to preserve visual fidelity, ensuring that the reconstructed infrared images closely align with high-resolution images by matching their frequency components. We then incorporate various visual foundation models as perceptual guidance for downstream visual tasks, infusing generalizable perceptual features that benefit detection and segmentation. As a result, our approach achieves superior visual results while attaining state-of-the-art downstream task performance. Code is available at https://github.com/zirui0625/DifIISR
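The idea of injecting a loss gradient into the noise during the reverse process can be sketched as follows. This is a minimal illustration in the style of classifier guidance, not the paper's exact update rule: the names `guided_noise` and `predict_x0`, the guidance scale, the value of `alpha_bar`, and the quadratic stand-in loss (used in place of the combined visual and perceptual loss) are all assumptions.

```python
import numpy as np

def predict_x0(x_t, eps, alpha_bar):
    """Standard DDPM estimate of the clean image from x_t and a noise estimate."""
    return (x_t - np.sqrt(1.0 - alpha_bar) * eps) / np.sqrt(alpha_bar)

def guided_noise(eps_pred, x0_hat, grad_fn, alpha_bar, scale):
    """Shift the predicted noise along the loss gradient (hypothetical rule).

    Adding the gradient to the noise moves the implied clean-image estimate
    in the descent direction of the loss.
    """
    return eps_pred + scale * np.sqrt(1.0 - alpha_bar) * grad_fn(x0_hat)

rng = np.random.default_rng(0)
target = rng.random((8, 8))               # stand-in for the guidance target
x_t = rng.standard_normal((8, 8))         # stand-in noisy sample at step t
eps_pred = rng.standard_normal((8, 8))    # stand-in for the denoiser output
alpha_bar = 0.5                           # assumed cumulative noise-schedule value

# Quadratic placeholder for the combined guidance loss and its analytic gradient.
def loss_fn(x):
    return 0.5 * float(np.sum((x - target) ** 2))

def grad_fn(x):
    return x - target

x0_plain = predict_x0(x_t, eps_pred, alpha_bar)
eps_guided = guided_noise(eps_pred, x0_plain, grad_fn, alpha_bar, scale=0.5)
x0_guided = predict_x0(x_t, eps_guided, alpha_bar)

loss_plain = loss_fn(x0_plain)
loss_guided = loss_fn(x0_guided)   # lower than loss_plain for this step size
```

With this rule, the guided clean-image estimate equals the unguided one minus `scale * (1 - alpha_bar) / sqrt(alpha_bar)` times the gradient, so a single guided step acts like one gradient-descent step on the estimate. In a real sampler the gradient would come from automatic differentiation through the guidance losses rather than from a closed form.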
