Contourlet Refinement Gate Framework for Thermal Spectrum Distribution Regularized Infrared Image Super-Resolution

Yang Zou, Zhixin Chen, Zhipeng Zhang, Xingyuan Li, Long Ma, Jinyuan Liu, Peng Wang, Yanning Zhang

TL;DR

This work prioritizes the fidelity of the infrared spectral distribution. It proposes a Contourlet refinement gate framework that restores infrared modality-specific features while preserving that fidelity, together with a two-stage prompt-learning optimization that guides the model to learn infrared HR characteristics from LR degradation.

Abstract

Image super-resolution (SR) is a classical yet still active low-level vision problem that aims to reconstruct high-resolution (HR) images from their low-resolution (LR) counterparts, serving as a key technique for image enhancement. Current approaches to SR, such as transformer-based and diffusion-based methods, either focus on extracting RGB image features or assume similar degradation patterns, neglecting the inherent modal disparities between infrared and visible images. When directly applied to infrared image SR, these methods inevitably distort the infrared spectral distribution, compromising machine perception in downstream tasks. In this work, we emphasize infrared spectral distribution fidelity and propose a Contourlet refinement gate framework to restore infrared modality-specific features while preserving that fidelity. Our approach captures high-pass subbands from a multi-scale, multi-directional infrared spectral decomposition and recovers infrared-degraded information through a gate architecture. The proposed Spectral Fidelity Loss regularizes the spectral frequency distribution during reconstruction, which preserves both high- and low-frequency components and maintains the fidelity of infrared-specific features. We further propose a two-stage prompt-learning optimization to guide the model in learning infrared HR characteristics from LR degradation. Extensive experiments demonstrate that our approach outperforms existing image SR models in both visual and perceptual tasks while notably enhancing machine perception in downstream tasks. Our code is available at https://github.com/hey-it-s-me/CoRPLE.
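To make the idea of regularizing the spectral frequency distribution concrete, the following is a minimal sketch of a generic frequency-domain penalty. It is not the paper's Spectral Fidelity Loss, whose exact form is not given in this summary. The function name `spectral_l1_loss`, the use of the 2-D FFT amplitude spectrum, and the L1 distance are all assumptions chosen for illustration.

```python
import numpy as np


def spectral_l1_loss(sr: np.ndarray, hr: np.ndarray) -> float:
    """Mean absolute difference between 2-D FFT amplitude spectra.

    Hypothetical stand-in for a spectral regularizer: it compares how
    energy is distributed over frequencies in the super-resolved image
    `sr` and the reference image `hr` (both 2-D arrays of equal shape).
    """
    if sr.shape != hr.shape:
        raise ValueError("sr and hr must have the same shape")
    # Orthonormal scaling keeps the magnitudes independent of image size.
    amp_sr = np.abs(np.fft.fft2(sr, norm="ortho"))
    amp_hr = np.abs(np.fft.fft2(hr, norm="ortho"))
    return float(np.mean(np.abs(amp_sr - amp_hr)))


# Usage: identical images give zero loss; a blurred copy does not.
rng = np.random.default_rng(0)
hr_img = rng.random((16, 16))
blurred = (hr_img + np.roll(hr_img, 1, axis=0) + np.roll(hr_img, 1, axis=1)) / 3.0
loss_same = spectral_l1_loss(hr_img, hr_img)
loss_blur = spectral_l1_loss(blurred, hr_img)
```

Because the penalty covers every frequency bin, it reacts to mismatches in both low- and high-frequency content, which is the property the abstract attributes to spectral regularization.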

Paper Structure

This paper contains 26 sections, 10 equations, 14 figures, 7 tables.

Figures (14)

  • Figure 1: Algorithmic core design and performance evaluation of our method. In (a), our approach efficiently reconstructs high-quality infrared images by leveraging the multi-scale and multi-directional Contourlet Transform to extract quality-sensitive, infrared-specific high-pass subbands. We further learn the infrared LR-HR mapping using degradation and spectral fidelity loss, improving the fusion results and machine perception. In (b), we present a statistical objective comparison of our proposed method across super-resolution, object detection, and semantic segmentation tasks, benchmarking it against two SOTA methods and demonstrating its superior performance in the infrared domain.
  • Figure 2: Overall architecture of our network. (a) The main pipeline consists of shallow feature extraction, deep feature extraction, and image reconstruction. The deep feature extraction contains the Spatial Attention Block (SAB), Channel Attention Block (CAB), and the Contourlet Refinement Gate (CRG). (b) The Spatial Attention Block (SAB) and the Channel Attention Block (CAB). (c) The Contourlet Transform (CT). (d) The Global-Local Interactive Attention Block. (e) Prompt learning optimization.
  • Figure 3: Visual demonstration of the sparsity of (a) the traditional Wavelet Transform and (b) the Contourlet Transform.
  • Figure 4: Architecture of the Contourlet Transform. The input feature is first decomposed by an LP filter into low- and high-pass subbands. Then, the high-pass subbands are decomposed into $2^i$ directional subspaces through the DFB.
  • Figure 5: Overview of the prompt learning process. (a) The unlocked text encoder optimizes the learnable prompts to maximize the distance between negative and positive semantics in the latent space. (b) With the text encoder locked, the degradation loss guides the SR process to align with the positive prompts while moving away from the negative ones.
  • ...and 9 more figures
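
The first stage of the decomposition described in the Figure 4 caption, splitting an input into low- and high-pass subbands, can be sketched as a single pyramid level. This is a simplified illustration, not the paper's implementation: it assumes a 2x2 box filter and nearest-neighbour upsampling in place of the actual LP filters, and it does not implement the DFB, only the $2^i$ subband count stated in the caption. The names `lp_split`, `lp_merge`, and `num_directional_subbands` are hypothetical.

```python
import numpy as np


def _upsample2(low: np.ndarray) -> np.ndarray:
    """Nearest-neighbour upsampling by a factor of 2 along both axes."""
    return np.repeat(np.repeat(low, 2, axis=0), 2, axis=1)


def lp_split(x: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Split a 2-D array into a half-size low-pass band and a full-size
    high-pass residual (one pyramid level, box filter assumed)."""
    h, w = x.shape
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    # Low-pass: average each non-overlapping 2x2 block.
    low = x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    # High-pass: what the upsampled low-pass band fails to explain.
    high = x - _upsample2(low)
    return low, high


def lp_merge(low: np.ndarray, high: np.ndarray) -> np.ndarray:
    """Invert lp_split exactly."""
    return _upsample2(low) + high


def num_directional_subbands(i: int) -> int:
    """Number of directional subbands produced at DFB level i (2**i)."""
    return 2 ** i


# Usage: one decomposition level on a small random feature map.
rng = np.random.default_rng(0)
feat = rng.random((8, 8))
low_band, high_band = lp_split(feat)
restored = lp_merge(low_band, high_band)
```

The high-pass residual keeps the input resolution, so it carries the edge and texture detail that a directional filter bank would then separate by orientation.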