SwinFuSR: an image fusion-inspired model for RGB-guided thermal image super-resolution

Cyprien Arnold, Philippe Jouvet, Lama Seoud

TL;DR

SwinFuSR tackles the challenge of enhancing infrared image resolution by leveraging high-resolution RGB guidance through a lightweight Swin Transformer–based architecture. The method comprises three modules that separately extract shallow features, fuse RGB/IR information with Attention-guided Cross-domain Fusion blocks, and reconstruct the HR IR image, with a skip path from the upsampled IR to aid learning. A novel training strategy randomly removes RGB guidance during training to improve robustness when the guiding modality is missing, while a two-stage loss (L1 then L2) stabilizes training. Results on the PBVS 2024 Track 2 dataset show state-of-the-art PSNR/SSIM and favorable parameter efficiency; the approach also demonstrates improved unguided SR when RGB is unavailable, highlighting practical robustness for real-world multimodal SR tasks.
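The training strategy summarized above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the summary does not say how a removed RGB guide is represented, how often it is removed, or when the loss switches from L1 to L2. The sketch therefore assumes the guide is replaced by an all-zero image, and the names `drop_guidance`, `p_drop`, `two_stage_loss`, and `switch_epoch` are hypothetical.

```python
import random


def drop_guidance(rgb, p_drop, rng):
    """With probability p_drop, replace the guide image by zeros of the same shape.

    Assumption: a missing guide is modelled as an all-zero image; the source
    does not specify the replacement. `rgb` is a 2D list of floats here.
    """
    if rng.random() < p_drop:
        return [[0.0 for _ in row] for row in rgb]
    return rgb


def l1_loss(pred, target):
    """Mean absolute error over all pixels."""
    n = sum(len(row) for row in pred)
    return sum(abs(p - t) for pr, tr in zip(pred, target) for p, t in zip(pr, tr)) / n


def l2_loss(pred, target):
    """Mean squared error over all pixels."""
    n = sum(len(row) for row in pred)
    return sum((p - t) ** 2 for pr, tr in zip(pred, target) for p, t in zip(pr, tr)) / n


def two_stage_loss(pred, target, epoch, switch_epoch):
    """L1 before switch_epoch, L2 from switch_epoch onward (switch point is hypothetical)."""
    if epoch < switch_epoch:
        return l1_loss(pred, target)
    return l2_loss(pred, target)
```

In a training loop, `drop_guidance` would be applied to each RGB guide before the forward pass, so that the model sees both guided and unguided inputs, while `two_stage_loss` selects the objective from the current epoch.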

Abstract

Thermal imaging plays a crucial role in various applications, but the inherent low resolution of commonly available infrared (IR) cameras limits its effectiveness. Conventional super-resolution (SR) methods often struggle with thermal images due to their lack of high-frequency details. Guided SR leverages information from a high-resolution image, typically in the visible spectrum, to enhance the reconstruction of a high-resolution IR image from the low-resolution input. Inspired by SwinFusion, we propose SwinFuSR, a guided SR architecture based on Swin transformers. In real-world scenarios, however, the guiding modality (e.g. the RGB image) may be missing, so we propose a training method that improves the robustness of the model in this case. Our method has few parameters and outperforms state-of-the-art models in terms of Peak Signal-to-Noise Ratio (PSNR) and Structural SIMilarity (SSIM). In Track 2 of the PBVS 2024 Thermal Image Super-Resolution Challenge, it achieves 3rd place in the PSNR metric. Our code and pretrained weights are available at https://github.com/VisionICLab/SwinFuSR.
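For readers unfamiliar with the PSNR metric mentioned above, the following is a minimal sketch of its standard definition, PSNR = 10 log10(MAX^2 / MSE), in plain Python. It is not taken from the authors' code or the challenge's evaluation script: the function name `psnr`, the 2D-list image representation, and the default peak value of 1.0 are assumptions made for illustration.

```python
import math


def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two same-sized 2D images.

    max_value is the largest possible pixel value (assumed 1.0 here;
    8-bit images would use 255).
    """
    n = sum(len(row) for row in reference)
    mse = sum(
        (r - e) ** 2
        for ref_row, est_row in zip(reference, estimate)
        for r, e in zip(ref_row, est_row)
    ) / n
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher value means the super-resolved image is closer to the ground-truth high-resolution IR image in the mean-squared-error sense.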

Paper Structure

This paper contains 22 sections, 2 equations, 7 figures, and 1 table.

Figures (7)

  • Figure 1: Architecture of the proposed SwinFuSR model.
  • Figure 2: Effect of module depth on overall performance.
  • Figure 3: Performance with (blue) and without skip connection (green).
  • Figure 4: GTISR on image 292_01_D4 from the PBVS 2024 Track 2 dataset.
  • Figure 5: GTISR on sample image from SLP dataset Liu_2019.
  • ...and 2 more figures