
A Lightweight Image Super-Resolution Transformer Trained on Low-Resolution Images Only

Björn Möller, Lucas Görnhardt, Tim Fingscheidt

TL;DR

This work addresses single-image super-resolution (SR) under the LR-only training constraint, where HR training data are unavailable. It introduces MSTbic, a multi-scale training method that uses scale augmentation to generate pseudo-LR/HR pairs from LR images. These pairs then supervise the training of a lightweight SwinIR transformer (0.89M parameters). By adapting a microscopy-based LR-only training approach to macroscopic data, MSTbic achieves state-of-the-art results on standard LR-only SR benchmarks across multiple datasets, for both transformer and CNN backbones. The findings show that transformer-based SR can be trained effectively with LR data alone, which broadens its applicability to real-world scenarios where HR training data are scarce.
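
The pseudo-pair generation can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, which is in their GitHub repository. Following the Figure 3 caption, an LR training image is first rescaled to form a pseudo-HR image. Nearest-neighbor interpolation is used when shrinking and bicubic interpolation when enlarging. The pseudo-HR image is then bicubically downscaled by the SR factor to form the matching pseudo-LR input. Several details are hypothetical choices, not taken from the paper: the helper names (`make_pseudo_pair`, `bicubic_resize`, `nearest_resize`, `resize_weights`), the scale factors (0.5, 1.0, 1.5), the Keys kernel with a = -0.5, and the simplified border handling (weights are renormalized rather than the image padded).

```python
import numpy as np

def cubic(x, a=-0.5):
    """Keys cubic convolution kernel; a=-0.5 is the MATLAB imresize choice (assumption)."""
    x = np.abs(x)
    x2, x3 = x * x, x * x * x
    near = (a + 2) * x3 - (a + 3) * x2 + 1          # |x| <= 1
    far = a * x3 - 5 * a * x2 + 8 * a * x - 4 * a   # 1 < |x| < 2
    return np.where(x <= 1, near, np.where(x < 2, far, 0.0))

def resize_weights(in_len, out_len):
    """Dense (out_len, in_len) matrix of 1-D bicubic interpolation weights."""
    scale = out_len / in_len
    k = min(scale, 1.0)  # stretch the kernel when downscaling (antialiasing)
    centers = (np.arange(out_len) + 0.5) / scale - 0.5  # output pixel centers in input coords
    w = k * cubic((centers[:, None] - np.arange(in_len)[None, :]) * k)
    return w / w.sum(axis=1, keepdims=True)  # renormalize; also covers truncated taps at borders

def bicubic_resize(img, out_h, out_w):
    """Separable bicubic resize of an (H, W) or (H, W, C) float array."""
    out = np.tensordot(resize_weights(img.shape[0], out_h), img, axes=(1, 0))
    out = np.tensordot(resize_weights(img.shape[1], out_w), out, axes=(1, 1))
    return np.swapaxes(out, 0, 1)

def nearest_resize(img, out_h, out_w):
    """Nearest-neighbor resize: take the input pixel under each output pixel center."""
    rows = ((np.arange(out_h) + 0.5) * img.shape[0] / out_h).astype(int)
    cols = ((np.arange(out_w) + 0.5) * img.shape[1] / out_w).astype(int)
    return img[rows][:, cols]

def make_pseudo_pair(lr, s, factor=4):
    """Hypothetical MSTbic-style pair: rescale LR by s (pseudo-HR), then bicubic-degrade by `factor`."""
    h, w = lr.shape[:2]
    # Round the pseudo-HR size to a multiple of `factor` so the pseudo-LR size is an integer.
    ph = max(factor, int(round(h * s / factor)) * factor)
    pw = max(factor, int(round(w * s / factor)) * factor)
    resize = nearest_resize if s < 1 else bicubic_resize  # NN for downscaling, bicubic for upscaling
    pseudo_hr = resize(lr, ph, pw)
    pseudo_lr = bicubic_resize(pseudo_hr, ph // factor, pw // factor)
    return pseudo_lr, pseudo_hr

rng = np.random.default_rng(0)
lr_image = rng.random((48, 64, 3))  # stand-in for one LR training image
pairs = [make_pseudo_pair(lr_image, s) for s in (0.5, 1.0, 1.5)]
for x, y in pairs:
    print(x.shape, y.shape)
# (6, 8, 3) (24, 32, 3)
# (12, 16, 3) (48, 64, 3)
# (18, 24, 3) (72, 96, 3)
```

Under these assumptions, each LR image yields several supervised pairs at different scales. The actual scale set and sampling schedule used for MSTbic are not given in this section.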

Abstract

Transformer architectures lead single-image super-resolution (SISR) benchmarks, reconstructing high-resolution (HR) images from their low-resolution (LR) counterparts. Their strong representational power, however, comes with a higher demand for training data than that of convolutional neural networks (CNNs). In many real-world SR applications, high-quality HR training images are not available, which has sparked interest in LR-only training methods. The LR-only SISR benchmark mimics this condition by allowing only LR images for model training. For 4x super-resolution, this reduces the available training data to 6.25% (1/16) of the HR image pixels, which calls into question the use of a data-hungry transformer model. In this work, we are the first to use a lightweight vision transformer model with LR-only training methods on the unsupervised LR-only SISR benchmark. We adopt a recent LR-only training method from microscopy image super-resolution and configure it for macroscopic real-world data, resulting in our multi-scale training method for bicubic degradation (MSTbic). Furthermore, we compare it with reference methods and demonstrate its effectiveness for both a transformer and a CNN model. We evaluate on the classic SR benchmark datasets Set5, Set14, BSD100, Urban100, and Manga109, and show superior performance over state-of-the-art (so far: CNN-based) LR-only SISR methods. The code is available on GitHub: https://github.com/ifnspaml/SuperResolutionMultiscaleTraining.
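
The 6.25% figure follows from the pixel count. Downscaling by 4 along each axis keeps 1/4 × 1/4 = 1/16 of the pixels. A short Python check follows. The helper name `lr_pixel_fraction` and the 1920x1080 example size are illustrative only.

```python
def lr_pixel_fraction(scale: int) -> float:
    """Fraction of HR pixels retained when an image is downscaled by `scale` along each axis."""
    return 1.0 / (scale * scale)

# Worked example with a hypothetical 1920x1080 HR image and its 4x LR version (480x270).
hr_pixels = 1920 * 1080
lr_pixels = (1920 // 4) * (1080 // 4)
print(lr_pixels / hr_pixels)            # 0.0625
print(f"{lr_pixel_fraction(4):.2%}")    # 6.25%
```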

Paper Structure

This paper contains 12 sections, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Low-resolution-only training: Overview of our proposed approach to train single-image super-resolution (SISR) models only with low-resolution (LR) data, consisting of our proposed multi-scale training method for bicubic degradation (MSTbic). MSTbic creates pseudo-LR/HR training pairs, enabling supervised training of a lightweight (lw) SwinIR transformer model. The LR-only SISR benchmark provides only LR images for training, while high-resolution (HR) test data are used for evaluation.
  • Figure 2: Architecture components of SwinIR [liang_swinir_2021] for single-image super-resolution (SISR): (a) full architecture overview, (b) high-resolution image reconstruction module, (c) residual Swin transformer block (RSTB), (d) Swin transformer layer (STL).
  • Figure 3: LR-only training methods: (a) Our proposed multi-scale training with bicubic upscaling and bicubic degradation (MSTbic), (b) SimUSR [ahn_simusr_2020] (baseline method). MSTbic leverages initial downscaling (nearest neighbor) and upscaling (bicubic), while SimUSR only uses downscaling (bicubic) for image scale augmentation.
  • Figure 4: Visual comparison of image patches, marked with a white box in the original Urban100 and Set14 test images shown on the left. Shown are the low-resolution (LR) input, the high-resolution ground truth (GT), and 4x SR results for combinations of the SimUSR [ahn_simusr_2020] or MSTbic (ours) training methods with the CARN or SwinIR lw models, as well as the bicubic baseline result. For each patch, PSNR (dB) and SSIM are calculated and the error map is shown.
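
Figure 4 reports PSNR (dB) and SSIM for each patch, together with an error map. The following minimal NumPy sketch computes PSNR and an absolute-error map; SSIM is omitted for brevity. Two conventions in it are common in SR evaluation but are assumptions here: the luminance conversion (ITU-R BT.601, as in MATLAB's `rgb2ycbcr`) and the border cropping via `shave`. Whether the paper's evaluation uses exactly these settings is not stated in this section. All function names are hypothetical.

```python
import numpy as np

def rgb_to_y(img):
    """Luma (Y) of an RGB image with values in [0, 1] (BT.601, rgb2ycbcr style); output in [16, 235]."""
    img = np.asarray(img, dtype=np.float64)
    return 16.0 + img @ np.array([65.481, 128.553, 24.966])

def psnr(sr, hr, max_val=1.0, shave=0):
    """Peak signal-to-noise ratio in dB; optionally crops `shave` border pixels first."""
    sr = np.asarray(sr, dtype=np.float64)
    hr = np.asarray(hr, dtype=np.float64)
    if shave > 0:
        sr = sr[shave:-shave, shave:-shave]
        hr = hr[shave:-shave, shave:-shave]
    mse = np.mean((sr - hr) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))

def error_map(sr, hr):
    """Per-pixel absolute error, averaged over channels for color images."""
    err = np.abs(np.asarray(sr, dtype=np.float64) - np.asarray(hr, dtype=np.float64))
    return err.mean(axis=-1) if err.ndim == 3 else err

# Usage: a uniform error of 0.1 on a [0, 1] image gives 10*log10(1 / 0.01) = 20 dB.
hr = np.zeros((16, 16, 3))
sr = hr + 0.1
print(round(psnr(sr, hr), 2))   # 20.0
```

With the Y-channel convention, a 4x result would be scored as `psnr(rgb_to_y(sr), rgb_to_y(hr), max_val=255.0, shave=4)`. SSIM is typically taken from a library, for example scikit-image's `skimage.metrics.structural_similarity`.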