A Lightweight Image Super-Resolution Transformer Trained on Low-Resolution Images Only
Björn Möller, Lucas Görnhardt, Tim Fingscheidt
TL;DR
This work addresses single-image super-resolution (SISR) under the LR-only training constraint, where no HR training data are available. It introduces MSTbic, a multi-scale training method for bicubic degradation that generates pseudo-LR/HR pairs from the LR images themselves to supervise a lightweight SwinIR transformer (0.89M parameters). By adapting an LR-only training approach from microscopy to macroscopic data, MSTbic achieves state-of-the-art results on the LR-only SISR benchmark across five standard test sets (Set5, Set14, BSD100, Urban100, Manga109), for both transformer and CNN backbones. The findings show that transformer-based SR can be trained effectively on LR data alone, which broadens its applicability to real-world scenarios where HR training data are scarce.
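The pseudo-pair idea can be illustrated with a minimal NumPy sketch. This is not the authors' MSTbic implementation, which is in their GitHub repository. The function names `bicubic_resize` and `make_pseudo_pairs` and the scale set `aug_scales` are hypothetical, chosen only for illustration. The resampler uses the Keys cubic kernel (a = -0.5) and widens the kernel when downscaling for antialiasing, similar in spirit to MATLAB's `imresize`. Unlike `imresize`, it handles borders by renormalizing truncated weights instead of using symmetric padding.

```python
import numpy as np


def cubic(x, a=-0.5):
    """Keys cubic convolution kernel; a = -0.5 is the common 'bicubic' choice."""
    x = np.abs(x)
    x2, x3 = x ** 2, x ** 3
    w = np.where(x <= 1, (a + 2) * x3 - (a + 3) * x2 + 1, 0.0)
    w = np.where((x > 1) & (x < 2), a * x3 - 5 * a * x2 + 8 * a * x - 4 * a, w)
    return w


def resize_weights(in_len, out_len):
    """Dense (out_len, in_len) matrix for 1-D bicubic resampling."""
    scale = out_len / in_len
    # When downscaling, stretch the kernel by 1/scale to low-pass filter (antialiasing)
    k = min(scale, 1.0)
    # Map output pixel centres into input coordinates (half-pixel convention)
    centres = (np.arange(out_len) + 0.5) / scale - 0.5
    dist = centres[:, None] - np.arange(in_len)[None, :]
    w = k * cubic(dist * k)
    # Renormalise rows so border pixels (truncated support) still sum to 1
    return w / w.sum(axis=1, keepdims=True)


def bicubic_resize(img, out_h, out_w):
    """Separable bicubic resize of an (H, W) or (H, W, C) float image."""
    wh = resize_weights(img.shape[0], out_h)
    ww = resize_weights(img.shape[1], out_w)
    tmp = np.tensordot(wh, img, axes=(1, 0))   # resample rows -> (out_h, W, ...)
    out = np.tensordot(ww, tmp, axes=(1, 1))   # resample cols -> (out_w, out_h, ...)
    return np.moveaxis(out, 0, 1)              # -> (out_h, out_w, ...)


def make_pseudo_pairs(lr_img, sr_factor=4, aug_scales=(1.0, 0.8, 0.6)):
    """Hypothetical scale augmentation: build (pseudo-LR, pseudo-HR) pairs
    from one LR training image, without any real HR data."""
    h, w = lr_img.shape[:2]
    pairs = []
    for s in aug_scales:
        # Pseudo-HR: the LR image rescaled by s (s = 1.0 leaves it unchanged)
        hr = bicubic_resize(lr_img, round(h * s), round(w * s))
        # Crop so both sides are divisible by the SR factor
        hh = hr.shape[0] // sr_factor * sr_factor
        hw = hr.shape[1] // sr_factor * sr_factor
        if hh == 0 or hw == 0:
            continue  # image too small at this scale
        hr = hr[:hh, :hw]
        # Pseudo-LR: bicubic downscaling of the pseudo-HR by sr_factor
        lr = bicubic_resize(hr, hh // sr_factor, hw // sr_factor)
        pairs.append((lr, hr))
    return pairs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lr_image = rng.random((48, 64, 3))  # stand-in for a real LR training image
    for p_lr, p_hr in make_pseudo_pairs(lr_image):
        print("pseudo-LR", p_lr.shape, "-> pseudo-HR", p_hr.shape)
```

Treating the LR image as pseudo-HR lets the network learn the 4x mapping one scale level lower. The assumed extra rescaling by `aug_scales` varies the content scale the network sees. Whether these choices match MSTbic's actual configuration is not stated in this section.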
Abstract
Transformer architectures lead single-image super-resolution (SISR) benchmarks, which measure how well high-resolution (HR) images are reconstructed from their low-resolution (LR) counterparts. Their strong representational power, however, comes with a higher demand for training data than convolutional neural networks (CNNs) have. In many real-world SR applications, high-quality HR training images are not available, which has sparked interest in LR-only training methods. The LR-only SISR benchmark mimics this condition by allowing only LR images for model training. For 4x super-resolution, this reduces the available training data to 6.25% of the pixels of the corresponding HR images, which calls the use of a data-hungry transformer model into question. In this work, we are the first to apply a lightweight vision transformer model with LR-only training methods to the unsupervised LR-only SISR benchmark. We adapt and configure a recent LR-only training method from microscopy image super-resolution to macroscopic real-world data, resulting in our multi-scale training method for bicubic degradation (MSTbic). Furthermore, we compare it with reference methods and demonstrate its effectiveness for both a transformer and a CNN model. We evaluate on the classic SR benchmark datasets Set5, Set14, BSD100, Urban100, and Manga109, and show superior performance over state-of-the-art (so far: CNN-based) LR-only SISR methods. The code is available on GitHub: https://github.com/ifnspaml/SuperResolutionMultiscaleTraining.
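The 6.25% figure follows from the pixel counts. In this illustration, H and W denote the height and width of an HR image. A 4x downscaled LR image then has (H/4) x (W/4) pixels:

```latex
\frac{N_{\mathrm{LR}}}{N_{\mathrm{HR}}}
  = \frac{(H/4)\,(W/4)}{H\,W}
  = \frac{1}{16}
  = 6.25\,\%
```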
