Undertrained Image Reconstruction for Realistic Degradation in Blind Image Super-Resolution
Ru Ito, Supatta Viriyavisuthisakul, Kazuhiko Kawamoto, Hiroshi Kera
TL;DR
This paper addresses the gap between synthetic SR training data and real-world degradations with a dataset-generation method: HR images are downsampled and then passed through undertrained image reconstruction models, which output degraded LR images. The resulting dataset $\mathcal{D} = \{(\mathbf{x}^{i}_{\mathrm{deg}}, \mathbf{y}^{i})\}^{|\mathcal{D}|}_{i=1}$, which pairs each degraded LR image $\mathbf{x}^{i}_{\mathrm{deg}}$ with its HR image $\mathbf{y}^{i}$, is used to fine-tune pre-trained SR models (e.g., HAT, EDSR, ESRGAN, SwinIR), improving their results on real-world LR benchmarks. Degradation diversity drives the gains, while color differences between HR and LR images sometimes hinder performance. Among the reconstruction models, VQ-VAE-2 trained for 8 epochs produced the strongest SR improvements across multiple SR backbones, which suggests a model-agnostic and cost-effective path to better real-world SR generalization. The work offers practical guidance on exploiting degradation diversity and highlights the need to control color shifts in the degraded data.
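The dataset construction summarized above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the average-pooling `downsample`, the scale factor of 4, and the helper names are all hypothetical, and `toy_undertrained_model` is only a stand-in (a small blur plus Gaussian noise) for a real reconstruction model such as a VQ-VAE-2 stopped after a few epochs.

```python
import numpy as np

def downsample(hr, scale):
    """Average-pool an (H, W, C) image by an integer factor.
    Assumption: the source does not state which downsampling kernel is used."""
    h, w, c = hr.shape
    h, w = h - h % scale, w - w % scale  # crop so both sides divide evenly
    blocks = hr[:h, :w].reshape(h // scale, scale, w // scale, scale, c)
    return blocks.mean(axis=(1, 3))

def build_degraded_dataset(hr_images, reconstruct, scale=4):
    """Return a list of pairs (x_deg, y): y is the HR image and x_deg is the
    downsampled image after a pass through the reconstruction model."""
    pairs = []
    for y in hr_images:
        x_lr = downsample(y, scale)
        x_deg = np.clip(reconstruct(x_lr), 0.0, 1.0)  # keep a valid pixel range
        pairs.append((x_deg, y))
    return pairs

def toy_undertrained_model(rng, noise_std=0.05):
    """Hypothetical stand-in for an undertrained reconstruction model:
    it returns an imperfect (blurred, noisy) copy of its input."""
    def reconstruct(x):
        blurred = (x + np.roll(x, 1, axis=0) + np.roll(x, 1, axis=1)) / 3.0
        return blurred + rng.normal(0.0, noise_std, size=x.shape)
    return reconstruct

rng = np.random.default_rng(0)
hr_images = [rng.random((64, 64, 3)) for _ in range(3)]  # placeholder HR images
dataset = build_degraded_dataset(hr_images, toy_undertrained_model(rng), scale=4)
```

Passing the reconstruction model in as a plain callable keeps the pipeline independent of any particular architecture, so a differently undertrained model can be swapped in without touching the pairing logic.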
Abstract
Most super-resolution (SR) models struggle with real-world low-resolution (LR) images because the degradation characteristics of synthetic training datasets differ from those of real-world LR images. SR models are typically trained on pairs of high-resolution (HR) images and LR images generated by downsampling, so they are optimized for simple degradation. Real-world LR images, however, contain complex degradation caused by factors such as the imaging process and JPEG compression. This study proposes a dataset generation method that uses undertrained image reconstruction models. Such models reconstruct their inputs as low-quality images with diverse degradation. By leveraging this property, this study generates LR images with diverse degradation from HR images and uses them to construct datasets. Fine-tuning pre-trained SR models on the generated datasets improves noise removal and blur reduction, enhancing performance on real-world LR images. Furthermore, an analysis of the datasets reveals that degradation diversity contributes to the performance improvements, whereas color differences between HR and LR images may degrade performance.
11 pages (11 figures and 2 tables)
