Undertrained Image Reconstruction for Realistic Degradation in Blind Image Super-Resolution

Ru Ito, Supatta Viriyavisuthisakul, Kazuhiko Kawamoto, Hiroshi Kera

TL;DR

This paper addresses the gap between synthetic SR training data and real-world degradations with a dataset-generation method: HR images are downsampled and then passed through undertrained image reconstruction models, which output degraded LR images. The resulting dataset $\mathcal{D} = \{(\mathbf{x}^{i}_{\mathrm{deg}}, \mathbf{y}^{i})\}^{|\mathcal{D}|}_{i=1}$, which pairs each degraded LR image $\mathbf{x}^{i}_{\mathrm{deg}}$ with its HR image $\mathbf{y}^{i}$, is used to fine-tune pre-trained SR models (e.g., HAT, EDSR, ESRGAN, SwinIR) and improves their results on real-world LR benchmarks. Analysis of the datasets indicates that degradation diversity drives the gains, whereas color differences between HR and LR images can hurt performance. Among the reconstruction models, VQ-VAE-2 trained for 8 epochs produced the strongest SR improvements across multiple SR backbones, suggesting a model-agnostic and cost-effective path to better real-world SR generalization. In practice, this points to exploiting degradation diversity while controlling color shifts in the degraded data.
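
Read together with the method overview (Figure 2), the dataset construction can be written as three steps. The symbols $\mathrm{down}_{s}$ (downsampling by a scale factor $s$), $\mathbf{x}^{i}_{\mathrm{lr}}$ (the clean downsampled image), and $g_{\theta_{t}}$ (a reconstruction model after $t$ training epochs) are notation assumed here for illustration and may differ from the paper's own equations.

$$
\begin{aligned}
\mathbf{x}^{i}_{\mathrm{lr}} &= \mathrm{down}_{s}\left(\mathbf{y}^{i}\right), \\
\mathbf{x}^{i}_{\mathrm{deg}} &= g_{\theta_{t}}\left(\mathbf{x}^{i}_{\mathrm{lr}}\right), \\
\mathcal{D} &= \left\{\left(\mathbf{x}^{i}_{\mathrm{deg}}, \mathbf{y}^{i}\right)\right\}^{|\mathcal{D}|}_{i=1}.
\end{aligned}
$$

Under this reading, the reported best setting corresponds to $g$ being VQ-VAE-2 with $t = 8$.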

Abstract

Most super-resolution (SR) models struggle with real-world low-resolution (LR) images. This issue arises because the degradation characteristics in the synthetic datasets differ from those in real-world LR images. Since SR models are trained on pairs of high-resolution (HR) and LR images generated by downsampling, they are optimized for simple degradation. However, real-world LR images contain complex degradation caused by factors such as the imaging process and JPEG compression. Due to these differences in degradation characteristics, most SR models perform poorly on real-world LR images. This study proposes a dataset generation method using undertrained image reconstruction models. These models have the property of reconstructing low-quality images with diverse degradation from input images. By leveraging this property, this study generates LR images with diverse degradation from HR images to construct the datasets. Fine-tuning pre-trained SR models on our generated datasets improves noise removal and blur reduction, enhancing performance on real-world LR images. Furthermore, an analysis of the datasets reveals that degradation diversity contributes to performance improvements, whereas color differences between HR and LR images may degrade performance.

11 pages, 11 figures, 2 tables.
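
The generation procedure described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `box_downsample` uses block averaging as a stand-in for whatever downsampling kernel the paper uses, `build_degraded_dataset` is a hypothetical helper name, and `toy_undertrained_model` is a placeholder callable (slight blur plus noise) standing in for an actual undertrained reconstruction model such as VQ-VAE-2.

```python
import numpy as np


def box_downsample(hr: np.ndarray, scale: int) -> np.ndarray:
    """Downsample an (H, W, C) image by averaging scale x scale blocks.

    Assumption: H and W are divisible by `scale`, and box averaging
    stands in for the paper's actual downsampling kernel.
    """
    h, w, c = hr.shape
    blocks = hr.reshape(h // scale, scale, w // scale, scale, c)
    return blocks.mean(axis=(1, 3))


def toy_undertrained_model(lr: np.ndarray, seed: int = 0) -> np.ndarray:
    """Hypothetical placeholder for an undertrained reconstruction model.

    It only mimics "imperfect reconstruction" with a small blur and noise.
    """
    rng = np.random.default_rng(seed)
    blurred = (lr + np.roll(lr, 1, axis=0) + np.roll(lr, 1, axis=1)) / 3.0
    return blurred + rng.normal(0.0, 0.02, size=lr.shape)


def build_degraded_dataset(hr_images, reconstruct, scale: int = 4):
    """Return a list of (degraded LR, HR) pairs.

    Steps: crop HR so it divides by `scale`, downsample it, pass the
    result through `reconstruct`, then pair the output with the HR image.
    """
    pairs = []
    for hr in hr_images:
        h, w = hr.shape[:2]
        hr = hr[: h - h % scale, : w - w % scale]  # keep HR and LR aligned
        lr = box_downsample(hr, scale)
        lr_deg = np.clip(reconstruct(lr), 0.0, 1.0)  # stay in valid range
        pairs.append((lr_deg, hr))
    return pairs
```

Calling `build_degraded_dataset(hr_images, toy_undertrained_model, scale=4)` on float RGB arrays in [0, 1] yields pairs that a pre-trained SR model could then be fine-tuned on. In a real setup, `reconstruct` would wrap the forward pass of a reconstruction model checkpoint saved after a small number of epochs.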

Paper Structure

This paper contains 15 sections, 2 equations, 11 figures, 2 tables.

Figures (11)

  • Figure 1: Fine-tuning a pre-trained HAT model with a dataset generated by VQ-VAE-2 trained for 8 epochs produced notable improvements. The pre-trained HAT model retained noise and blur from the LR images, but the fine-tuned model effectively reduced noise and restored sharpness.
  • Figure 2: The overview of the proposed method. This method first downsamples an HR image to create an LR image. Then, the LR image is fed as an input to the undertrained image reconstruction model to generate a degraded LR image. Finally, it constructs a dataset by pairing the degraded LR image with the HR image.
  • Figure 3: Reconstruction results of five image reconstruction models. The rightmost column represents the input images, and the other columns show reconstructed images at different training epochs. The results indicate that Vanilla VAE and DDIM fail to reconstruct the input images. MAE produces images that deviate significantly from the input at epochs 4, 8, and 16, but at epochs 32 and 64, it reconstructs structural information and color to some extent. VQ-VAE and VQ-VAE-2 achieve relatively accurate reconstructions even with minimal training while incorporating degradation.
  • Figure 4: Examples of LR images generated by each model. The rightmost column represents the input images, while the other columns show the reconstructed images. All models preserve the structural information of the input while introducing degradation. VQ-VAE-2 reconstructs images close to the input regardless of the number of training epochs. VQ-VAE tends to produce images with a yellowish tint, and MAE introduces sparse degradations. These results confirm that each model generates degraded images with distinct characteristics.
  • Figure 5: Examples from the dataset generated by VQ-VAE-2 trained for 8 epochs. The left two columns show input images, and the right shows the reconstructed images. In (a), the brick colors vary between red and green, indicating noise. In (b), the cherry blossom contours appear smeared, indicating blur. In (c), ring-shaped artifacts are visible around the head of the duck, indicating ringing. In (d), jagged edges appear along the straight lines of the roof, indicating aliasing. These results demonstrate that the proposed method can generate datasets containing diverse types of degradation.
  • ...and 6 more figures
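
The captions above note a yellowish tint in VQ-VAE outputs, and the abstract reports that color differences between HR and LR images may degrade performance. One simple way to screen generated images for a global color shift is to compare per-channel means of the model input and its reconstruction. The function below is a sketch of that idea; the name `mean_color_shift` and the metric itself are assumptions for illustration, not the measure used in the paper's analysis.

```python
import numpy as np


def mean_color_shift(reference: np.ndarray, degraded: np.ndarray) -> np.ndarray:
    """Per-channel mean difference (degraded - reference).

    Both inputs are (H, W, 3) RGB arrays with values in [0, 1].
    A negative blue component with near-zero red and green components
    is consistent with a yellowish tint.
    """
    if reference.shape != degraded.shape:
        raise ValueError("images must have the same shape")
    return degraded.mean(axis=(0, 1)) - reference.mean(axis=(0, 1))


# Usage: a flat gray image and a copy with the blue channel attenuated.
reference = np.full((8, 8, 3), 0.5)
tinted = reference.copy()
tinted[..., 2] *= 0.8  # reduce blue, which pushes the image toward yellow
shift = mean_color_shift(reference, tinted)
```

Pairs whose shift exceeds a chosen threshold could be inspected or filtered before fine-tuning; any such threshold would have to be tuned empirically.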