Low-Res Leads the Way: Improving Generalization for Super-Resolution by Self-Supervised Learning

Haoyu Chen, Wenbo Li, Jinjin Gu, Jingjing Ren, Haoze Sun, Xueyi Zou, Zhensong Zhang, Youliang Yan, Lei Zhu

TL;DR

This work presents Low-Res Leads the Way (LWay), a practical SR framework that blends supervised pre-training on synthetic degradations with self-supervised fine-tuning on unseen real-world test images. It introduces an LR reconstruction network to extract a degradation embedding and uses a DWT-derived high-frequency weighting to guide HF detail restoration, enabling adaptation without architecture changes. Across RealSR, DRealSR, and multiple SR backbones (CNNs, Transformers, VQGAN, diffusion), LWay yields consistent gains in perceptual and fidelity metrics, demonstrating improved generalization to real-world degradations with fast test-time optimization. The approach reduces reliance on paired real data, maintains compatibility across models, and offers a scalable, deployment-friendly path toward robust real-world SR.

Abstract

For image super-resolution (SR), bridging the gap between the performance on synthetic datasets and real-world degradation scenarios remains a challenge. This work introduces a novel "Low-Res Leads the Way" (LWay) training framework, merging Supervised Pre-training with Self-supervised Learning to enhance the adaptability of SR models to real-world images. Our approach utilizes a low-resolution (LR) reconstruction network to extract degradation embeddings from LR images, merging them with super-resolved outputs for LR reconstruction. Leveraging unseen LR images for self-supervised learning guides the model to adapt its modeling space to the target domain, facilitating fine-tuning of SR models without requiring paired high-resolution (HR) images. The integration of Discrete Wavelet Transform (DWT) further refines the focus on high-frequency details. Extensive evaluations show that our method significantly improves the generalization and detail restoration capabilities of SR models on unseen real-world datasets, outperforming existing methods. Our training regime is universally compatible, requiring no network architecture modifications, making it a practical solution for real-world SR applications.
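
The abstract says the DWT sharpens the focus on high-frequency details during self-supervised fine-tuning. The sketch below is one way such a weighting could look, not the authors' implementation: it computes a single-level Haar DWT, turns the magnitude of the three detail sub-bands into a per-pixel weight in [0, 1], and uses that weight in an L1 loss between an LR image and its reconstruction. The function names (`haar_dwt2`, `high_freq_weight`, `weighted_l1`), the normalization by the peak value, and the choice of L1 are all assumptions made for illustration.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale image as rows of floats


def haar_dwt2(img: Image) -> Tuple[Image, Image, Image, Image]:
    """Single-level orthonormal 2D Haar DWT (even height and width required).

    Returns (approximation, column-difference detail,
    row-difference detail, diagonal detail), each at half resolution.
    """
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    approx, d_cols, d_rows, d_diag = [], [], [], []
    for i in range(0, h, 2):
        r_a, r_c, r_r, r_d = [], [], [], []
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            r_a.append((a + b + c + d) / 2.0)  # local average (low-pass)
            r_c.append((a - b + c - d) / 2.0)  # difference across columns
            r_r.append((a + b - c - d) / 2.0)  # difference across rows
            r_d.append((a - b - c + d) / 2.0)  # diagonal difference
        approx.append(r_a)
        d_cols.append(r_c)
        d_rows.append(r_r)
        d_diag.append(r_d)
    return approx, d_cols, d_rows, d_diag


def high_freq_weight(img: Image) -> Image:
    """Per-pixel weight in [0, 1] from detail sub-band magnitudes (assumed form)."""
    _, d_cols, d_rows, d_diag = haar_dwt2(img)
    energy = [
        [abs(x) + abs(y) + abs(z) for x, y, z in zip(rc, rr, rd)]
        for rc, rr, rd in zip(d_cols, d_rows, d_diag)
    ]
    peak = max(max(row) for row in energy)
    scale = 1.0 / peak if peak > 0 else 0.0  # flat image -> all-zero weights
    weight: Image = []
    for row in energy:
        # Repeat each coefficient over its 2x2 block to return to full size.
        full = [e * scale for e in row for _ in range(2)]
        weight.append(full)
        weight.append(list(full))
    return weight


def weighted_l1(lr: Image, lr_rec: Image, weight: Image) -> float:
    """Mean of weight * |lr - lr_rec| over all pixels."""
    total, count = 0.0, 0
    for r_lr, r_rec, r_w in zip(lr, lr_rec, weight):
        for x, y, wgt in zip(r_lr, r_rec, r_w):
            total += wgt * abs(x - y)
            count += 1
    return total / count
```

With this form, errors in flat regions contribute nothing and errors near edges or texture dominate the loss. A real implementation would more likely use a wavelet library and a tensor framework, and might keep a small base weight so that smooth regions are not ignored entirely.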

Paper Structure (30 sections, 4 equations, 16 figures, 10 tables)

Figures (16)

  • Figure 1: Our proposed training method combines the benefits of supervised learning (SL) on synthetic data and self-supervised learning (SSL) on unseen test images, achieving high-quality, high-fidelity SR results.
  • Figure 2: Comparison of different learning approaches for real-world image SR.
  • Figure 3: The proposed training pipeline (LWay) consists of two steps. In Step 1, we pre-train an LR reconstruction network to capture a degradation embedding from LR images. This embedding is then applied to HR images to regenerate the LR content. In Step 2, for test images, a pre-trained SR model generates SR outputs, which are then degraded by the fixed LR reconstruction network. We iteratively update the SR model using a self-supervised loss computed on the LR images, with a weighted loss that emphasizes high-frequency details. This refinement improves the SR model's generalization to previously unseen images.
  • Figure 4: The SR model advances through the proposed fine-tuning iterations, moving from the supervised learning (SL) space of synthetic degradation to the self-supervised learning (SSL) space learned from test images. This results in enhanced SR quality and fidelity.
  • Figure 5: Qualitative comparisons on real-world datasets. The content within the blue box represents a zoomed-in image.
  • ...and 11 more figures
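
Step 2 of the pipeline in the Figure 3 caption (super-resolve the test image, degrade the result with the frozen LR reconstruction network, compare against the input LR image, and update only the SR model) can be illustrated with a deliberately tiny toy. Everything here is hypothetical and far simpler than the paper's networks: the "SR model" is a 1D x2 upsampler with a single gain parameter `theta`, the frozen "LR reconstruction network" is average pooling scaled by a scalar `embedding` that stands in for the degradation embedding, the loss is a plain mean squared error, and the gradient is taken by finite differences so the SR model is treated as a black box.

```python
from typing import List


def sr_model(lr: List[float], theta: float) -> List[float]:
    """Toy x2 'SR model': repeat each sample twice and scale by theta."""
    return [theta * v for v in lr for _ in range(2)]


def lr_reconstruct(sr: List[float], embedding: float) -> List[float]:
    """Toy frozen LR reconstruction: average-pool by 2, then attenuate.

    The scalar `embedding` is a stand-in for the degradation embedding.
    """
    return [embedding * (sr[i] + sr[i + 1]) / 2.0 for i in range(0, len(sr), 2)]


def ssl_loss(lr: List[float], theta: float, embedding: float) -> float:
    """Mean squared error between the input LR signal and its reconstruction."""
    rec = lr_reconstruct(sr_model(lr, theta), embedding)
    return sum((r - x) ** 2 for r, x in zip(rec, lr)) / len(lr)


def fine_tune(lr: List[float], theta: float, embedding: float,
              steps: int = 50, step_size: float = 0.1,
              eps: float = 1e-4) -> float:
    """Gradient descent on theta only; the reconstruction path stays fixed."""
    for _ in range(steps):
        # Central finite difference: no access to the SR model's internals.
        grad = (ssl_loss(lr, theta + eps, embedding)
                - ssl_loss(lr, theta - eps, embedding)) / (2.0 * eps)
        theta -= step_size * grad
    return theta


if __name__ == "__main__":
    lr_signal = [1.0, 2.0, 3.0, 4.0]
    theta0 = 1.0        # "pre-trained" under an assumed embedding of 1.0
    real_embedding = 0.5  # unseen degradation that halves the signal
    theta1 = fine_tune(lr_signal, theta0, real_embedding)
    print(ssl_loss(lr_signal, theta0, real_embedding),
          ssl_loss(lr_signal, theta1, real_embedding))
```

In this toy the loss reaches its minimum when `theta` equals `1 / embedding`, so fine-tuning moves the SR parameter from the value suited to the assumed synthetic degradation toward the one that explains the observed LR signal, without any HR reference. In practice the same loop would use a deep SR network, automatic differentiation, and a learned LR reconstruction network.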