AdaptSR: Low-Rank Adaptation for Efficient and Scalable Real-World Super-Resolution

Cansu Korkmaz, Nancy Mehta, Radu Timofte

TL;DR

AdaptSR tackles real-world image super-resolution by converting bicubic-trained backbones into real-world SR solvers through low-rank adaptation (LoRA). By freezing the pretrained weights and training only lightweight LoRA modules, it updates a small fraction of the parameters and merges the learned updates into the base model after training. Inference therefore incurs no extra cost, and adaptation takes only minutes on lightweight hardware. The method achieves PSNR gains of up to 4 dB and improved perceptual scores on RealSR benchmarks, often outperforming GAN- and diffusion-based real-SR models while using orders of magnitude fewer trainable parameters. It further demonstrates efficient adaptive merging, layer-wise ablations, and an extension to GAN-based SR (AdaptSR-GAN), making real-world SR practical and scalable across CNN and Transformer architectures.
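The freeze, train, and merge cycle described above can be illustrated on a single linear layer. This is a minimal NumPy sketch, not the AdaptSR implementation: it assumes the standard LoRA parameterization, in which the frozen weight W receives an update (alpha / r) * B @ A with B initialized to zero, and the class and argument names (`LoRALinear`, `rank`, `alpha`) are hypothetical.

```python
import numpy as np


class LoRALinear:
    """Hypothetical sketch: frozen linear layer plus a trainable low-rank update."""

    def __init__(self, weight, rank=4, alpha=4.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight                  # pretrained W, kept frozen
        self.scale = alpha / rank             # LoRA scaling factor alpha / r
        # A projects inputs down to a rank-r subspace; B projects back up.
        # B starts at zero, so the adapted layer initially equals the base layer.
        self.A = rng.normal(0.0, 0.01, size=(rank, d_in))
        self.B = np.zeros((d_out, rank))

    def forward(self, x):
        # Training-time path: frozen base output plus the low-rank correction.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def merged_weight(self):
        # After training, fold the update into one dense matrix:
        # W' = W + (alpha / r) * B @ A, which has the same shape as W.
        return self.weight + self.scale * self.B @ self.A

    def trainable_parameters(self):
        # Only A and B are trained; W is never updated.
        return self.A.size + self.B.size


rng = np.random.default_rng(42)
layer = LoRALinear(rng.normal(size=(64, 64)), rank=4)
layer.B = rng.normal(size=layer.B.shape)      # stand-in for trained values
x = rng.normal(size=(8, 64))
merged_out = x @ layer.merged_weight().T
print(np.allclose(layer.forward(x), merged_out))  # True
print(layer.trainable_parameters(), layer.weight.size)  # 512 4096
```

Because `merged_weight()` returns a matrix of the original shape, the merged layer can replace the base layer outright, which is why merging adds no inference overhead.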

Abstract

Recovering high-frequency details and textures from low-resolution images remains a fundamental challenge in super-resolution (SR), especially when real-world degradations are complex and unknown. While GAN-based methods enhance realism, they suffer from training instability and introduce unnatural artifacts. Diffusion models, though promising, demand excessive computational resources, often requiring multiple GPU days, even for single-step variants. Rather than naively fine-tuning entire models or adopting unstable generative approaches, we introduce AdaptSR, a low-rank adaptation (LoRA) framework that efficiently repurposes bicubic-trained SR models for real-world tasks. AdaptSR leverages architecture-specific insights and selective layer updates to optimize real SR adaptation. By updating only lightweight LoRA layers while keeping the pretrained backbone intact, it captures domain-specific adjustments without adding inference cost, as the adapted layers merge seamlessly into the backbone after training. This efficient adaptation not only reduces memory and compute requirements but also makes real-world SR feasible on lightweight hardware. Our experiments demonstrate that AdaptSR outperforms GAN- and diffusion-based SR methods by up to 4 dB in PSNR and 2% in perceptual scores on real SR benchmarks. More impressively, it matches or exceeds full-model fine-tuning while training 92% fewer parameters, enabling rapid adaptation to real SR tasks within minutes.
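Parameter savings of this kind come from the factorization itself: a dense d_out × d_in weight has d_out · d_in entries, whereas a rank-r update B @ A has only r · (d_in + d_out). The count below applies this to one hypothetical transformer block (fused QKV and output projection in self-attention, plus a two-layer MLP). The width, MLP ratio, and rank are illustrative assumptions, and the 92% figure above is the authors' whole-model result, which this sketch does not attempt to reproduce.

```python
def dense_params(d_in, d_out):
    # A full d_out x d_in weight matrix (biases ignored for simplicity).
    return d_in * d_out


def lora_params(d_in, d_out, rank):
    # A: rank x d_in (down-projection), B: d_out x rank (up-projection).
    return rank * (d_in + d_out)


def block_layer_shapes(dim, mlp_ratio):
    # Hypothetical transformer block: fused QKV and output projection in
    # multi-head self-attention, followed by a two-layer MLP.
    hidden = int(dim * mlp_ratio)
    return {
        "msa.qkv": (dim, 3 * dim),
        "msa.proj": (dim, dim),
        "mlp.fc1": (dim, hidden),
        "mlp.fc2": (hidden, dim),
    }


shapes = block_layer_shapes(dim=64, mlp_ratio=2)
full = sum(dense_params(d_in, d_out) for d_in, d_out in shapes.values())
lora = sum(lora_params(d_in, d_out, rank=4) for d_in, d_out in shapes.values())
print(full, lora)  # 32768 3072
print(f"{1 - lora / full:.1%} fewer trainable weights in this block")
```

The saving grows with layer width, since the dense count scales with d_in · d_out while the LoRA count scales only with d_in + d_out at fixed rank.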

Paper Structure

This paper contains 15 sections, 6 equations, 7 figures, 7 tables.

Figures (7)

  • Figure 1: Comparison of model complexity and performance between our proposed low-rank adaptation models (orange) and other real-world SR approaches, including GAN-based (green) and diffusion-based (purple) models, for ×4 SR on the RealSR dataset [realsr_cai2019toward]. Baseline models with full fine-tuning are shown in blue; our LoRA models built on the same baselines achieve better PSNR and LPIPS [lpips] scores with fewer trainable parameters.
  • Figure 2: An illustration of the proposed AdaptSR. LoRA layers are integrated into the frozen transformer architecture and can therefore be inserted seamlessly into various CNN- or transformer-based pretrained SR models to adapt them to real SR. LoRA-modified layers, such as convolutional and attention layers, reduce trainable parameters and computational load, enabling efficient high-resolution reconstruction.
  • Figure 3: Transformer LoRA Layers (TLL) include LoRA-modified multi-head self-attention (MSA) and multi-layer perceptron (MLP) layers, which use Linear-LoRA for efficient domain adaptation with fewer trainable parameters.
  • Figure 4: Visual comparison of the proposed AdaptSR with state-of-the-art methods for ×4 real SR. GAN and diffusion models fail to capture the correct image content and exhibit excessive sharpening and color shifts. In contrast, our LoRA-based models reconstruct high-fidelity details with correct alignment, particularly in complex regions with regular patterns. Further visual comparisons are provided in the supplementary materials.
  • Figure 5: Comparison of local attribution maps (LAMs) [lam_gu2021interpreting] between different adaptation modules and the standard fine-tuned (FT) model on the RealSR-Nikon41 image [realsr_cai2019toward]. The LAM results indicate the importance of each input pixel when super-resolving the patch marked with a red box. The diffusion index (DI) [lam_gu2021interpreting] reflects the range of pixels involved; a higher DI means a wider range of pixels is used, which the authors associate with better reconstruction.
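Figure 1 and the abstract compare methods by PSNR. As a reference point, the sketch below uses the standard definition PSNR = 10 · log10(MAX² / MSE), assuming 8-bit pixel values (MAX = 255). It omits the border cropping and Y-channel evaluation that SR benchmarks commonly apply, so its numbers are not directly comparable to reported ones. It also shows that a 4 dB gain corresponds to roughly a 2.5× reduction in mean squared error.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """PSNR in dB between two equally sized flat sequences of pixel values.

    Assumes 8-bit data by default; no cropping or color conversion is applied.
    """
    if len(reference) != len(estimate):
        raise ValueError("inputs must have the same number of pixels")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    # A PSNR gain of g dB means the MSE shrank by a factor of 10 ** (g / 10).
    return 10 ** (gain_db / 10)


ref = [100, 120, 140, 160]
print(round(psnr(ref, [102, 118, 141, 161]), 2))
print(round(mse_ratio_for_gain(4.0), 2))  # 2.51
```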
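Figure 2 notes that convolutional layers are also LoRA-modified. The caption does not spell out the factorization, so this sketch assumes one common choice: a k×k convolution down to `rank` channels followed by a 1×1 convolution back up to C_out. Because a 1×1 convolution applied after a k×k convolution is itself a k×k convolution, the update merges into a single kernel. All names are hypothetical, and the convolution is a naive "valid" cross-correlation written with NumPy.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d(x, kernel):
    """Naive 'valid' cross-correlation. x: (C_in, H, W), kernel: (C_out, C_in, k, k)."""
    k = kernel.shape[-1]
    windows = sliding_window_view(x, (k, k), axis=(1, 2))  # (C_in, H', W', k, k)
    return np.einsum("oikl,iyxkl->oyx", kernel, windows)


def lora_conv_forward(x, weight, down, up, scale):
    # Assumed factorization: frozen k x k conv plus a low-rank branch made of a
    # k x k conv to `rank` channels and a 1 x 1 conv (a channel-mixing matrix).
    low_rank = np.einsum("or,ryx->oyx", up, conv2d(x, down))
    return conv2d(x, weight) + scale * low_rank


def merge_conv(weight, down, up, scale):
    # Fold the branch into the kernel: dW[o, i] = sum_r up[o, r] * down[r, i].
    return weight + scale * np.einsum("or,rikl->oikl", up, down)


rng = np.random.default_rng(0)
c_in, c_out, rank, k = 16, 16, 2, 3
weight = rng.normal(size=(c_out, c_in, k, k))  # frozen pretrained kernel
down = rng.normal(size=(rank, c_in, k, k))     # trainable: rank x C_in x k x k
up = rng.normal(size=(c_out, rank))            # trainable: C_out x rank
x = rng.normal(size=(c_in, 10, 10))

adapted = lora_conv_forward(x, weight, down, up, scale=0.5)
merged = conv2d(x, merge_conv(weight, down, up, scale=0.5))
print(np.allclose(adapted, merged))  # True
```

Here the low-rank branch trains 288 + 32 = 320 values against 2,304 in the frozen kernel; other factorizations (for example, a 1×1 down-projection) would trade expressiveness for even fewer parameters.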
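Figure 5 summarizes each LAM with a diffusion index (DI). The sketch below assumes the Gini-based definition from the LAM work, DI = (1 - G) × 100, where G is the Gini coefficient of the attribution map, so an evenly spread map scores 100 and a map concentrated on one pixel scores low. Taking absolute values first and returning G = 0 for an all-zero map are assumptions of this sketch; the reference LAM code may differ in such details.

```python
import numpy as np


def gini(values):
    """Gini coefficient of non-negative values (0 means perfectly even)."""
    v = np.sort(np.asarray(values, dtype=float).ravel())
    n = v.size
    total = v.sum()
    if total == 0:
        return 0.0  # assumption: treat an all-zero map as perfectly even
    index = np.arange(1, n + 1)
    # Sorted-order formula, equivalent to the mean absolute difference form.
    return float(np.sum((2 * index - n - 1) * v) / (n * total))


def diffusion_index(attribution_map):
    # Assumption: attributions are taken in absolute value before computing
    # the Gini coefficient; DI = (1 - G) * 100.
    return (1.0 - gini(np.abs(attribution_map))) * 100.0


uniform = np.ones((4, 4))
peaked = np.zeros((2, 2))
peaked[0, 0] = 1.0
print(diffusion_index(uniform))  # 100.0
print(diffusion_index(peaked))   # 25.0
```

For a map of n pixels the DI therefore ranges from 100 / n (all attribution on one pixel) to 100 (uniform attribution), matching the caption's reading of DI as the range of pixels involved.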