Distillation-Supervised Convolutional Low-Rank Adaptation for Efficient Image Super-Resolution

Xinning Chai, Yao Zhang, Yuxuan Zhang, Zhengxue Cheng, Yingsheng Qin, Yucai Yang, Li Song

TL;DR

This work tackles efficient single-image super-resolution by applying ConvLoRA to a lightweight backbone (SPAN) and strengthening fine-tuning with knowledge distillation that preserves second-order feature statistics. It introduces the SConvLB module, which integrates low-rank adapters into the SPAB block, and extends ConvLoRA to the pixel shuffle stage, yielding performance gains without increasing inference cost. A hybrid distillation scheme that combines a spatial affinity loss with pixel-level and reconstruction losses guides the student model to capture critical textures and structures ($L_{total} = \lambda_1 L_{rec} + \lambda_2 L_{TS} + \lambda_3 L_{AD}$). Empirically, DSCLoRA achieves consistent PSNR/SSIM gains over SPAN across multiple benchmarks, with the DSCLoRA-L variant delivering the best performance while remaining lightweight. It ranked first in the NTIRE 2025 Efficient SR Challenge, which points to practical value for real-time SR tasks.
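To make the hybrid objective concrete, here is a minimal pure-Python sketch of a spatial affinity term and the weighted sum $L_{total}$. It is an illustration under stated assumptions, not the authors' implementation. It assumes that the affinity is the cosine similarity between per-pixel channel vectors, that $L_{rec}$ compares the student output with the ground truth, that $L_{TS}$ compares the student output with the teacher output, and that all three terms are mean absolute errors. The function names and the default weights are hypothetical.

```python
import math

def spatial_affinity(feat):
    """feat: list of C channels, each a flat list of N = H*W values.
    Returns the N x N cosine-similarity matrix between pixel positions."""
    n_ch, n_px = len(feat), len(feat[0])
    cols = []
    for p in range(n_px):
        # Channel vector at pixel p, L2-normalized (zero vectors are left as-is).
        v = [feat[c][p] for c in range(n_ch)]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        cols.append([x / norm for x in v])
    # Entry (i, j) is the dot product of the normalized vectors at pixels i and j.
    return [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(n_px)]
            for i in range(n_px)]

def l1_mean(a, b):
    """Mean absolute difference between two equally sized flat lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def affinity_loss(student_feat, teacher_feat):
    """Assumed form of L_AD for one feature map: L1 gap between affinity matrices."""
    a_s = [v for row in spatial_affinity(student_feat) for v in row]
    a_t = [v for row in spatial_affinity(teacher_feat) for v in row]
    return l1_mean(a_s, a_t)

def total_loss(sr_student, sr_teacher, hr, student_feats, teacher_feats,
               lam1=1.0, lam2=1.0, lam3=1.0):
    """L_total = lam1 * L_rec + lam2 * L_TS + lam3 * L_AD (weights are placeholders)."""
    l_rec = l1_mean(sr_student, hr)           # student output vs. ground truth
    l_ts = l1_mean(sr_student, sr_teacher)    # student output vs. teacher output
    l_ad = sum(affinity_loss(s, t)
               for s, t in zip(student_feats, teacher_feats)) / len(student_feats)
    return lam1 * l_rec + lam2 * l_ts + lam3 * l_ad
```

Because each pixel vector is normalized before the dot product, the affinity term in this sketch depends on the relationships between spatial positions rather than on feature magnitude, which is one way to read "second-order" statistics.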

Abstract

Convolutional neural networks (CNNs) have been widely used in efficient image super-resolution. However, for CNN-based methods, performance gains often require deeper networks and larger feature maps, which increase complexity and inference costs. Inspired by LoRA's success in fine-tuning large language models, we explore its application to lightweight models and propose Distillation-Supervised Convolutional Low-Rank Adaptation (DSCLoRA), which improves model performance without increasing architectural complexity or inference costs. Specifically, we integrate ConvLoRA into the efficient SR network SPAN by replacing the SPAB module with the proposed SConvLB module and incorporating ConvLoRA layers into both the pixel shuffle block and its preceding convolutional layer. DSCLoRA leverages low-rank decomposition for parameter updates and employs a spatial feature affinity-based knowledge distillation strategy to transfer second-order statistical information from teacher models (pre-trained SPAN) to student models (ours). This method preserves the core knowledge of lightweight models and facilitates optimal solution discovery under certain conditions. Experiments on benchmark datasets show that DSCLoRA improves PSNR and SSIM over SPAN while maintaining its efficiency and competitive image quality. Notably, DSCLoRA ranked first in the Overall Performance Track of the NTIRE 2025 Efficient Super-Resolution Challenge. Our code and models are made publicly available at https://github.com/Yaozzz666/DSCF-SR.
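The claim that low-rank updates add no inference cost follows from linearity: a frozen kernel plus a low-rank update can be folded into a single kernel after training. The sketch below illustrates this for one output position of a convolution. It assumes the kernel is flattened to a 2D matrix of shape (C_out, C_in·k·k) and the update is the product of two small factors. The shapes, values, and helper names are hypothetical and are not taken from the paper's code.

```python
def matmul(a, b):
    """Plain matrix product of two lists-of-rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def mat_add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def apply_kernel(w, patch):
    """One conv output position: flattened kernel matrix times flattened input patch."""
    return [sum(wi * p for wi, p in zip(row, patch)) for row in w]

# Frozen pretrained kernel, flattened to (C_out, C_in*k*k) = (2, 4).
w_pt = [[1.0, 0.0, 2.0, -1.0],
        [0.0, 1.0, 0.0, 3.0]]

# Trainable rank-1 factors: lora_x is (C_out, r), lora_y is (r, C_in*k*k).
lora_x = [[1.0], [2.0]]
lora_y = [[0.5, 0.0, -1.0, 0.0]]

patch = [1.0, 2.0, 3.0, 4.0]  # one flattened receptive field

# Training-time view: frozen path plus low-rank path, summed at the output.
delta = matmul(lora_x, lora_y)
two_branch = [a + b for a, b in zip(apply_kernel(w_pt, patch),
                                    apply_kernel(delta, patch))]

# Inference-time view: fold the update into the kernel once, then run a single conv.
w_merged = mat_add(w_pt, delta)
merged = apply_kernel(w_merged, patch)

# Trainable parameters in the factors vs. the full kernel.
lora_params = len(lora_x) * len(lora_x[0]) + len(lora_y) * len(lora_y[0])
full_params = len(w_pt) * len(w_pt[0])
```

Both views produce the same output, so the deployed network in this sketch keeps the original layer count and kernel shapes. On this toy kernel the factors hold 6 values against 8 for the full kernel; the saving grows as the kernel gets larger relative to the rank.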

Paper Structure

This paper contains 18 sections, 9 equations, 6 figures, 8 tables.

Figures (6)

  • Figure 1: Comparison of parameters, FLOPs, and PSNR for models on the Manga109 dataset in the $\times4$ SR task. DSCLoRA is our 26-channel model, while DSCLoRA-L is the 48-channel variant. Red indicates the smallest FLOPs. Circle size represents the number of parameters; the closer a model is to the upper left, the better.
  • Figure 2: The overall architecture of the DSCLoRA model. We replace the SPAB module with the proposed SConvLB module and incorporate ConvLoRA layers into both the pixel shuffle block and its preceding convolutional layer. The spatial affinity distillation loss is calculated between each feature map.
  • Figure 3: Schematic diagram of the SPAB structure [wan2024swift].
  • Figure 4: Schematic diagram of SConvLB structure. The $\sigma$ in the figure represents the SiLU activation layer. The pretrained model weights (denoted as $W_{PT}$) remain fixed during training, while only the LoRA parameters ($X$ and $Y$) are updated.
  • Figure 5: Visual comparison of images with rich texture details generated by different models on the DIV2K_LSDIR_valid dataset [ren2025tenth].
  • ...and 1 more figure