DiMoSR: Feature Modulation via Multi-Branch Dilated Convolutions for Efficient Image Super-Resolution

M. Akin Yilmaz, Ahmet Bilican, A. Murat Tekalp

TL;DR

DiMoSR addresses the efficiency-quality trade-off in lightweight single-image super-resolution by introducing Dilated Modulation Blocks that perform multi-branch feature modulation via dilated convolutions. The architecture combines a Feature Enhancement Block with an Efficient Residual Block to modulate and refine features while preserving information flow, achieving state-of-the-art results on 4× SR with reduced computation compared to peers. Extensive ablations show that both attention and feature modulation contribute meaningfully, with FFT-based frequency loss further boosting high-frequency detail reconstruction. Overall, DiMoSR offers a practical, scalable approach for high-quality SR on resource-constrained devices, supported by comprehensive experiments and open-source code.

Abstract

Balancing reconstruction quality versus model efficiency remains a critical challenge in lightweight single image super-resolution (SISR). Despite the prevalence of attention mechanisms in recent state-of-the-art SISR approaches that primarily emphasize or suppress feature maps, alternative architectural paradigms warrant further exploration. This paper introduces DiMoSR (Dilated Modulation Super-Resolution), a novel architecture that enhances feature representation through modulation to complement attention in lightweight SISR networks. The proposed approach leverages multi-branch dilated convolutions to capture rich contextual information over a wider receptive field while maintaining computational efficiency. Experimental results demonstrate that DiMoSR outperforms state-of-the-art lightweight methods across diverse benchmark datasets, achieving superior PSNR and SSIM metrics with comparable or reduced computational complexity. Through comprehensive ablation studies, this work not only validates the effectiveness of DiMoSR but also provides critical insights into the interplay between attention mechanisms and feature modulation to guide future research in efficient network design. The code and model weights to reproduce our results are available at: https://github.com/makinyilmaz/DiMoSR
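
To make the receptive-field argument concrete, the sketch below shows how a 3×3 kernel with dilation d covers a k + (k - 1)(d - 1) pixel extent without adding weights, and how several dilated branches could be aggregated into a context map that modulates the input by element-wise multiplication. This is a minimal single-channel NumPy illustration, not the authors' implementation: the function names, the three dilation rates, the averaging kernels, and the choice to sum the branches are all assumptions made for the example.

```python
import numpy as np


def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Spatial extent covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv2d(x: np.ndarray, kernel: np.ndarray, dilation: int) -> np.ndarray:
    """Single-channel 2D dilated cross-correlation, zero-padded to keep the input size.

    Assumes a square kernel with an odd side length.
    """
    k = kernel.shape[0]
    pad = dilation * (k - 1) // 2
    padded = np.pad(x, pad)
    h, w = x.shape
    out = np.zeros((h, w), dtype=np.float64)
    for i in range(k):
        for j in range(k):
            # Each tap reads the input shifted by a multiple of the dilation.
            di, dj = i * dilation, j * dilation
            out += kernel[i, j] * padded[di:di + h, dj:dj + w]
    return out


def modulate(x: np.ndarray, kernels, dilations) -> np.ndarray:
    """Hypothetical modulation: sum the branch responses and multiply them into x."""
    context = sum(dilated_conv2d(x, k, d) for k, d in zip(kernels, dilations))
    return x * context


if __name__ == "__main__":
    # Assumed dilation rates; the source does not list the ones DiMoSR uses.
    dilations = (1, 2, 3)
    print([effective_kernel_size(3, d) for d in dilations])  # [3, 5, 7]

    rng = np.random.default_rng(0)
    features = rng.standard_normal((8, 8))
    kernels = [np.full((3, 3), 1.0 / 9.0) for _ in dilations]  # placeholder weights
    print(modulate(features, kernels, dilations).shape)  # (8, 8)
```

Each branch uses the same nine weights, so the wider context comes from the tap spacing rather than from extra parameters, which is the efficiency argument made in the abstract.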

Paper Structure

This paper contains 21 sections, 5 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: The proposed DiMoSR framework: (a) overall architecture, built from Dilated Modulation Blocks (DMBs); (b) the inside of a single DMB, which consists of a feature enhancement block (FEB) followed by an efficient residual block (ERB).
  • Figure 2: Visual comparison on two Urban100 images. Left: full-size images with marked zoom-in regions. Right: HR ground truth and SR results.
  • Figure 3: Visual comparison on img062 from Urban100 ($\times 4$ upscaling). Note that SAFMN and ShuffleMixer were trained with FFT loss, yet our model without FFT loss achieves comparable results. Our complete model with FFT loss demonstrates superior edge preservation and detail reconstruction, particularly in structural patterns.
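
The FFT loss mentioned in the Figure 3 caption is not defined in this summary. A common form of frequency loss compares the 2D Fourier spectra of the super-resolved and ground-truth images with an L1 distance and adds it to a pixel-space L1 term. The NumPy sketch below follows that common form as an assumption; the function names and the `freq_weight` value are hypothetical and are not taken from the paper.

```python
import numpy as np


def fft_l1_loss(sr: np.ndarray, hr: np.ndarray) -> float:
    """Mean absolute difference between the 2D spectra of two images."""
    sr_freq = np.fft.fft2(sr, axes=(-2, -1))
    hr_freq = np.fft.fft2(hr, axes=(-2, -1))
    # np.abs on a complex array returns the magnitude of each coefficient.
    return float(np.mean(np.abs(sr_freq - hr_freq)))


def total_loss(sr: np.ndarray, hr: np.ndarray, freq_weight: float = 0.05) -> float:
    """Pixel-space L1 plus a weighted frequency term (the weight is an assumed value)."""
    pixel = float(np.mean(np.abs(sr - hr)))
    return pixel + freq_weight * fft_l1_loss(sr, hr)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hr = rng.random((16, 16))
    # A crude stand-in for an over-smoothed SR output: average each pixel with
    # its right-hand neighbour, which attenuates high horizontal frequencies.
    sr = 0.5 * (hr + np.roll(hr, shift=1, axis=1))
    print(total_loss(sr, hr))
```

Because every Fourier coefficient depends on all pixels, the frequency term penalizes missing high-frequency content globally, which is consistent with the sharper edges the caption attributes to training with FFT loss.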