UCAN: Unified Convolutional Attention Network for Expansive Receptive Fields in Lightweight Super-Resolution

Cao Thien Tan, Phan Thi Thu Trang, Do Nghiem Duc, Ho Ngoc Anh, Hanyang Zhuang, Nguyen Duc Dung

Abstract

Hybrid CNN-Transformer architectures achieve strong results in image super-resolution, but scaling attention windows or convolution kernels significantly increases computational cost, limiting deployment on resource-constrained devices. We present UCAN, a lightweight network that unifies convolution and attention to expand the effective receptive field efficiently. UCAN combines window-based spatial attention with a Hedgehog Attention mechanism to model both local texture and long-range dependencies, and introduces a distillation-based large-kernel module to preserve high-frequency structure without heavy computation. In addition, we employ cross-layer parameter sharing to further reduce complexity. On Manga109 ($4\times$), UCAN-L achieves 31.63 dB PSNR with only 48.4G MACs, surpassing recent lightweight models. On BSDS100, UCAN attains 27.79 dB, outperforming methods with significantly larger models. Extensive experiments show that UCAN achieves a superior trade-off between accuracy, efficiency, and scalability, making it well-suited for practical high-resolution image restoration.

Paper Structure (19 sections, 11 equations, 6 figures, 3 tables)

Figures (6)

  • Figure 1: Performance comparison of PSNR versus model parameters on the Manga109 ($\times$4) dataset. Our method is evaluated alongside state-of-the-art super-resolution approaches. Green triangles represent CNN-based methods, yellow squares denote Transformer-based methods, blue circles indicate SSM-based methods, and red diamonds show our proposed approach.
  • Figure 2: Comparison of the feature maps produced by Linear Attention (using ReLU and ELU feature maps) and Hedgehog Attention. All experiments use an image with $N = 256$ and $d = 48$. The full rank of the matrices in the figure is 64. The figure shows that our method produces more diverse feature maps than previous methods.
  • Figure 3: Detailed architecture of (a) Shared and Received Hybrid Attention (SHA and RHA) and (b) Large Kernel Distillation (LKD). LKD contains a Triple Feature Extraction block with three branches: the Large Kernel Spatial Branch, the Channel Branch, and the Small Kernel Spatial Branch. SHA and RHA employ Shared and Received Window Multi-Head Self-Attention (Shared WMHSA and Received WMHSA) to capture local information, and include a Shared Dual Fusion Layer (SDFL) and a Dual Fusion Receiver Layer (DFRL) to aggregate global context. The Dual Fusion Layer comprises two sub-branches: Hedgehog Attention (HgA) and Channel Attention (CA).
  • Figure 4: Visual comparison between ground truth and different methods on Urban100.
  • Figure 5: Detailed visualization of the effective receptive field (ERF) of MambaIR [guo2025mambair], MambaIRv2 [guo2024mambairv2], and UCAN.
  • ...and 1 more figures
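
Figure 2 contrasts feature maps for linear attention. As a rough illustration of why the feature map matters, the sketch below implements generic kernelized linear attention with an ELU+1 feature map, using the $N = 256$, $d = 48$ setting from the caption. It is not the paper's Hedgehog Attention, whose details are not given in this section; the function names (`elu_plus_one`, `linear_attention`, `attention_matrix`) and the random inputs are assumptions made for illustration only.

```python
import numpy as np

def elu_plus_one(x):
    # ELU(x) + 1: a strictly positive feature map (illustrative choice).
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def linear_attention(q, k, v, feature_map=elu_plus_one):
    # Linear-time form: never materializes the N x N attention matrix.
    phi_q, phi_k = feature_map(q), feature_map(k)
    kv = phi_k.T @ v                # (d, d_v) summary of keys and values
    z = phi_q @ phi_k.sum(axis=0)   # (N,) per-query normalizer
    return (phi_q @ kv) / z[:, None]

def attention_matrix(q, k, feature_map=elu_plus_one):
    # Explicit N x N attention matrix, built only for inspection.
    phi_q, phi_k = feature_map(q), feature_map(k)
    scores = phi_q @ phi_k.T
    return scores / scores.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
N, d = 256, 48                      # token count and feature dimension
q, k, v = (rng.standard_normal((N, d)) for _ in range(3))

out_fast = linear_attention(q, k, v)
out_slow = attention_matrix(q, k) @ v

# The implied attention matrix factors through d-dimensional features,
# so its rank cannot exceed d, however large N is.
rank = np.linalg.matrix_rank(attention_matrix(q, k))
print(out_fast.shape, np.allclose(out_fast, out_slow), rank <= d)
```

Because the attention matrix is a product of an $N \times d$ and a $d \times N$ factor, its rank is bounded by the feature dimension. This bound is one reason the expressiveness of the feature map is a natural point of comparison between linear attention variants.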