NAFRSSR: a Lightweight Recursive Network for Efficient Stereo Image Super-Resolution

Yihong Chen, Zhen Fan, Shuai Dong, Zhiwei Chen, Wenjie Li, Minghui Qin, Min Zeng, Xubing Lu, Guofu Zhou, Xingsen Gao, Jun-Ming Liu

TL;DR

The paper introduces NAFRSSR, a lightweight recursive network for stereo image super-resolution that builds on NAFSSR by adding recursive connections, a weight-sharing depth-separated stereo cross attention module (DSSCAM), and group-convolution blocks (NAFGCBlocks). It replaces heavy attention paths with efficient components and adds a trainable edge detector to boost high-frequency details, achieving higher PSNR/SSIM than previous state-of-the-art models with fewer parameters and faster inference (notably NAFRSSR-M, at 0.28M parameters and about 50 ms inference time). Four variants (M, T, S, B) perform strongly on the KITTI, Middlebury, and Flickr1024 datasets while keeping model size small. The work emphasizes practical deployment on mobile and embedded systems, given its efficiency and competitive accuracy.

Abstract

Stereo image super-resolution (SR) refers to the reconstruction of a high-resolution (HR) image from a pair of low-resolution (LR) images as typically captured by a dual-camera device. To enhance the quality of SR images, most previous studies focused on increasing the number and size of feature maps and introducing complex and computationally intensive structures, resulting in models with high computational complexity. Here, we propose a simple yet efficient stereo image SR model called NAFRSSR, which is modified from the previous state-of-the-art model NAFSSR by introducing recursive connections and making the constituent modules more lightweight. Our NAFRSSR model is composed of nonlinear activation free and group convolution-based blocks (NAFGCBlocks) and depth-separated stereo cross attention modules (DSSCAMs). The NAFGCBlock improves feature extraction and reduces the number of parameters by removing the simplified channel attention mechanism from NAFBlock and using group convolution. The DSSCAM enhances feature fusion and reduces the number of parameters by replacing the 1x1 pointwise convolution in SCAM with a weight-shared 3x3 depthwise convolution. In addition, we propose incorporating a trainable edge detection operator into NAFRSSR to further improve model performance. Four variants of NAFRSSR with different sizes, namely NAFRSSR-Mobile (NAFRSSR-M), NAFRSSR-Tiny (NAFRSSR-T), NAFRSSR-Super (NAFRSSR-S), and NAFRSSR-Base (NAFRSSR-B), are designed, and they all exhibit fewer parameters, higher PSNR/SSIM, and faster speed than the previous state-of-the-art models. In particular, to the best of our knowledge, NAFRSSR-M is the lightest (0.28M parameters) and fastest (50 ms inference time) model achieving an average PSNR/SSIM as high as 24.657 dB/0.7622 on the benchmark datasets. Code and models will be released at https://github.com/JNUChenYiHong/NAFRSSR.
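To make the parameter savings described in the abstract concrete, the following is a minimal sketch that counts learnable parameters for the kinds of layers being swapped. It is not the authors' code: the helper name conv2d_params, the channel width C = 64, the group count of 4, and the assumption that the original module uses one separate 1x1 projection per view are all illustrative choices, not values taken from the paper.

```python
def conv2d_params(c_in, c_out, kernel, groups=1, bias=True):
    """Number of learnable parameters in a 2-D convolution layer."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each output channel only sees c_in / groups input channels.
    weights = (c_in // groups) * c_out * kernel * kernel
    return weights + (c_out if bias else 0)


C = 64  # hypothetical channel width, not a value from the paper

# Dense 1x1 pointwise projection vs. 3x3 depthwise projection (groups == channels).
pointwise = conv2d_params(C, C, kernel=1)
depthwise = conv2d_params(C, C, kernel=3, groups=C)

# Assumption: one pointwise projection per view (left and right) vs. a single
# depthwise projection whose weights are shared by both views.
separate_views = 2 * pointwise
shared_views = depthwise

# Dense 3x3 convolution vs. the same layer split into 4 groups (illustrative).
dense3x3 = conv2d_params(C, C, kernel=3)
grouped3x3 = conv2d_params(C, C, kernel=3, groups=4)

print(f"pointwise 1x1: {pointwise}, depthwise 3x3: {depthwise}")
print(f"two separate projections: {separate_views}, one shared: {shared_views}")
print(f"dense 3x3: {dense3x3}, grouped 3x3 (g=4): {grouped3x3}")
```

Under these assumptions, the depthwise layer needs 9 weights per channel instead of C weights per channel, so the saving grows with the channel width, and sharing one projection across both views removes the second copy entirely. Group convolution divides the weight count by the number of groups.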

Paper Structure

This paper contains 28 sections, 16 equations, 9 figures, 6 tables.

Figures (9)

  • Figure 1: The overall architecture of NAFRSSR. NAFGCBlock represents nonlinear activation free and group convolution-based block. DSSCAM represents weight-sharing depth-separated stereo cross attention module.
  • Figure 2: Comparison between (a) stereo cross attention module (SCAM) of NAFSSR and (b) depth-separated stereo cross attention module (DSSCAM) of NAFRSSR.
  • Figure 3: (a) Nonlinear activation free block (NAFBlock) of NAFSSR. Two types of nonlinear activation free and group convolution-based blocks (NAFGCBlocks): (b) NAFGCBlock-1 and (c) NAFGCBlock-2.
  • Figure 4: Schematic diagrams of (a) SimpleGate (SG) and (b) Simplified channel attention (SCA).
  • Figure 5: (a) The input HR image, (b) $SR_{NAFSSR}$ is the feature map output by the NAFSSR model and (c) $edge_{HR}$ is the result of the edge detection of the HR image.
  • ...and 4 more figures
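The two operations named in the Figure 4 caption can be sketched in a few lines. This is a plain-Python illustration, assuming the standard definitions used in nonlinear-activation-free blocks: SimpleGate splits the channels into two halves and multiplies them element-wise, and simplified channel attention scales each channel by a linear projection of the globally average-pooled features. The function names, the nested-list layout [channel][row][column], and the identity weights in the usage example are illustrative assumptions, not details taken from the paper.

```python
def simple_gate(x):
    """Split channels into two halves and multiply them element-wise."""
    c = len(x)
    if c % 2:
        raise ValueError("SimpleGate needs an even number of channels")
    half = c // 2
    return [
        [[a * b for a, b in zip(row_a, row_b)]
         for row_a, row_b in zip(x[i], x[i + half])]
        for i in range(half)
    ]


def simplified_channel_attention(x, weight, bias):
    """Scale each channel by a linear function of the pooled channel means."""
    # Global average pooling: one scalar per channel.
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]
    # A 1x1 convolution applied to a 1x1 map is a matrix-vector product.
    scale = [sum(w * p for w, p in zip(row, pooled)) + b
             for row, b in zip(weight, bias)]
    return [[[v * s for v in row] for row in ch] for ch, s in zip(x, scale)]


# Four channels, each a 1x2 feature map (toy values).
features = [[[1.0, 2.0]], [[3.0, 4.0]], [[5.0, 6.0]], [[7.0, 8.0]]]
gated = simple_gate(features)  # two channels remain after gating

# Identity projection and zero bias, chosen only to keep the example readable.
identity = [[1.0, 0.0], [0.0, 1.0]]
attended = simplified_channel_attention(gated, identity, [0.0, 0.0])
print(gated)
print(attended)
```

Neither operation uses a nonlinear activation function such as ReLU or sigmoid: the gating is a product of two linear feature maps, and the attention is a channel-wise rescaling. Note that SimpleGate halves the channel count, so the layer before it has to produce twice as many channels as the layer after it expects.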