Multi-Level Feature Fusion Network for Lightweight Stereo Image Super-Resolution

Yunxiang Li; Wenbin Zou; Qiaomu Wei; Feng Huang; Jing Wu

TL;DR

The paper tackles the efficiency bottleneck in stereo image super-resolution with MFFSSR, a lightweight dual-branch network that fuses multi-level intra-view features with cross-view information. A Hybrid Attention Feature Extraction Block (HAFEB) handles intra-view feature extraction, and a Cross-View Interaction Module (CVIM) is embedded in it through a channel-separated architecture for efficient cross-view fusion. The authors report higher PSNR/SSIM with fewer parameters and FLOPs than state-of-the-art methods, along with competitive results in the NTIRE 2024 challenges, which supports its practicality for edge deployment. Combining channel attention (CA), large kernel attention (LKA), and reparameterized convolution (RepConv) with a balanced channel-split strategy lets the network reconstruct details and textures while staying computationally efficient.

Abstract

Stereo image super-resolution uses the cross-view complementary information provided by the disparity between left and right views to reconstruct higher-quality images. Many methods focus on cascading feature extraction modules and cross-view feature interaction modules to exploit the information in stereo images. However, this adds a large number of network parameters and introduces structural redundancy. To facilitate the application of stereo image super-resolution in downstream tasks, we propose an efficient Multi-Level Feature Fusion Network for Lightweight Stereo Image Super-Resolution (MFFSSR). Specifically, MFFSSR uses the Hybrid Attention Feature Extraction Block (HAFEB) to extract multi-level intra-view features. Through a channel separation strategy, HAFEB interacts efficiently with the embedded cross-view interaction module. This configuration mines intra-view features efficiently while improving the efficiency of cross-view information sharing, so image details and textures are reconstructed more accurately. Extensive experiments demonstrate the effectiveness of MFFSSR: we achieve superior performance with fewer parameters. The source code is available at https://github.com/KarosLYX/MFFSSR.
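The channel separation idea in the abstract can be illustrated with a small sketch. This is only a toy illustration under stated assumptions, not the authors' implementation: the function names, the `cross_ratio` parameter, and the additive mixing rule in `toy_cross_view_mix` are all hypothetical stand-ins (the real cross-view interaction module is not specified in this summary). Feature maps are represented as plain lists of channels so the example runs without any deep learning library.

```python
from typing import List, Tuple

# A feature map is modelled as a list of channels; each channel is a flat list
# of floats. This representation is an assumption made for illustration only.
Feature = List[List[float]]


def channel_split(feat: Feature, cross_ratio: float = 0.5) -> Tuple[Feature, Feature]:
    """Split channels into an intra-view part and a cross-view part.

    `cross_ratio` (hypothetical) is the fraction of channels sent to the
    cross-view branch; the remaining channels stay in the intra-view branch.
    """
    n_cross = int(len(feat) * cross_ratio)
    n_intra = len(feat) - n_cross
    return feat[:n_intra], feat[n_intra:]


def toy_cross_view_mix(left: Feature, right: Feature,
                       alpha: float = 0.5) -> Tuple[Feature, Feature]:
    """Placeholder for cross-view interaction (NOT the paper's CVIM).

    Each view simply adds a scaled copy of the other view's channels.
    """
    left_out = [[a + alpha * b for a, b in zip(cl, cr)] for cl, cr in zip(left, right)]
    right_out = [[b + alpha * a for a, b in zip(cl, cr)] for cl, cr in zip(left, right)]
    return left_out, right_out


def fuse_views(left: Feature, right: Feature,
               cross_ratio: float = 0.5) -> Tuple[Feature, Feature]:
    """Apply cross-view mixing only to the split-off channels, then re-concatenate."""
    l_intra, l_cross = channel_split(left, cross_ratio)
    r_intra, r_cross = channel_split(right, cross_ratio)
    l_mixed, r_mixed = toy_cross_view_mix(l_cross, r_cross)
    # Channel count is preserved: intra-view channels pass through untouched.
    return l_intra + l_mixed, r_intra + r_mixed


left = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
right = [[10.0, 10.0], [20.0, 20.0], [30.0, 30.0], [40.0, 40.0]]
left_fused, right_fused = fuse_views(left, right)
```

The point of the split is that only a fraction of the channels takes part in the (comparatively expensive) cross-view step, while the rest is processed inside each view; the 50/50 ratio used here is arbitrary.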

Paper Structure

This paper contains 15 sections, 24 equations, 6 figures, and 4 tables.

Introduction
Related Works
  Single Image Super-Resolution
  Stereo Image Super-Resolution
Multi-Level Feature Fusion Network
  Overall Framework
  Intra-View Feature Extraction
  Cross-View Feature Interaction
  Loss Function
Experiments
  Implementation Details
  Comparisons with State-of-the-art Methods
  Ablation Study
  NTIRE Stereo Image SR Challenge
Conclusion

Figures (6)

Figure 1: Comparison of the performance and complexity of state-of-the-art methods for $4 \times$ stereo SR on the Flickr1024 [37] test set. Our MFFSSR achieves superior performance with fewer parameters.
Figure 2: The framework of the Multi-Level Feature Fusion Network for Lightweight Stereo Image SR (MFFSSR). HAFEB (shown in Figure 3) and CVIM (shown in Figure 4) denote the Hybrid Attention Feature Extraction Block and the Cross-View Interaction Module, respectively. Two HAFEBs with an embedded CVIM compose an MFF Block.
Figure 3: The architecture of our proposed Multi-level Feature Fusion Block (MFF Block). An MFF Block consists of two Hybrid Attention Feature Extraction Blocks (HAFEB) and an embedded Cross-View Interaction Module (CVIM). Each HAFEB has two components: Multi-level Feature Extraction and Fusion (MFEF) and Information Refinement Feedforward (IRF). The HAFEBs for the left and right views are connected to the CVIM through the branch structures in MFEF, which enables the interaction and fusion of features across the views. PConv, RepConv, DWConv, and DWDConv in the figure denote point-wise convolution, reparameterized convolution, depth-wise convolution, and depth-wise dilated convolution, respectively.
Figure 4: The architecture of the Cross-View Interaction Module (CVIM). It is embedded between the two Hybrid Attention Feature Extraction Blocks of the parallel branches to achieve efficient cross-view feature interaction. PConv and DWConv in the figure denote point-wise convolution and depth-wise convolution, respectively.
Figure 5: Visual results ($\times 4$ SR) achieved by different methods on the Flickr1024 [37] test set.
...and 1 more figure
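The convolution variants named in the Figure 3 caption (PConv, DWConv, DWDConv) are common building blocks of lightweight networks. The sketch below uses the standard weight-count formulas for these layer types and the standard effective-kernel formula for dilated convolution to show why they are cheap. The channel width of 64 and the kernel sizes are illustrative assumptions, not values taken from the paper, and bias terms are ignored.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias excluded)."""
    return c_in * c_out * k * k


def depthwise_params(channels: int, k: int) -> int:
    """Weights in a depth-wise k x k convolution: one k x k filter per channel."""
    return channels * k * k


def pointwise_params(c_in: int, c_out: int) -> int:
    """Weights in a point-wise (1 x 1) convolution."""
    return c_in * c_out


def effective_kernel(k: int, dilation: int) -> int:
    """Spatial extent covered by a k x k kernel with the given dilation rate."""
    return dilation * (k - 1) + 1


channels = 64  # hypothetical channel width, chosen only for illustration

standard = conv_params(channels, channels, 3)
separable = depthwise_params(channels, 3) + pointwise_params(channels, channels)

print(f"standard 3x3 conv:         {standard} weights")
print(f"depth-wise + point-wise:   {separable} weights")
print(f"3x3 kernel, dilation 3 -> covers {effective_kernel(3, 3)} pixels per side")
```

A dilated depth-wise convolution keeps the weight count of its undilated counterpart while covering a wider area, which is the usual reason for pairing depth-wise and depth-wise dilated layers when a large receptive field is wanted at low cost.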
