LGFN: Lightweight Light Field Image Super-Resolution using Local Convolution Modulation and Global Attention Feature Extraction

Zhongxin Yu, Liang Chen, Zhiyun Zeng, Kunping Yang, Shaofei Luo, Shaorui Chen, Cheng Zhong

TL;DR

A lightweight model named LGFN is proposed, which integrates the local and global features of different views and the features of different channels for LF image SR. It includes a lightweight CNN-based feature extraction module (namely, DGCE) designed to better extract local features through feature modulation.

Abstract

By capturing the intensity and direction of light rays from the same scene, a light field (LF) can encode 3D scene cues into a 4D LF image, which has a wide range of applications (e.g., post-capture refocusing and depth sensing). LF image super-resolution (SR) aims to improve the image resolution, which is limited by the performance of the LF camera sensor. Although existing methods have achieved promising results, the practical application of these models is limited because they are not lightweight enough. In this paper, we propose a lightweight model named LGFN, which integrates the local and global features of different views and the features of different channels for LF image SR. Specifically, because neighboring regions of the same pixel position in different sub-aperture images exhibit similar structural relationships, we design a lightweight CNN-based feature extraction module (namely, DGCE) to better extract local features through feature modulation. Meanwhile, because positions beyond the boundaries in the LF image present a large disparity, we propose an efficient spatial attention module (namely, ESAM), which uses decomposable large-kernel convolution to obtain an enlarged receptive field, and an efficient channel attention module (namely, ECAM). Compared with existing LF image SR models that have large parameter counts, our model has 0.45M parameters and 19.33G FLOPs while achieving competitive results. Extensive experiments with ablation studies demonstrate the effectiveness of our proposed method, which ranked second in Track 2 (Fidelity & Efficiency) of the NTIRE2024 Light Field Super Resolution Challenge and seventh in Track 1 (Fidelity).
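
The abstract does not say how the large kernel in ESAM is decomposed. As a rough illustration only, the sketch below assumes a common decomposition (a 5x5 depth-wise convolution, then a 7x7 depth-wise convolution with dilation 3, then a 1x1 convolution) and compares its receptive field and weight count against a single depth-wise kernel covering the same field. The kernel sizes, the dilation, and the channel width of 64 are assumptions for illustration, not values taken from the paper.

```python
def receptive_field(layers):
    """Receptive field of stacked stride-1 convolutions.

    `layers` is a list of (kernel_size, dilation) pairs.
    """
    rf = 1
    for kernel, dilation in layers:
        rf += (kernel - 1) * dilation
    return rf


def depthwise_params(kernel, channels):
    """Weights of a depth-wise k x k convolution (one filter per channel, no bias)."""
    return kernel * kernel * channels


def pointwise_params(channels):
    """Weights of a 1x1 convolution that mixes channels (no bias)."""
    return channels * channels


channels = 64                   # hypothetical channel width
decomposed = [(5, 1), (7, 3)]   # assumed: 5x5 depth-wise, then 7x7 depth-wise with dilation 3

rf = receptive_field(decomposed)
decomposed_params = (
    sum(depthwise_params(k, channels) for k, _ in decomposed)
    + pointwise_params(channels)
)
# A single depth-wise kernel spanning the same receptive field.
single_kernel_params = depthwise_params(rf, channels)

print(f"receptive field: {rf}x{rf}")
print(f"decomposed weights:    {decomposed_params}")
print(f"single-kernel weights: {single_kernel_params}")
```

Under these assumptions, the decomposed stack reaches the same spatial extent with roughly a quarter of the weights, which is the kind of saving a lightweight spatial attention module relies on.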

Paper Structure (17 sections, 16 equations, 4 figures, 3 tables)

Figures (4)

  • Figure 1: Comparisons of the parameters and PSNR of different LF image SR methods.
  • Figure 2: An epipolar plane image (EPI) sample of a 4D LF is acquired by fixing one angular coordinate and one spatial coordinate. Specifically, the horizontal EPI is obtained with constant u and h, and the vertical EPI is obtained with constant v and w. On the one hand, the EPIs capture spatial structures such as edges or textures, and the adjacent areas corresponding to the same pixel position across different SAIs exhibit similar structural relationships. On the other hand, the EPIs reflect the disparity information via line patterns of different slopes, whereas positions located outside the boundary in the LF image exhibit a large parallax.
  • Figure 3: An overview of our LGFN network. (a) Local and global deep feature extraction module (LGFM); (b) Double-gated convolution extraction module (DGCE); (c) Efficient spatial attention module (ESAM); (d) Efficient channel attention module (ECAM). Given SAIs as inputs, we adopt bilinear upsampling to initialize the content of the original images. For feature extraction, we first use a 3D convolution to extract shallow features, then use the deep feature extraction module to obtain deep features, and finally use the upsampling module to obtain the final super-resolved SAI results. The deep feature extraction module (DFEM) includes seven local and global feature extraction modules, each composed of DGCE, ESAM, and ECAM.
  • Figure 4: Qualitative results for 4x SR. The super-resolved center view images are presented for detailed texture comparison. The corresponding PSNR/SSIM scores of different methods on the presented scenes are also reported below.
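
The EPI construction described in the Figure 2 caption can be sketched with NumPy slicing. This is a minimal illustration, assuming the light field is stored as an array indexed `lf[u, v, h, w]`, with angular coordinates (u, v) and spatial coordinates (h, w). The array layout, the function names, and the synthetic constant-disparity scene are assumptions made here, not details from the paper.

```python
import numpy as np


def make_synthetic_lf(U=5, V=5, H=32, W=32, disparity=1, seed=0):
    """Toy light field: every view is a shifted crop of one random texture.

    A constant per-view shift mimics a scene with a single disparity.
    """
    rng = np.random.default_rng(seed)
    pad = disparity * max(U, V)
    base = rng.random((H + pad, W + pad))
    lf = np.empty((U, V, H, W))
    for u in range(U):
        for v in range(V):
            dy, dx = disparity * u, disparity * v
            lf[u, v] = base[dy:dy + H, dx:dx + W]
    return lf


def horizontal_epi(lf, u, h):
    """Fix angular u and spatial h; the result has shape (V, W)."""
    return lf[u, :, h, :]


def vertical_epi(lf, v, w):
    """Fix angular v and spatial w; the result has shape (U, H)."""
    return lf[:, v, :, w]


lf = make_synthetic_lf()
epi_h = horizontal_epi(lf, u=2, h=10)
epi_v = vertical_epi(lf, v=2, w=10)
print(epi_h.shape, epi_v.shape)
```

In this toy scene each EPI row is the previous row shifted by one pixel, so every scene point traces a straight line whose slope is set by the disparity. That is the line pattern the caption refers to.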