LFMamba: Light Field Image Super-Resolution with State Space Model

Wang Xia, Yao Lu, Shunzhou Wang, Ziqi Wang, Peiqi Xia, Tianfei Zhou

TL;DR

This work introduces LFMamba, a network for light field image super-resolution (LFSR) built purely on State Space Models (SSMs). It operates on informative 2D LF slices (SAI, MacPI, EPI-H, EPI-V) to capture spatial, angular, and structural information with linear-time complexity. A core contribution is an efficient S6 mechanism, enabled by an efficient SS2D (ESS2D) scan and embedded in a two-stage basic SSM block, which provides strong long-range modeling with significantly fewer parameters. The architecture combines Initial Feature Extraction, Spatial-Angular Feature Learning, and LF Structure Feature Learning to produce high-quality HR LF outputs. It achieves competitive results on five LFSR benchmarks, with notable angular consistency, and extends successfully to LF angular SR. The findings suggest that learning LF representations with SSMs on 2D slices is an effective and efficient alternative to CNN- or Transformer-based approaches, with potential for broader LF tasks and for future hybrids with frequency priors or Transformer components.
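The four slice types named above can be illustrated with plain array indexing. The following is a minimal sketch, not the authors' code: it assumes a single-channel light field stored as a NumPy array of shape `(U, V, H, W)` with angular axes `(u, v)` and spatial axes `(h, w)`, and the function name `lf_slices` is hypothetical. Which EPI is called "horizontal" depends on the axis convention, so the labels here are an assumption.

```python
import numpy as np

def lf_slices(lf):
    """Return one example of each 2D slice type from a 4D light field.

    lf: array of shape (U, V, H, W); axis layout is an assumption of this sketch.
    """
    U, V, H, W = lf.shape
    u0, v0, h0, w0 = U // 2, V // 2, H // 2, W // 2

    # Sub-aperture image (SAI): fix both angular coordinates.
    sai = lf[u0, v0]                                   # (H, W)

    # Macro-pixel image (MacPI): each spatial pixel (h, w) becomes a U x V block,
    # so macpi[h * U + u, w * V + v] == lf[u, v, h, w].
    macpi = lf.transpose(2, 0, 3, 1).reshape(H * U, W * V)

    # Epipolar plane images (EPIs): fix one angular and one spatial coordinate.
    epi_h = lf[u0, :, h0, :]                           # (V, W)
    epi_v = lf[:, v0, :, w0]                           # (U, H)
    return sai, macpi, epi_h, epi_v
```

Each returned slice is an ordinary 2D array, so any 2D scanning scheme can be applied to it without a dedicated 4D scan order.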

Abstract

Recent years have witnessed significant advancements in light field image super-resolution (LFSR) owing to the progress of modern neural networks. However, these methods often struggle to capture long-range dependencies (CNN-based) or incur quadratic computational complexity (Transformer-based), which limits their performance. Recently, the State Space Model (SSM) with selective scanning mechanism (S6), exemplified by Mamba, has emerged as a superior alternative to traditional CNN- and Transformer-based approaches in various vision tasks, benefiting from its effective long-range sequence modeling capability and linear-time complexity. Therefore, integrating S6 into LFSR becomes compelling, especially considering the vast data volume of 4D light fields. However, the primary challenge lies in designing an appropriate scanning method for 4D light fields that effectively models light field features. To tackle this, we employ SSMs on the informative 2D slices of 4D LFs to fully explore spatial contextual information, complementary angular information, and structure information. To achieve this, we carefully devise a basic SSM block characterized by an efficient SS2D mechanism that facilitates more effective and efficient feature learning on these 2D slices. Based on the above two designs, we further introduce an SSM-based network for LFSR termed LFMamba. Experimental results on LF benchmarks demonstrate the superior performance of LFMamba. Furthermore, extensive ablation studies are conducted to validate the efficacy and generalization ability of our proposed method. We expect that our LFMamba will shed light on effective representation learning of LFs with state space models.
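To make the linear-time claim concrete, here is a minimal sketch of a discretized state space recurrence, processed in a single pass over the sequence. It is a simplified, non-selective illustration and not the S6 layer used in LFMamba: in S6 the parameters are computed from the input at each step, whereas here the hypothetical arguments `a`, `b`, and `c` are fixed vectors describing a diagonal state transition.

```python
import numpy as np

def ssm_scan(x, a, b, c):
    """Diagonal linear SSM: h_t = a * h_{t-1} + b * x_t, y_t = <c, h_t>.

    x: 1D input sequence of length L; a, b, c: 1D arrays of state size N.
    The loop visits each step once, so the cost is O(L * N).
    """
    h = np.zeros(len(a), dtype=float)   # hidden state, initially zero
    y = np.empty(len(x), dtype=float)
    for t, x_t in enumerate(x):
        h = a * h + b * x_t             # state update
        y[t] = c @ h                    # readout
    return y
```

Because the state `h` summarizes everything seen so far, each output can depend on arbitrarily distant inputs without the pairwise comparisons that make self-attention quadratic in sequence length.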

Paper Structure (19 sections, 11 equations, 8 figures, 4 tables, 1 algorithm)

Figures (8)

  • Figure 1: Three different perspectives for modeling LFs with a State Space Model. The key question is how to flatten a 4D LF into a 1D sequence. Left: Treating the 4D LF data as a whole and flattening it in different orders. Mid: Treating the LF as a 3D image sequence to explore the relationships between sub-aperture images. Right: Treating the LF as a combination of informative 2D data slices (i.e., sub-aperture image (SAI), macro-pixel image (MacPI), and epipolar plane image (EPI)) to fully capture spatial contextual information, complementary angular information, and structure information.
  • Figure 2: LFMamba. (a) The overall architecture of LFMamba. (b) The detailed structure of the core component, the basic SSM block. (c) The illustration of the proposed efficient S6.
  • Figure 3: Illustration of the original SS2D and our efficient SS2D. Top: The original SS2D in the visual state space model [liu2024vmamba] copies the input four times for different scanning orders. Bottom: Our proposed efficient SS2D divides the input into four groups along the channel dimension for different scanning orders, which significantly reduces the parameters with little performance decline.
  • Figure 4: Visual comparisons of different methods for $4 \times$ LFSR. The first column shows the HR central view image, and the remaining columns present: 1) close-ups of the images super-resolved by different methods, 2) the epipolar plane images, and 3) the PSNR and SSIM. Best viewed zoomed in.
  • Figure 5: Computational efficiency comparison between LFMamba and SOTA methods on $4\times$ SR. The area of each circle denotes the memory consumption. The inference time is averaged over all scenes in the five test datasets.
  • ...and 3 more figures
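
The contrast described in the Figure 3 caption can be sketched at the level of input sequences. This is an illustrative sketch only, not the authors' implementation: the function names are hypothetical, the four scan orders (row-major, column-major, and their reverses) are assumed to follow the usual SS2D layout, and the S6 layers that would consume the sequences are omitted. It also assumes the channel count is divisible by four.

```python
import numpy as np

def scan_orders(x):
    """Flatten a (C, H, W) feature map into four 1D scan orders, each (C, H*W)."""
    C = x.shape[0]
    row = x.reshape(C, -1)                       # row-major scan
    col = x.transpose(0, 2, 1).reshape(C, -1)    # column-major scan
    return [row, col, row[:, ::-1], col[:, ::-1]]  # plus the two reversed scans

def original_ss2d_inputs(x):
    """Original scheme: every scan order receives all C channels (4 copies)."""
    return scan_orders(x)

def efficient_ss2d_inputs(x):
    """Grouped scheme: split channels into 4 groups; group i uses scan order i only."""
    groups = np.split(x, 4, axis=0)              # requires C divisible by 4
    return [scan_orders(g)[i] for i, g in enumerate(groups)]
```

In this sketch the grouped scheme feeds each scan direction `C/4` channels instead of `C`, so the total sequence data is a quarter of the copied version, which is consistent with the parameter reduction the caption reports.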