LFMamba: Light Field Image Super-Resolution with State Space Model
Wang Xia, Yao Lu, Shunzhou Wang, Ziqi Wang, Peiqi Xia, Tianfei Zhou
TL;DR
This work introduces LFMamba, a pure State Space Model (SSM)-based network for light field image super-resolution (LFSR). It operates on informative 2D LF slices (SAI, MacPI, EPI-H, EPI-V) to capture spatial, angular, and structural information with linear-time complexity. A core contribution is the efficient SS2D (ESS2D) mechanism, built on S6 and embedded in a two-stage basic SSM block, which enables strong long-range modeling with significantly fewer parameters. The architecture combines Initial Feature Extraction, Spatial-Angular Feature Learning, and LF Structure Feature Learning to produce high-quality high-resolution (HR) LF outputs. It achieves competitive results on five LFSR benchmarks, with notable angular consistency, and extends successfully to LF angular SR. The findings suggest that SSM-based representation learning on 2D LF slices is an effective and efficient alternative to CNN- and Transformer-based approaches, with potential for broader LF tasks and future hybrids with frequency priors or Transformer components.
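To make the four slice types concrete, here is a minimal NumPy sketch. It assumes a single-channel 4D LF stored as an array indexed `lf[u, v, h, w]` (angular resolution U×V, spatial resolution H×W). The function names are hypothetical, and the layouts follow common LFSR conventions rather than LFMamba's released code: the SAI array tiles the views, the MacPI interleaves the U×V angular samples of each pixel into a macro-pixel, and EPI-H/EPI-V fix one angular and one spatial coordinate so that the line structures encoding disparity become visible.

```python
import numpy as np

def sai_array(lf):
    """Tile the U x V sub-aperture images into a (U*H, V*W) mosaic."""
    U, V, H, W = lf.shape
    # (u, v, h, w) -> (u, h, v, w): row u*H + h, column v*W + w
    return lf.transpose(0, 2, 1, 3).reshape(U * H, V * W)

def macpi(lf):
    """Macro-pixel image: each spatial location becomes a U x V block."""
    U, V, H, W = lf.shape
    # (u, v, h, w) -> (h, u, w, v): row h*U + u, column w*V + v
    return lf.transpose(2, 0, 3, 1).reshape(H * U, W * V)

def epi_h(lf, u, h):
    """Horizontal EPI: fix angular row u and spatial row h; axes (v, w)."""
    return lf[u, :, h, :]

def epi_v(lf, v, w):
    """Vertical EPI: fix angular column v and spatial column w; axes (u, h)."""
    return lf[:, v, :, w]
```

Each 2D slice can then be flattened into a 1D token sequence for an SSM scan; a full model would process all slices in a batch, with feature channels instead of a single intensity value.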
Abstract
Recent years have witnessed significant advancements in light field image super-resolution (LFSR) owing to the progress of modern neural networks. However, these methods often struggle to capture long-range dependencies (CNN-based) or incur quadratic computational complexity (Transformer-based), which limits their performance. Recently, the State Space Model (SSM) with a selective scanning mechanism (S6), exemplified by Mamba, has emerged as a superior alternative to traditional CNN- and Transformer-based approaches in various vision tasks, benefiting from its effective long-range sequence modeling and linear-time complexity. Integrating S6 into LFSR is therefore compelling, especially given the vast data volume of 4D light fields. The primary challenge, however, lies in designing an appropriate scanning method for 4D light fields that effectively models light field features. To tackle this, we employ SSMs on the informative 2D slices of 4D LFs to fully explore spatial contextual information, complementary angular information, and structure information. To this end, we carefully devise a basic SSM block characterized by an efficient 2D selective scan (SS2D) mechanism that facilitates more effective and efficient feature learning on these 2D slices. Based on these two designs, we further introduce an SSM-based network for LFSR termed LFMamba. Experimental results on LF benchmarks demonstrate the superior performance of LFMamba. Furthermore, extensive ablation studies validate the efficacy and generalization ability of our proposed method. We expect that LFMamba will shed light on effective representation learning of LFs with state space models.
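The linear-time claim comes from the recurrent form of S6: each token updates a fixed-size hidden state exactly once. The sketch below assumes the Mamba-style discretization (zero-order hold for a diagonal A, simplified Δ·B for B) for a single channel, plus a VMamba-style four-direction cross-scan over one 2D slice. It illustrates the general SS2D idea only; it is not the ESS2D mechanism proposed in LFMamba, and all function names are hypothetical.

```python
import numpy as np

def selective_scan(x, delta, A, B, C):
    """Sequential S6-style scan for one channel, O(L * N) time.

    x, delta: shape (L,); delta holds positive step sizes.
    A: shape (N,), diagonal of the state matrix (negative for stability).
    B, C: shape (L, N). In Mamba, delta, B and C are projected from the
    input, which makes the scan "selective"; here they are passed in.
    """
    x, delta = np.asarray(x, float), np.asarray(delta, float)
    A, B, C = np.asarray(A, float), np.asarray(B, float), np.asarray(C, float)
    L, N = B.shape
    h = np.zeros(N)
    y = np.empty(L)
    for t in range(L):
        A_bar = np.exp(delta[t] * A)   # zero-order-hold discretization of A
        B_bar = delta[t] * B[t]        # simplified (Euler) discretization of B
        h = A_bar * h + B_bar * x[t]   # fixed-size state update per token
        y[t] = C[t] @ h                # readout
    return y

def cross_scan_2d(slice2d, scan_fn):
    """Scan a 2D slice along four 1D paths and sum the results.

    Paths: row-major, column-major, and the reverse of each, as in
    VMamba's SS2D. scan_fn maps a 1D sequence to one of equal length.
    """
    slice2d = np.asarray(slice2d, float)
    H, W = slice2d.shape
    out = np.zeros((H, W))
    for transposed in (False, True):
        seq = (slice2d.T if transposed else slice2d).reshape(-1)
        for reverse in (False, True):
            y = scan_fn(seq[::-1])[::-1] if reverse else scan_fn(seq)
            # Map the 1D output back to the (H, W) grid.
            out += y.reshape(W, H).T if transposed else y.reshape(H, W)
    return out
```

Because the loop touches each of the H·W tokens a constant number of times per direction, cost grows linearly with slice size, whereas self-attention over the same tokens grows quadratically.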
