Mamba-based Light Field Super-Resolution with Efficient Subspace Scanning

Ruisheng Gao, Zeyu Xiao, Zhiwei Xiong

TL;DR

This work presents MLFSR, a Mamba-based approach to light field (LF) super-resolution that matches the accuracy of Transformer-based methods at significantly lower memory cost and latency. A bi-directional subspace scanning strategy packs global 4D information into manageable sequences; on top of it, a Mamba-based Global Interaction (MGI) module captures non-local correlations while a Spatial-Angular Modulator (SAM) preserves locality. A Transformer-to-Mamba (T2M) distillation loss narrows the gap between attention-based and SSM-based modeling, boosting overall performance. Experiments on standard LF SR benchmarks show that MLFSR outperforms CNN-based methods and competes with, or surpasses, Transformer-based methods, especially under full-resolution inference, enabling efficient processing of high-resolution 4D LFs with reduced resource demands.
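
This summary does not spell out the distillation objective, but a T2M-style loss can be pictured as feature matching between a frozen Transformer teacher and the Mamba student. The PyTorch sketch below is illustrative only: the layers matched, the L1 distance, and the 0.1 weight are assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def t2m_distillation_loss(student_feats, teacher_feats):
    """Hypothetical Transformer-to-Mamba (T2M) feature distillation:
    pull intermediate features of the Mamba student toward those of a
    frozen Transformer teacher. The exact form is an assumption; an L1
    feature-matching term averaged over matched layers is shown."""
    return sum(
        F.l1_loss(s, t.detach())            # teacher is frozen
        for s, t in zip(student_feats, teacher_feats)
    ) / len(student_feats)

# Total objective: reconstruction loss plus a weighted distillation term
# (the 0.1 weight is a made-up hyperparameter for illustration).
sr, gt = torch.randn(1, 1, 64, 64), torch.randn(1, 1, 64, 64)
s_feats = [torch.randn(1, 25, 16) for _ in range(2)]  # student features
t_feats = [torch.randn(1, 25, 16) for _ in range(2)]  # teacher features
loss = F.l1_loss(sr, gt) + 0.1 * t2m_distillation_loss(s_feats, t_feats)
```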

Abstract

Transformer-based methods have demonstrated impressive performance in 4D light field (LF) super-resolution by effectively modeling long-range spatial-angular correlations, but their quadratic complexity hinders the efficient processing of high-resolution 4D inputs, resulting in slow inference and high memory cost. As a compromise, most prior work adopts a patch-based strategy, which fails to leverage the full information available in the entire input LF. The recently proposed selective state-space model, Mamba, has gained popularity for its efficient long-range sequence modeling. In this paper, we propose a Mamba-based Light Field Super-Resolution method, named MLFSR, built on an efficient subspace scanning strategy. Specifically, we tokenize 4D LFs into subspace sequences and conduct bi-directional scanning on each subspace. Based on this scanning strategy, we design the Mamba-based Global Interaction (MGI) module to capture global information and the local Spatial-Angular Modulator (SAM) to complement local details. Additionally, we introduce a Transformer-to-Mamba (T2M) loss to further enhance overall performance. Extensive experiments on public benchmarks demonstrate that MLFSR surpasses CNN-based models and rivals Transformer-based methods in performance while maintaining higher efficiency. With faster inference and reduced memory demand, MLFSR enables full-image processing of high-resolution 4D LFs with enhanced performance.
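
To make the subspace tokenization concrete, the toy PyTorch sketch below reshapes a 4D LF feature tensor into spatial, angular, and EPI sequences, each of which would then be scanned forward and backward. The axis orderings and the horizontal/vertical EPI conventions here are assumptions for illustration, not the authors' exact layout.

```python
import torch

# Toy sketch of subspace tokenization (axis orderings are assumptions).
U, V, H, W, C = 5, 5, 32, 32, 16   # angular (U, V), spatial (H, W), channels
lf = torch.randn(U, V, H, W, C)

# Spatial subspace: one length-(H*W) token sequence per sub-aperture view.
spatial_seq = lf.reshape(U * V, H * W, C)                    # (UV, HW, C)

# Angular subspace: one length-(U*V) sequence per spatial position.
angular_seq = lf.permute(2, 3, 0, 1, 4).reshape(H * W, U * V, C)

# EPI subspaces: pair one angular axis with one spatial axis.
epi_h = lf.permute(0, 2, 1, 3, 4).reshape(U * H, V * W, C)   # horizontal EPIs
epi_v = lf.permute(1, 3, 0, 2, 4).reshape(V * W, U * H, C)   # vertical EPIs

# Bi-directional scanning then processes each sequence and its reversal,
# e.g. seq and seq.flip(1), so every token aggregates both directions.
for seq in (spatial_seq, angular_seq, epi_h, epi_v):
    fwd, bwd = seq, seq.flip(1)
    print(fwd.shape, bwd.shape)
```

Each subspace sequence stays short (on the order of H*W, U*V, or a mixed product) rather than the full U*V*H*W, which is what keeps the scans tractable on full-resolution inputs.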

Paper Structure

This paper contains 13 sections, 10 equations, 7 figures, and 4 tables.

Figures (7)

  • Figure 1: Runtime (ms) and PSNR (dB) comparison. Runtime is measured on an input LF of size 5$\times$5$\times$32$\times$32; * denotes full-resolution inference. MLFSR outperforms state-of-the-art CNN-based methods and achieves competitive performance against Transformer-based methods with lower runtime and fewer parameters. Full-resolution inference further boosts the performance of methods that support it. Comparisons are performed for 4$\times$ SR on the EPFL dataset.
  • Figure 2: A toy example ($U=V=2,H=W=2$) on LF tokenization and scanning directions. (a) Sub-aperture images. (b) Whole 4D sequence with quad-directional scanning (SAI for example). (c) Subspace sequences with bi-directional scanning. Bi-directional spatial (red arrows) and angular (green arrows) scanning are complemented by bi-directional EPI (blue and purple arrows) scanning.
  • Figure 3: Overview of MLFSR. Initial features $f_{init}$ are first extracted by the encoder $\mathcal{N}_{Init}$, followed by alternating MGI modules and SAMs to extract deep features $f_{deep}$. Finally, we obtain the super-resolved result $I_{SR}$ through the reconstruction module $\mathcal{N}_{Rec}$.
  • Figure 4: (a) The detailed structure of SA-Mamba/EPI-Mamba. Each SA-Mamba/EPI-Mamba includes two Bidirectional Subspace Scanning (BiSS) blocks. The reshape operations are omitted for simplicity. (b) The BiSS block follows a typical Transformer-style design, with bidirectional scanning for token interaction and channel attention for channel mixing. (c) The details of bidirectional scanning. (A conceptual sketch of the BiSS block appears after this list.)
  • Figure 5: The detailed structure of Spatial-Angular Modulator (SAM).
  • ...and 2 more figures
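
As referenced in the Figure 4 caption, here is a conceptual PyTorch sketch of a BiSS-style block: pre-norm, a forward plus a reversed scan over the token sequence for token interaction, then channel attention for channel mixing, each with a residual connection. The internal details (normalization placement, the channel-attention design, how the two scan directions are merged) are assumptions, and `ssm_factory` is a placeholder where the actual block would use Mamba's selective-scan layer.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style channel mixing. The exact design of
    the paper's channel attention is an assumption here."""
    def __init__(self, dim, reduction=4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(dim // reduction, dim),
            nn.Sigmoid(),
        )

    def forward(self, x):                           # x: (B, L, C)
        w = self.mlp(x.mean(dim=1, keepdim=True))   # global pooling over tokens
        return x * w

class BiSSBlock(nn.Module):
    """Hypothetical BiSS block after Figure 4(b): bidirectional scanning
    for token interaction, then channel attention for channel mixing,
    each preceded by LayerNorm and wrapped in a residual connection."""
    def __init__(self, dim, ssm_factory):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.fwd = ssm_factory(dim)                 # left-to-right scan
        self.bwd = ssm_factory(dim)                 # right-to-left scan
        self.ca = ChannelAttention(dim)

    def forward(self, x):                           # x: (B, L, C) subspace sequence
        h = self.norm1(x)
        y = self.fwd(h) + self.bwd(h.flip(1)).flip(1)   # merge both directions
        x = x + y                                   # token interaction
        x = x + self.ca(self.norm2(x))              # channel mixing
        return x

# Smoke test with a trivial stand-in for the SSM layer (in practice this
# would be a real Mamba layer, e.g. from the mamba_ssm package).
block = BiSSBlock(dim=16, ssm_factory=lambda d: nn.Linear(d, d))
out = block(torch.randn(2, 25, 16))                 # e.g. a 5x5 angular sequence
print(out.shape)                                    # torch.Size([2, 25, 16])
```

Feeding each subspace sequence from the tokenization sketch above through a stack of such blocks roughly mirrors the MGI module's role of global interaction, at linear rather than quadratic cost in sequence length.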