Mamba-based Light Field Super-Resolution with Efficient Subspace Scanning
Ruisheng Gao, Zeyu Xiao, Zhiwei Xiong
TL;DR
This work presents MLFSR, a Mamba-based approach for light field super-resolution that achieves high accuracy with significantly reduced memory and latency compared to Transformer-based methods. By employing a bi-directional subspace scanning strategy, MLFSR packs global 4D information into manageable sequences; a Mamba-based Global Interaction (MGI) module captures non-local correlations while a Spatial-Angular Modulator (SAM) preserves locality. A Transformer-to-Mamba (T2M) distillation loss narrows the gap between attention-based and SSM-based modeling, boosting overall performance. Experiments on standard LF SR benchmarks show MLFSR outperforms CNN-based methods and competes with, or surpasses, Transformer-based methods, especially under full-resolution inference, enabling efficient high-resolution 4D LF processing with reduced resource demands.
Abstract
Transformer-based methods have demonstrated impressive performance in 4D light field (LF) super-resolution by effectively modeling long-range spatial-angular correlations, but their quadratic complexity hinders the efficient processing of high-resolution 4D inputs, resulting in slow inference speed and high memory cost. As a compromise, most prior work adopts a patch-based strategy, which fails to leverage the full information from the entire input LFs. The recently proposed selective state-space model, Mamba, has gained popularity for its efficient long-range sequence modeling. In this paper, we propose a Mamba-based Light Field Super-Resolution method, named MLFSR, by designing an efficient subspace scanning strategy. Specifically, we tokenize 4D LFs into subspace sequences and conduct bi-directional scanning on each subspace. Based on our scanning strategy, we then design the Mamba-based Global Interaction (MGI) module to capture global information and the local Spatial-Angular Modulator (SAM) to complement local details. Additionally, we introduce a Transformer-to-Mamba (T2M) loss to further enhance overall performance. Extensive experiments on public benchmarks demonstrate that MLFSR surpasses CNN-based models and rivals Transformer-based methods in performance while maintaining higher efficiency. With faster inference speed and reduced memory demand, MLFSR facilitates full-image processing of high-resolution 4D LFs with enhanced performance.
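To make the subspace scanning idea concrete, the following is a minimal NumPy sketch of how a 4D LF might be tokenized into spatial and angular subspace sequences, each paired with its reversed copy for bi-directional scanning. The shapes, function name, and layout here are illustrative assumptions, not the authors' actual implementation (which would feed these sequences to Mamba blocks).

```python
import numpy as np

def subspace_sequences(lf):
    """Sketch of bi-directional subspace tokenization (assumed layout).

    lf: 4D light field of shape (U, V, H, W, C), where (U, V) index the
    angular views and (H, W, C) are per-view spatial feature maps.
    Returns forward/backward token sequences for each subspace.
    """
    U, V, H, W, C = lf.shape
    # Spatial subspace: for each view (u, v), flatten its H*W pixels
    # into one token sequence -> (U*V, H*W, C)
    spatial = lf.reshape(U * V, H * W, C)
    # Angular subspace: for each pixel (h, w), flatten its U*V views
    # into one token sequence -> (H*W, U*V, C)
    angular = lf.transpose(2, 3, 0, 1, 4).reshape(H * W, U * V, C)
    # Bi-directional scanning: pair each sequence with its reversal,
    # so a causal SSM can aggregate context from both directions.
    return {
        "spatial": (spatial, spatial[:, ::-1]),
        "angular": (angular, angular[:, ::-1]),
    }

# Toy example: 5x5 views, 32x32 pixels, 16 feature channels
lf = np.random.rand(5, 5, 32, 32, 16)
seqs = subspace_sequences(lf)
print(seqs["spatial"][0].shape)  # (25, 1024, 16)
print(seqs["angular"][0].shape)  # (1024, 25, 16)
```

Because each subspace sequence is only H*W or U*V tokens long (rather than the full U*V*H*W), a linear-complexity SSM scan over these sequences stays tractable even at full image resolution.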
