Burst Image Super-Resolution with Mamba

Ozan Unal, Steven Marty, Dengxin Dai

TL;DR

BurstMamba addresses the inefficiency of burst image super-resolution by decoupling keyframe SR from burst-based subpixel priors using a Mamba-based backbone with linear time complexity. The method introduces two innovations: Optical Flow-Based Serialization (OFS), which aligns burst information only during inter-frame state updates to preserve high-frequency details, and a Wavelet-based State-Space Update (ψS6) that prioritizes high-frequency features for burst-to-keyframe information transfer. Empirical results on SyntheticSR, RealBSR-RGB, and RealBSR-RAW show state-of-the-art performance, with ablations confirming substantial gains from the temporal subpixel prior, OFS, and ψS6, and demonstrated robustness to varying burst lengths. The approach offers scalable deployment flexibility and potential applicability to related burst enhancement tasks such as deblurring and denoising.

Abstract

Burst image super-resolution (BISR) aims to enhance the resolution of a keyframe by leveraging information from multiple low-resolution images captured in quick succession. In the deep learning era, BISR methods have evolved from fully convolutional networks to transformer-based architectures, which, despite their effectiveness, suffer from the quadratic complexity of self-attention. We see Mamba as the next natural step in the evolution of this field, offering a comparable global receptive field and selective information routing with only linear time complexity. In this work, we introduce BurstMamba, a Mamba-based architecture for BISR. Our approach decouples the task into two specialized branches: a spatial module for keyframe super-resolution and a temporal module for subpixel prior extraction, striking a balance between computational efficiency and burst information integration. To further enhance burst processing with Mamba, we propose two novel strategies: (i) optical flow-based serialization, which aligns burst sequences only during state updates to preserve subpixel details, and (ii) a wavelet-based reparameterization of the state-space update rules, prioritizing high-frequency features for improved burst-to-keyframe information passing. Our framework achieves SOTA performance on public benchmarks of SyntheticSR, RealBSR-RGB, and RealBSR-RAW.
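
To make the phrase "prioritizing high-frequency features" concrete, the following is a minimal sketch of a single-level 2D Haar wavelet split, which separates an image into one low-frequency band and three high-frequency detail bands. It is an illustration only, not the authors' reparameterization: the abstract does not state which wavelet BurstMamba uses, so the choice of Haar, the function name `haar_dwt2`, and the band names are all assumptions. Plain Python lists are used to keep the sketch free of dependencies.

```python
def haar_dwt2(img):
    """Single-level 2D Haar transform of a list-of-lists image.

    Assumes even height and width. Returns four half-resolution bands:
    (low, row_detail, col_detail, diag_detail).
    """
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    low, row_detail, col_detail, diag_detail = [], [], [], []
    for i in range(0, h, 2):
        lo_r, rd_r, cd_r, dd_r = [], [], [], []
        for j in range(0, w, 2):
            # Each 2x2 block [[a, b], [c, d]] yields one coefficient per band.
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            lo_r.append((a + b + c + d) / 2.0)  # local average (low frequency)
            rd_r.append((a + b - c - d) / 2.0)  # top row minus bottom row
            cd_r.append((a - b + c - d) / 2.0)  # left column minus right column
            dd_r.append((a - b - c + d) / 2.0)  # diagonal difference
        low.append(lo_r)
        row_detail.append(rd_r)
        col_detail.append(cd_r)
        diag_detail.append(dd_r)
    return low, row_detail, col_detail, diag_detail


# A flat image has no detail; a one-pixel-offset vertical step does.
flat = [[3.0] * 4 for _ in range(4)]
step = [[0.0, 1.0, 1.0, 1.0] for _ in range(4)]
flat_bands = haar_dwt2(flat)
step_bands = haar_dwt2(step)
```

In this sketch the flat image produces all-zero detail bands, while the step image produces a nonzero `col_detail` coefficient where the edge falls inside a 2x2 block. The detail bands carry the edge and texture information that a burst is meant to help recover.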

Paper Structure

This paper contains 19 sections, 7 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: BurstMamba decouples the processing of the keyframe image for single image super-resolution (SISR) from the processing of the burst sequence for subpixel prior extraction. By design, the temporal module is invariant to sequence length, so BurstMamba can adapt to varying burst lengths after deployment.
  • Figure 2: Illustration of the BurstMamba pipeline. BurstMamba takes a RAW or RGB burst sequence as input and super-resolves the keyframe (often the first image of the sequence). The model consists of two key modules: a spatial module (purple) that processes only the keyframe for single image super-resolution, and a wavelet-based temporal module (green) that feeds subpixel priors from the burst sequence into the spatial module.
  • Figure 3: Illustration of optical flow-based serialization (OFS) with bilinear alignment. OFS allows the model to preserve the input structure and avoids smoothing subpixel features when processing individual frames, while aligning images for improved image-to-image message passing within the temporal state-space blocks.
  • Figure 4: Qualitative comparison of different methods on the RealBSR-RGB dataset for $\times4$ burst image super-resolution.
  • Figure 5: Qualitative results from varying the input burst sequence length (L) for BurstMamba on the RealBSR-RGB dataset. The top row illustrates the benefit of increasing the burst length for a scene dominated by high-frequency details. The bottom row shows that single image super-resolution can provide a sufficiently good result for a scene with simple structures. Additionally, we isolate the contribution of the temporal module by showing the difference between each prediction and the decoupled single image prediction.
  • ...and 3 more figures
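
The bilinear alignment named in the Figure 3 caption can be sketched as a backward warp: each keyframe pixel looks up the burst frame at a flow-displaced, generally non-integer location and blends the four surrounding pixels. This is a hedged illustration under stated assumptions, not the BurstMamba implementation. The flow convention (`flow[y][x] = (dx, dy)`, pointing from the keyframe into the burst frame), the border clamping, and the function names `bilinear_sample` and `warp_to_keyframe` are all hypothetical.

```python
import math


def bilinear_sample(img, x, y):
    """Sample a list-of-lists image at a real-valued (x, y), clamping to the border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    # Blend horizontally on the two neighboring rows, then vertically.
    top = img[y0][x0] * (1.0 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1.0 - fx) + img[y1][x1] * fx
    return top * (1.0 - fy) + bottom * fy


def warp_to_keyframe(frame, flow):
    """Backward-warp a burst frame onto the keyframe grid.

    Assumed convention: flow[y][x] = (dx, dy) means keyframe pixel (x, y)
    corresponds to location (x + dx, y + dy) in the burst frame.
    """
    h, w = len(frame), len(frame[0])
    return [
        [bilinear_sample(frame, x + flow[y][x][0], y + flow[y][x][1])
         for x in range(w)]
        for y in range(h)
    ]


# A horizontal ramp shifted by half a pixel.
ramp = [[float(x) for x in range(4)] for _ in range(2)]
half_pixel_flow = [[(0.5, 0.0)] * 4 for _ in range(2)]
aligned = warp_to_keyframe(ramp, half_pixel_flow)
```

Every warped value here is a weighted average of neighboring pixels, so the warp acts as a mild low-pass filter. That smoothing is consistent with the caption's point that alignment is applied only for message passing between frames, leaving the per-frame processing on the unwarped input.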