Gather-Scatter Mamba: Accelerating Propagation with Efficient State Space Model

Hyun-kyu Ko, Youbin Kim, Jihyeon Park, Dongheok Park, Gyeongjin Kang, Wonjun Cho, Hyung Yi, Eunbyung Park

TL;DR

This work introduces Gather-Scatter Mamba (GSMamba), a hybrid video super-resolution framework that integrates a Mamba-based selective scan with alignment-aware gather-scatter operations. By first aligning neighboring frames to a center anchor, flattening temporally, and applying Mamba, the model captures long-range temporal dependencies with linear complexity; a subsequent scatter step redistributes updated residuals to all frames within the window, enabling joint refinement. The approach also employs shifted window self-attention for robust spatial context within frames. Experiments on REDS, Vimeo-90K, and Vid4 demonstrate strong performance with fewer parameters and lower FLOPs than many baselines, and ablations validate the importance of center-anchored alignment and residual scattering for effective propagation and reconstruction.

Abstract

State Space Models (SSMs), most notably RNNs, have historically played a central role in sequential modeling. Although attention-based models such as Transformers have since dominated due to their ability to model global context, their quadratic complexity and limited scalability make them less suited to long sequences. Video super-resolution (VSR) methods have traditionally relied on recurrent architectures to propagate features across frames. However, such approaches suffer from well-known issues, including vanishing gradients, lack of parallelism, and slow inference. Recent selective SSMs such as Mamba offer a compelling alternative: by enabling input-dependent state transitions with linear-time complexity, Mamba mitigates these issues while maintaining strong long-range modeling capabilities. Despite this potential, Mamba alone struggles to capture fine-grained spatial dependencies due to its causal nature and lack of explicit context aggregation. To address this, we propose a hybrid architecture that combines shifted window self-attention for spatial context aggregation with Mamba-based selective scanning for efficient temporal propagation. Furthermore, we introduce Gather-Scatter Mamba (GSM), an alignment-aware mechanism that warps features toward a center anchor frame within the temporal window before Mamba propagation and scatters them back afterward, reducing occlusion artifacts and redistributing the aggregated information across all frames. The official implementation is provided at: https://github.com/Ko-Lani/GSMamba.
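
To make "input-dependent state transitions with linear-time complexity" concrete, the following is a minimal NumPy sketch of a simplified selective scan. It is not the GSMamba implementation: the function name `selective_scan`, the projection matrices `W_B`, `W_C`, `W_delta`, and the simplified discretization are assumptions chosen for illustration, and the loop is a sequential reference rather than the hardware-aware parallel scan used in practice.

```python
import numpy as np

def selective_scan(x, A, W_B, W_C, W_delta):
    """Sequential reference scan for a simplified selective SSM (illustrative).

    x:       (L, D) input sequence of L tokens with D channels
    A:       (D, N) negative-valued diagonal state matrix, one row per channel
    W_B:     (D, N) hypothetical projection producing the input-dependent B_t
    W_C:     (D, N) hypothetical projection producing the input-dependent C_t
    W_delta: (D, D) hypothetical projection producing the step size delta_t
    """
    L, D = x.shape
    N = A.shape[1]
    h = np.zeros((D, N))           # hidden state, carried across time steps
    y = np.empty((L, D))
    for t in range(L):             # one pass over the sequence: O(L) time
        delta = np.log1p(np.exp(x[t] @ W_delta))   # softplus keeps delta > 0
        B_t = x[t] @ W_B                            # (N,), depends on the input
        C_t = x[t] @ W_C                            # (N,), depends on the input
        A_bar = np.exp(delta[:, None] * A)          # (D, N) discretized decay
        B_bar = delta[:, None] * B_t[None, :]       # (D, N) discretized input map
        h = A_bar * h + B_bar * x[t][:, None]       # state update
        y[t] = h @ C_t                              # readout, shape (D,)
    return y
```

Because each output `y[t]` is computed only from `x[0..t]`, the scan is causal, which is the property the abstract identifies as a limitation for spatial modeling and the motivation for pairing it with shifted window self-attention.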

Paper Structure

This paper contains 21 sections, 12 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Overall architecture of the proposed Gather-Scatter Mamba (GSM). Given a low-resolution input sequence, local spatial refinement is first performed using shifted window self-attention (SWSA). Temporal propagation is then carried out by the window propagation module (WPM). Within each WPM, the GSM block first gathers features by aligning all frames to an anchor frame, processes the aligned features using Mamba's directionally selective scanning, and then scatters the updated features back to their original temporal locations.
  • Figure 2: Overview of the proposed Gather-Scatter mechanism. Misaligned trajectories across frames are first temporally aligned toward the anchor frame (align center) using optical flow (Gather). The aligned features are then flattened in temporal-first order and processed by SS2D (Mamba) for long-range temporal modeling. Finally, the output residuals are inversely warped back to their original frame positions (Scatter), updating all frames within the window.
  • Figure 3: Comparison of windowed propagation strategies. (A) Forward-anchored propagation: supporting frames S1, S2 are aligned toward the anchor A located at the end of the window, and only the anchor is updated. (B) Center-anchored propagation: supporting frames are aligned toward the anchor placed at the center of the window, reducing alignment path length and improving feature aggregation. (C) Center-anchored + Scatter (ours): residuals aligned at the center are redistributed back to all supporting frames, enabling joint enhancement of the entire window.
  • Figure 4: Occlusion comparison between forward-anchored and center-anchored approaches. The center-anchored strategy leverages information from adjacent frames, leading to fewer occluded regions. This reduction in occlusion enables more reliable reconstruction of the anchor frame and improves alignment quality.
  • Figure 5: Qualitative comparison with state-of-the-art methods on the REDS4 dataset (Nah et al., 2019).
  • ...and 1 more figure
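
The gather, temporal-first flatten, and scatter steps described in the Figure 2 caption can be sketched as a toy example. This is a sketch under strong assumptions, not the authors' code: `warp` uses circular integer shifts via `np.roll` in place of optical-flow warping, `temporal_mean_residual` is a hypothetical stand-in for the Mamba (SS2D) scan, and "temporal-first" is interpreted here as keeping the tokens of one pixel location contiguous across frames.

```python
import numpy as np

def warp(feat, shift):
    """Toy warp: circular integer translation of an (H, W, C) feature map.
    A stand-in for optical-flow-based warping (assumption)."""
    return np.roll(feat, shift=shift, axis=(0, 1))

def temporal_mean_residual(tokens, T):
    """Hypothetical stand-in for the Mamba scan: the residual that moves each
    token toward the temporal mean of its group of T consecutive tokens."""
    groups = tokens.reshape(-1, T, tokens.shape[-1])
    return (groups.mean(axis=1, keepdims=True) - groups).reshape(tokens.shape)

def gather_scatter(feats, offsets, mixer):
    """feats: (T, H, W, C) features; offsets[t] = (dy, dx) displacement of
    frame t relative to the center anchor; mixer maps (L, C) -> (L, C)."""
    T, H, W, C = feats.shape
    # Gather: undo each frame's displacement so content lines up with the anchor.
    aligned = np.stack(
        [warp(feats[t], (-offsets[t][0], -offsets[t][1])) for t in range(T)]
    )
    # Temporal-first flattening: the T tokens of one pixel are contiguous.
    tokens = aligned.transpose(1, 2, 0, 3).reshape(H * W * T, C)
    # Sequence model produces residuals in the aligned (anchor) coordinates.
    residual = mixer(tokens).reshape(H, W, T, C).transpose(2, 0, 1, 3)
    # Scatter: inverse-warp each residual back to its own frame and add it.
    scattered = np.stack([warp(residual[t], offsets[t]) for t in range(T)])
    return feats + scattered

rng = np.random.default_rng(0)
base = rng.standard_normal((6, 6, 2))
offsets = [(-1, 2), (0, 0), (2, -1)]          # index 1 is the center anchor
feats = np.stack([warp(base, o) for o in offsets])
out = gather_scatter(feats, offsets, lambda tok: temporal_mean_residual(tok, T=3))
```

In this toy setup every frame is an exact translation of `base`, so once the frames are gathered correctly the stand-in mixer sees identical tokens within each group and returns zero residuals. Passing wrong offsets leaves the groups misaligned and produces non-zero residuals, which illustrates why alignment before the scan matters.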