Gather-Scatter Mamba: Accelerating Propagation with Efficient State Space Model
Hyun-kyu Ko, Youbin Kim, Jihyeon Park, Dongheok Park, Gyeongjin Kang, Wonjun Cho, Hyung Yi, Eunbyung Park
TL;DR
This work introduces Gather-Scatter Mamba (GSMamba), a hybrid video super-resolution framework that integrates a Mamba-based selective scan with alignment-aware gather-scatter operations. By first aligning neighboring frames to a center anchor, flattening temporally, and applying Mamba, the model captures long-range temporal dependencies with linear complexity; a subsequent scatter step redistributes updated residuals to all frames within the window, enabling joint refinement. The approach also employs shifted window self-attention for robust spatial context within frames. Experiments on REDS, Vimeo-90K, and Vid4 demonstrate strong performance with fewer parameters and lower FLOPs than many baselines, and ablations validate the importance of center-anchored alignment and residual scattering for effective propagation and reconstruction.
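The gather, scan, and scatter steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Everything in it is an assumption made for illustration: `warp` uses integer shifts via `np.roll` in place of flow-based alignment, `toy_scan` is a plain causal recurrence standing in for the Mamba block, and the names `gather_scatter` and `offsets` are hypothetical.

```python
import numpy as np

def warp(feat, shift):
    """Toy warp: integer (dy, dx) shift with wrap-around.
    Assumption: stands in for flow-based alignment."""
    return np.roll(feat, shift=shift, axis=(0, 1))

def toy_scan(seq, decay=0.5):
    """Causal linear recurrence along the temporal axis (axis 0).
    Assumption: a placeholder for the Mamba selective scan."""
    out = np.empty_like(seq)
    h = seq[0]
    for t in range(seq.shape[0]):
        h = decay * h + (1.0 - decay) * seq[t]
        out[t] = h
    return out

def gather_scatter(feats, offsets):
    """feats: float array (T, H, W, C); offsets[t] = (dy, dx) displacement
    of frame t relative to the center anchor (hypothetical format)."""
    T = feats.shape[0]
    # Gather: warp every frame toward the center anchor.
    aligned = np.stack(
        [warp(feats[t], (-offsets[t][0], -offsets[t][1])) for t in range(T)]
    )
    # Propagate along time at each aligned spatial location.
    updated = toy_scan(aligned)
    # Scatter: warp each frame's residual back to its own coordinates.
    residual = updated - aligned
    scattered = np.stack([warp(residual[t], offsets[t]) for t in range(T)])
    return feats + scattered
```

In this toy setting, frames that are exact shifted copies of one another align perfectly, so the residual is zero and the output equals the input. Any mismatch after alignment produces a residual that is warped back into the coordinates of the frame it belongs to.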
Abstract
State Space Models (SSMs), most notably RNNs, have historically played a central role in sequential modeling. Although attention-based models such as Transformers have since dominated due to their ability to model global context, their quadratic complexity and limited scalability make them less suited for long sequences. Video super-resolution (VSR) methods have traditionally relied on recurrent architectures to propagate features across frames. However, such approaches suffer from well-known issues including vanishing gradients, lack of parallelism, and slow inference speed. Recent advances in selective SSMs like Mamba offer a compelling alternative: by enabling input-dependent state transitions with linear-time complexity, Mamba mitigates these issues while maintaining strong long-range modeling capabilities. Despite this potential, Mamba alone struggles to capture fine-grained spatial dependencies due to its causal nature and lack of explicit context aggregation. To address this, we propose a hybrid architecture that combines shifted window self-attention for spatial context aggregation with Mamba-based selective scanning for efficient temporal propagation. Furthermore, we introduce Gather-Scatter Mamba (GSM), an alignment-aware mechanism that warps features toward a center anchor frame within the temporal window before Mamba propagation and scatters them back afterward, reducing occlusion artifacts and ensuring that aggregated information is redistributed across all frames. The official implementation is provided at: https://github.com/Ko-Lani/GSMamba.
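To make "input-dependent state transitions with linear-time complexity" concrete, the following is a minimal NumPy sketch of a selective scan in the general style of Mamba, written as a sequential reference loop. It is not taken from the official repository. The projection matrices `W_delta`, `W_B`, and `W_C`, the shapes, and the simplified discretization (exponential for the state matrix, a first-order approximation for the input matrix) are assumptions chosen for illustration.

```python
import numpy as np

def softplus(z):
    # Numerically stable log(1 + exp(z)); keeps the step size positive.
    return np.logaddexp(0.0, z)

def selective_scan(x, A, W_delta, W_B, W_C):
    """Sequential reference for a selective state space scan.

    x:        (L, D) input sequence
    A:        (D, N) state matrix, negative entries for stability
    W_delta:  (D, D) hypothetical projection producing the step size
    W_B, W_C: (D, N) hypothetical projections producing B_t and C_t
    """
    L, D = x.shape
    N = A.shape[1]
    h = np.zeros((D, N))
    y = np.empty((L, D))
    for t in range(L):  # one pass over the sequence: O(L) time
        delta = softplus(x[t] @ W_delta)        # (D,) input-dependent step
        B = x[t] @ W_B                          # (N,) input-dependent
        C = x[t] @ W_C                          # (N,) input-dependent
        A_bar = np.exp(delta[:, None] * A)      # (D, N) discretized decay
        h = A_bar * h + (delta[:, None] * B[None, :]) * x[t][:, None]
        y[t] = h @ C                            # (D,) readout
    return y
```

Because the step size and the input and output projections are recomputed from each token, the recurrence can decide per step how much state to keep. The loop also shows the causal nature mentioned above: the output at step `t` depends only on inputs up to `t`.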
