Trajectory-aware Shifted State Space Models for Online Video Super-Resolution

Qiang Zhu, Xiandong Meng, Yuxian Jiang, Fan Zhang, David Bull, Shuyuan Zhu, Bing Zeng, Ronggang Wang

TL;DR

This paper presents a novel online VSR method based on Trajectory-aware Shifted SSMs (TS-Mamba), leveraging both long-term trajectory modeling and low-complexity Mamba to achieve efficient spatio-temporal information aggregation.

Abstract

Online video super-resolution (VSR) is an important technique for many real-world video processing applications. It aims to restore the current high-resolution video frame using only temporally previous frames. Most existing online VSR methods use only the single neighboring previous frame for temporal alignment, which limits long-range temporal modeling of videos. Recently, state space models (SSMs) have been proposed with linear computational complexity and a global receptive field, which significantly improve computational efficiency and performance. In this context, this paper presents a novel online VSR method based on Trajectory-aware Shifted SSMs (TS-Mamba), leveraging both long-term trajectory modeling and low-complexity Mamba to achieve efficient spatio-temporal information aggregation. Specifically, TS-Mamba first constructs trajectories within a video to select the most similar tokens from the previous frames. Then, a Trajectory-aware Shifted Mamba Aggregation (TSMA) module, consisting of the proposed shifted SSM blocks, aggregates the selected tokens. The shifted SSM blocks are designed around Hilbert scannings and corresponding shift operations to compensate for scanning losses and strengthen the spatial continuity of Mamba. Additionally, we propose a trajectory-aware loss function to supervise trajectory generation, ensuring accurate token selection during training. Extensive experiments on three widely used VSR test datasets demonstrate that, compared with six online VSR benchmark models, our TS-Mamba achieves state-of-the-art performance in most cases with over 22.7% complexity reduction (in MACs).
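The token selection step described in the abstract can be illustrated with a minimal sketch. It assumes cosine similarity as the similarity measure and a flat list of candidate tokens gathered from previous frames; the function names `cosine` and `select_similar_tokens` and the parameter `s` (number of tokens kept) are hypothetical and are not taken from the paper, whose trajectory construction may differ.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Treat a zero vector as dissimilar to everything (assumption).
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_similar_tokens(query, candidates, s):
    """Return indices of the s candidates most similar to the query token.

    query      -- feature vector of the current-frame token
    candidates -- feature vectors of tokens from previous frames
    s          -- number of tokens to keep (hypothetical parameter name)
    """
    ranked = sorted(
        range(len(candidates)),
        key=lambda i: cosine(query, candidates[i]),
        reverse=True,
    )
    return ranked[:s]


if __name__ == "__main__":
    current = [1.0, 0.0]
    previous = [[0.0, 1.0], [1.0, 0.1], [1.0, 1.0], [-1.0, 0.0]]
    print(select_similar_tokens(current, previous, s=2))
```

In this toy setting the selected indices would then be passed on to an aggregation module; how TS-Mamba actually builds and supervises its trajectories is specified in the paper, not here.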

Paper Structure

This paper contains 23 sections, 13 equations, 14 figures, 11 tables.

Figures (14)

  • Figure 1: Comparison of existing online VSR methods with our TS-Mamba in terms of PSNR and MACs on the REDS4 dataset. Our TS-Mamba outperforms these SOTA methods and significantly reduces complexity in terms of MACs.
  • Figure 2: The architecture of the TS-Mamba network. Trajectories are first generated for the video, and similar tokens from previous frames are selected along these trajectories. The selected tokens, together with the current frame token, are then fed into the trajectory-aware shifted Mamba aggregation (TSMA) module to achieve long-term spatio-temporal information aggregation.
  • Figure 3: Illustration of Hilbert scannings and shifted windows generated by seven procedures. (a) Four types of Hilbert scannings. (b) The procedure $\mathcal{P}(1,U(1),3)$, and elimination value $\delta$. (c) Shifted windows and elimination values $\delta$ for procedures $\mathcal{P}(1,UL(1/2/3)/UR(1/2/3),3)$, respectively.
  • Figure 3: Ablation study of selected token number $s$.
  • Figure 4: Visual results on BI degradation (REDS4, Vid4) and BD degradation (Vimeo-90K-T, Vid4).
  • ...and 9 more figures
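
The Hilbert scannings mentioned in the abstract and in the Figure 3 caption order the cells of a 2D grid along a Hilbert curve, so that consecutive positions in the 1D sequence are always spatial neighbors. The sketch below uses the standard index-to-coordinate conversion for a Hilbert curve on an n × n grid (n a power of two). It shows only a single scan orientation; the paper's four scan types, shifted windows, and elimination values $\delta$ are not reproduced, and the function names are illustrative.

```python
def hilbert_d2xy(n, d):
    """Map position d along a Hilbert curve to (x, y) on an n x n grid.

    n must be a power of two; d ranges over 0 .. n*n - 1.
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate/flip the quadrant so sub-curves connect end to end.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_scan_order(n):
    """Return the list of (x, y) cells of an n x n grid in Hilbert order."""
    return [hilbert_d2xy(n, d) for d in range(n * n)]


if __name__ == "__main__":
    for cell in hilbert_scan_order(4):
        print(cell)
```

Unlike a raster scan, which jumps across the whole row at every line end, each step of this ordering moves to a directly adjacent cell, which is the spatial-continuity property that motivates Hilbert-style scanning for sequence models over images.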