Motion Graph Unleashed: A Novel Approach to Video Prediction

Yiqi Zhong, Luming Liang, Bohan Tang, Ilya Zharkov, Ulrich Neumann

TL;DR

The motion graph transforms patches of video frames into interconnected graph nodes to comprehensively describe the spatial-temporal relationships among them. A video prediction pipeline built on the motion graph shows substantial performance improvements and cost reductions.

Abstract

We introduce the motion graph, a novel approach to the video prediction problem, which predicts future video frames from limited past data. The motion graph transforms patches of video frames into interconnected graph nodes to comprehensively describe the spatial-temporal relationships among them. This representation overcomes the limitations of existing motion representations such as image differences, optical flow, and motion matrices, which either fall short in capturing complex motion patterns or suffer from excessive memory consumption. We further present a video prediction pipeline built on the motion graph that shows substantial performance improvements and cost reductions. Experiments on various datasets, including UCF Sports, KITTI, and Cityscapes, highlight the strong representational ability of the motion graph. On UCF Sports in particular, our method matches or outperforms the SOTA methods while reducing model size by 78% and GPU memory utilization by 47%.

Paper Structure

This paper contains 25 sections, 8 equations, 12 figures, 12 tables.

Figures (12)

  • Figure 1: (A) Hard cases that cannot be properly modeled by most existing representations. (B) The motion graph transforms single-frame patches into interconnected nodes, describing the spatial-temporal relationships. Future per-pixel motion dynamic vectors are then predicted on this graph.
  • Figure 2: Motion graph node construction: Cosine similarity, denoted by $(\cdot,\cdot)$, between patch features in consecutive frames is computed and used to select the top-$k$ directions for each patch. Tendency features $\mathbf{v}^{tf^{(m)}}_{i}$ and location features $\mathbf{v}^{lf^{(m)}}_{i}$ are then generated from these $k$ vectors and the patch location.
  • Figure 3: Inside the interaction module for the $m$-th view $\Phi^{(m)}$. Spatial and temporal message passing are conducted iteratively and repeated $T-1$ times, where $T$ is the number of observed frames.
  • Figure 4: Pipeline overview. After decoding per-pixel motion features into dynamic vectors, we perform multi-flow forward warping for future frame generation.
  • Figure 5: On the UCF Sports dataset, our method recovers richer image details than MMVP [zhong2023mmvp].
  • ...and 7 more figures
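
The node-construction step described in the Figure 2 caption (cosine similarity between patch features in consecutive frames, followed by a top-k selection of directions per patch) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `cosine_similarity` and `top_k_directions`, the row-major patch layout, the exhaustive all-pairs comparison, and the tie-breaking rule are all assumptions made for illustration, and the tendency and location features mentioned in the caption are not modeled here.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_directions(feats_t, feats_next, grid_w, k):
    """Hypothetical helper: for each patch in frame t, find the k most similar
    patches in the next frame and return the displacement to each one.

    feats_t, feats_next: per-patch feature vectors in row-major order
                         (assumed layout; grid_w patches per row).
    Returns one list per patch containing k tuples (d_row, d_col, similarity).
    """
    all_dirs = []
    for i, feat in enumerate(feats_t):
        # Compare patch i against every patch of the next frame (assumption:
        # no search window is applied in this sketch).
        sims = [(cosine_similarity(feat, g), j) for j, g in enumerate(feats_next)]
        # Highest similarity first; ties go to the lower patch index (assumption).
        sims.sort(key=lambda s: (-s[0], s[1]))
        row_i, col_i = divmod(i, grid_w)
        dirs = []
        for sim, j in sims[:k]:
            row_j, col_j = divmod(j, grid_w)
            dirs.append((row_j - row_i, col_j - col_i, sim))
        all_dirs.append(dirs)
    return all_dirs


# Toy 2x2 patch grid with 2-D features: the distinctive patch moves one step right.
frame_t = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
frame_next = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
directions = top_k_directions(frame_t, frame_next, grid_w=2, k=2)
print(directions[0][0])  # best match for the top-left patch
```

In the toy input, the top-left patch's best match is the patch one column to its right, so its first direction is a displacement of zero rows and one column. Keeping k candidate directions per patch rather than a single best match is what lets a representation of this kind express ambiguous or multi-modal motion.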