GMG: A Video Prediction Method Based on Global Focus and Motion Guided

Yuhao Du, Hui Liu, Haoxiang Peng, Xinyuan Cheng, Chenrong Wu, Jiankai Zhang

TL;DR

GMG tackles two core challenges in spatiotemporal weather prediction: long-range correlations and non-rigid motion deformation. It introduces a Global Focus Module to enlarge the receptive field and a Motion Guided Module to model deformation with learnable balance and decay dynamics, integrated within a four-unit GMG framework built on ST-ConvLSTM and Self-Attention Memory. Across five diverse datasets, GMG achieves state-of-the-art or competitive performance on standard metrics, generalizing to rainfall data, traffic-like data, and Moving MNIST. The results suggest GMG provides a general, effective approach for complex video prediction tasks with practical implications for meteorological forecasting and related domains.

Abstract

In recent years, weather forecasting has gained significant attention. However, accurately predicting weather remains a challenge due to the rapid variability of meteorological data and potential teleconnections. Current spatiotemporal forecasting models primarily rely on convolution operations or sliding windows for feature extraction. These methods are limited by the size of the convolutional kernel or sliding window, making it difficult to capture and identify potential teleconnection features in meteorological data. Additionally, weather data often involve non-rigid bodies whose motion is accompanied by unpredictable deformations, further complicating the forecasting task. In this paper, we propose the GMG model to address these two core challenges. The Global Focus Module, a key component of our model, enhances the global receptive field, while the Motion Guided Module adapts to the growth or dissipation processes of non-rigid bodies. Through extensive evaluations, our method demonstrates competitive performance across various complex tasks, providing a novel approach to improving the predictive accuracy of complex spatiotemporal data.

Paper Structure

This paper contains 12 sections, 35 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: The two main real-world challenges addressed in this paper. Position and shape changes of rainfall regions over time: For example, in region B, while moving eastward, the north-south extent of the rainfall gradually expands, adding complexity to prediction tasks. Long-range correlations: In real-life scenarios, a traffic jam in region C may impact traffic in another part of the city, such as region D. These long-range correlations are often difficult to capture, limiting the accuracy of predictions for such data.
  • Figure 2: The overall structure of GMG (left) and the two main modules proposed in this paper (right) are illustrated. A standard GMG unit consists of four modules: ST-ConvLSTM, Global Focus Module, Self-Attention Memory, and Motion Guided Module. In this study, each time step is composed of four stacked GMG units. The temporal memory $M$ from the fourth layer at time $t$ is transferred to the first layer at time $t+1$, ensuring that the model captures long-term temporal dynamics effectively. The "Time Delay" in the figure refers to the operation corresponding to Eq. (15) in the text.
  • Figure 3: Diagram of GMG and Its Variants' Architecture
  • Figure 4: Visualizations on CIKM2017 (upper) and Shanghai2020 (lower). The time labeled in the figure represents the input data time/predicted time.
  • Figure 5: Visualizations on TaxiBJ. Error = $\left| \text{Prediction} - \text{Target} \right|$; the error is amplified for better comparison.
  • ...and 1 more figures
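
The memory routing described in the Figure 2 caption (four stacked units per time step, with the temporal memory $M$ from the fourth layer at time $t$ handed to the first layer at time $t+1$) can be sketched in plain Python. This is only an illustration of the data flow, not the paper's implementation: `toy_unit` and `run_stack` are hypothetical names, the arithmetic inside `toy_unit` is a placeholder for the real ST-ConvLSTM, Global Focus, Self-Attention Memory, and Motion Guided computations, and the "Time Delay" operation is not modeled. Passing $M$ upward through the layers within a single time step is also an assumption, borrowed from how stacked spatiotemporal LSTMs are commonly wired.

```python
from typing import Callable, List, Tuple

# Signature of one unit: (input, hidden, cell, memory) -> (hidden, cell, memory)
Unit = Callable[[float, float, float, float], Tuple[float, float, float]]


def toy_unit(x: float, h: float, c: float, m: float) -> Tuple[float, float, float]:
    """Hypothetical stand-in for one GMG unit (placeholder arithmetic only)."""
    c_new = 0.5 * c + 0.5 * x      # toy per-layer cell update
    m_new = m + 1.0                # counts how many units M has passed through
    h_new = c_new + 0.1 * m_new    # toy hidden state mixing cell and memory
    return h_new, c_new, m_new


def run_stack(
    frames: List[float], num_layers: int = 4, unit: Unit = toy_unit
) -> Tuple[List[float], float, List[Tuple[int, int]]]:
    """Run stacked units over a sequence and record the order M visits them."""
    h = [0.0] * num_layers         # one hidden state per layer
    c = [0.0] * num_layers         # one cell state per layer
    m = 0.0                        # single temporal memory M (assumed shared)
    outputs: List[float] = []
    trace: List[Tuple[int, int]] = []  # (time step, layer index) visit order

    for t, x in enumerate(frames):
        layer_input = x
        for layer in range(num_layers):
            h[layer], c[layer], m = unit(layer_input, h[layer], c[layer], m)
            trace.append((t, layer))
            layer_input = h[layer]  # each layer feeds the one above it
        outputs.append(h[-1])
        # m now holds the top-layer memory at time t; the next loop
        # iteration passes it to the first layer at time t + 1.

    return outputs, m, trace


if __name__ == "__main__":
    outs, final_m, visit_order = run_stack([1.0, 2.0, 3.0])
    print(visit_order[:5])
```

In the recorded visit order, the entry for the last layer at one time step is immediately followed by the first layer at the next time step, which is the hand-off the caption describes.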