PredBench: Benchmarking Spatio-Temporal Prediction across Diverse Disciplines

ZiDong Wang; Zeyu Lu; Di Huang; Tong He; Xihui Liu; Wanli Ouyang; Lei Bai

TL;DR

PredBench addresses the fragmentation in spatio-temporal prediction (STP) by providing a standardized, cross-domain benchmark that integrates 12 established STP methods and 15 diverse datasets. It introduces a four-dimensional evaluation framework (short-term prediction, long-term extrapolation, generalization, and temporal robustness) paired with domain-tailored metrics to enable fair, thorough comparisons. Key findings include that no single method dominates across all tasks; the transformer-based Earthformer and the diffusion-based MCVD show strengths on weather-related datasets and perceptual metrics, while models like PredRNN++ excel at motion-trajectory prediction. The work delivers an open-source codebase to foster reproducibility and guide future STP research and applications.

Abstract

In this paper, we introduce PredBench, a benchmark tailored for the holistic evaluation of spatio-temporal prediction networks. Despite significant progress in this field, there remains a lack of a standardized framework for a detailed and comparative analysis of various prediction network architectures. PredBench addresses this gap by conducting large-scale experiments, upholding standardized and appropriate experimental settings, and implementing multi-dimensional evaluations. This benchmark integrates 12 widely adopted methods with 15 diverse datasets across multiple application domains, offering extensive evaluation of contemporary spatio-temporal prediction networks. Through meticulous calibration of prediction settings across various applications, PredBench ensures evaluations relevant to their intended use and enables fair comparisons. Moreover, its multi-dimensional evaluation framework broadens the analysis with a comprehensive set of metrics, providing deep insights into the capabilities of models. The findings from our research offer strategic directions for future developments in the field. Our codebase is available at https://github.com/OpenEarthLab/PredBench.

Paper Structure (25 sections, 9 equations, 22 figures, 31 tables)

Introduction
Related Work
PredBench
Supported Methods and Datasets
Evaluation Metrics
Standardized Experimental Protocol
Multi-dimensional Evaluations
Experiments
Short-Term Prediction Analysis
Long-Term Prediction Analysis
Generalization Ability Analysis
Robustness Analysis
Conclusion
Standard Experimental Protocol Details
Detailed Evaluation Metrics
...and 10 more sections

Figures (22)

Figure 1: Overview of our spatio-temporal Prediction Benchmark (PredBench). It conducts a thorough 4-dimensional evaluation of 12 prevalent spatio-temporal prediction methods, spanning 5 distinct domains and covering 15 diverse datasets.
Figure 1: Overview of our unified codebase.
Figure 2: We support 12 methods and 15 datasets in our PredBench. The gray cells represent the settings in which previous methods have been conducted. We fill the remaining blank cells by conducting large-scale experiments and thorough evaluation. The green ticks indicate that short-term prediction experiments are conducted, while orange ticks signify the implementation of long-term prediction experiments. The blue ticks represent the execution of generalization experiments, and purple ticks denote experiments in temporal resolution robustness.
Figure 2: Qualitative results on BAIR (Ebert et al., 2017) (2 frames $\longrightarrow$ 10 frames).
Figure 3: Visualization results of MCVD, PredRNN++, TAU, and the ground truth on BAIR (left) and RoboNet (right). The yellow numbers represent frame indices. Areas where TAU and PredRNN++ exhibit ghosting are highlighted with red boxes. The output of MCVD is notably clear.
...and 17 more figures
