Evaluating the Generalization Ability of Spatiotemporal Model in Urban Scenario

Hongjun Wang, Jiyuan Chen, Tong Pan, Zheng Dong, Lingyu Zhang, Renhe Jiang, Xuan Song

TL;DR

We propose a Spatiotemporal Out-of-Distribution (ST-OOD) benchmark comprising six urban scenarios: bike-sharing, 311 services, pedestrian counts, traffic speed, traffic flow, and ride-hailing demand, each with in-distribution and out-of-distribution settings. Evaluating state-of-the-art spatiotemporal models on it shows that their performance degrades significantly in the out-of-distribution settings.

Abstract

Spatiotemporal neural networks have shown great promise in urban scenarios by effectively capturing temporal and spatial correlations. However, urban environments are constantly evolving, while current model evaluations are often limited to traffic scenarios and mainly use test data collected only a few weeks after the training period. The generalization ability of these models therefore remains largely unexplored. To address this, we propose a Spatiotemporal Out-of-Distribution (ST-OOD) benchmark, which comprises six urban scenarios: bike-sharing, 311 services, pedestrian counts, traffic speed, traffic flow, and ride-hailing demand, each with in-distribution (same year) and out-of-distribution (subsequent years) settings. We extensively evaluate state-of-the-art spatiotemporal models and find that their performance degrades significantly in out-of-distribution settings, with most models performing even worse than a simple Multi-Layer Perceptron (MLP). Our findings suggest that current leading methods tend to over-rely on parameters to overfit training data, which may lead to good performance on in-distribution data but often results in poor generalization. We also investigate whether dropout can mitigate the negative effects of overfitting. Our results show that a small dropout rate can significantly improve generalization performance on most datasets, with minimal impact on in-distribution performance. However, balancing in-distribution and out-of-distribution performance remains a challenging problem. We hope that the proposed benchmark will encourage further research on this critical issue.
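The in-distribution (same year) versus out-of-distribution (later years) protocol described in the abstract can be illustrated with a minimal sketch. The function name `split_by_year`, the `(timestamp, value)` record layout, and the `train_fraction` parameter are assumptions made for illustration; the abstract does not state the actual split ratios or data format used in the benchmark.

```python
from datetime import datetime


def split_by_year(records, train_year, train_fraction=0.6):
    """Split (timestamp, value) records into train / ID test / OOD test sets.

    - train:    the earliest part of `train_year`
    - id_test:  the remainder of `train_year` (same year, in-distribution)
    - ood_test: everything from later years (out-of-distribution)

    `train_fraction` is a hypothetical choice, not a value from the paper.
    """
    same_year = sorted(
        (r for r in records if r[0].year == train_year), key=lambda r: r[0]
    )
    later_years = sorted(
        (r for r in records if r[0].year > train_year), key=lambda r: r[0]
    )
    cut = int(len(same_year) * train_fraction)
    return same_year[:cut], same_year[cut:], later_years


# Toy usage: twelve monthly readings in 2019 plus two readings in 2020.
records = [(datetime(2019, month, 1), float(month)) for month in range(1, 13)]
records += [(datetime(2020, 1, 1), 100.0), (datetime(2020, 2, 1), 101.0)]
train, id_test, ood_test = split_by_year(records, 2019, train_fraction=0.5)
```

A model would then be fitted on `train` only and scored separately on `id_test` and `ood_test`, so that the gap between the two scores reflects the effect of the temporal shift.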

Paper Structure (12 sections, 2 equations, 5 figures, 3 tables)

Figures (5)

  • Figure 1: One example of a spatiotemporal shift is the dynamic evolution of urban areas. As new roads or points of interest continuously emerge and older ones are gradually removed, new traffic demands and spatiotemporal relationships are created. However, current spatiotemporal models have not yet been tested under such evolving conditions.
  • Figure 2: Map visualization of node distribution.
  • Figure 3: Comparison of the DTW and Kendall's $\tau$ distances for the Zurich pedestrian data and the New York 311 service data, within the same year and across different years.
  • Figure 4: We applied varying proportions of dropout to the node embedding in STID to mitigate the effect of inductive biases, observing its impact on both in-distribution and out-of-distribution performance.
  • Figure 5: We applied varying proportions of dropout to the node embedding in GWNet to mitigate the effect of inductive biases, observing its impact on both in-distribution and out-of-distribution performance.
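The two distances named in the Figure 3 caption can be sketched in plain Python as follows. This is an illustrative sketch, not the paper's implementation: the absolute-difference cost in DTW, the tau-a variant without tie correction, and the mapping of Kendall's $\tau$ to a distance via $(1 - \tau)/2$ are all assumptions, since the caption does not specify them.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance with an absolute-difference local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / total pairs, no tie correction."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            score += (s > 0) - (s < 0)  # +1 concordant, -1 discordant, 0 tie
    return score / (n * (n - 1) / 2)


def kendall_tau_distance(x, y):
    """Assumed mapping from tau in [-1, 1] to a distance in [0, 1]."""
    return (1.0 - kendall_tau(x, y)) / 2.0
```

Under these assumptions, a larger DTW or Kendall's $\tau$ distance between a series from the training year and the same series from a later year would indicate a stronger temporal shift than that observed within a single year.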