Assessment of Spatio-Temporal Predictors in the Presence of Missing and Heterogeneous Data

Daniele Zambon, Cesare Alippi

TL;DR

The paper tackles the challenge of evaluating spatio-temporal predictive models when data are missing or heterogeneous. It proposes AZ-analysis, a residual-correlation framework built on the AZ-whiteness test, to detect and localize residual dependencies on a multiplex spatio-temporal graph. Key contributions include a refined correlation scoring function, node/time/subgraph localization methods, and demonstrations on synthetic data plus real-world traffic and energy-prediction tasks. The approach provides a robust, distribution-free diagnostic tool that complements traditional error metrics, enabling targeted model improvements in practical deployments with minimal assumptions.

Abstract

Deep learning approaches achieve outstanding predictive performance in modeling modern data, despite the increasing complexity and scale. However, evaluating the quality of predictive models becomes more challenging, as traditional statistical assumptions often no longer hold. In particular, spatio-temporal data exhibit dependencies across both time and space, often involving nonlinear dynamics, non-stationarities, and missing observations. As a result, advanced predictors such as spatio-temporal graph neural networks require novel evaluation methodologies. This paper introduces a residual correlation analysis framework designed to assess the optimality of spatio-temporal predictive neural models, particularly in scenarios with incomplete and heterogeneous data. By leveraging the principle that residual correlation indicates information not captured by the model, this framework serves as a powerful tool to identify and localize regions in space and time where model performance can be improved. A key advantage of the proposed approach is its ability to operate under minimal assumptions, enabling robust evaluation of deep learning models applied to multivariate time series, even in the presence of missing and heterogeneous data. The methodology employs tailored spatio-temporal graphs to encode sparse spatial and temporal dependencies within the data and utilizes asymptotically distribution-free summary statistics to pinpoint time intervals and spatial regions where the model underperforms. The effectiveness of the proposed residual analysis is demonstrated through validation on both synthetic and real-world scenarios involving state-of-the-art predictive models.

Paper Structure

This paper contains 29 sections, 1 theorem, 32 equations, 11 figures.

Key Result

Theorem 1

Consider a spatio-temporal graph $\boldsymbol{\mathbf{g}}^*$ with associated stochastic residuals $\boldsymbol{\mathbf{r}}$ and hyperparameter $\lambda\in[0,1]$. Then, under the assumptions of the theorem, the distribution of $C_\lambda(\boldsymbol{\mathbf{g}}^*)$ in Eq. \ref{eq:az-test-statistic} converges weakly to a standard Gaussian distribution $\mathcal{N}(0, 1)$ as the number $|E^*|$ of edges goes to infinity.
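
To make the role of the limiting $\mathcal{N}(0, 1)$ distribution concrete, the following is a minimal sketch of a sign-based whiteness statistic in the spirit of the AZ-whiteness test. It is not the paper's definition of $C_\lambda$. The sketch assumes that each edge contributes the sign of the inner product of the residuals at its endpoints, that edges touching a missing observation are dropped, that temporal edges have unit weight and link the same node at consecutive time steps, and that $\lambda=1$ uses only spatial edges while $\lambda=0$ uses only temporal ones. The function name `az_statistic`, the array layout, and the way $\lambda$ combines the two sums are all hypothetical choices made for illustration.

```python
import math
import numpy as np

def az_statistic(res, mask, spatial_edges, lam=0.5):
    """Sign-based whiteness statistic (illustrative sketch, not the paper's exact formula).

    res:  (T, N, F) array of residuals (entries at missing positions are ignored)
    mask: (T, N) boolean array, True where the observation is available
    spatial_edges: iterable of (u, v, weight), assumed constant over time
    lam:  in [0, 1]; assumed convention: 1 = spatial edges only, 0 = temporal only
    Returns (statistic, two-sided p-value under a standard Gaussian null).
    """
    res = np.asarray(res, dtype=float)
    mask = np.asarray(mask, dtype=bool)

    # Spatial edges: weighted sum of signs over time steps where both ends are observed.
    s_sp, w_sp = 0.0, 0.0
    for u, v, w in spatial_edges:
        ok = mask[:, u] & mask[:, v]
        dots = np.einsum("tf,tf->t", res[:, u][ok], res[:, v][ok])
        s_sp += w * np.sign(dots).sum()
        w_sp += w ** 2 * ok.sum()

    # Temporal edges: same node at t and t+1, unit weight, both observations available.
    ok = mask[:-1] & mask[1:]
    dots = np.einsum("tnf,tnf->tn", res[:-1], res[1:])
    s_tm = np.sign(dots[ok]).sum()
    w_tm = float(ok.sum())

    # Normalize by the standard deviation the sum would have if all signs
    # were independent and equally likely to be +1 or -1.
    denom = math.sqrt(lam ** 2 * w_sp + (1 - lam) ** 2 * w_tm)
    if denom == 0:
        raise ValueError("no usable edges for this choice of lam")
    c = (lam * s_sp + (1 - lam) * s_tm) / denom
    p_value = math.erfc(abs(c) / math.sqrt(2))  # two-sided Gaussian tail
    return float(c), p_value

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, N = 200, 10
    white = rng.normal(size=(T, N, 2))
    mask = rng.random((T, N)) > 0.1                 # roughly 10% missing at random
    ring = [(v, (v + 1) % N, 1.0) for v in range(N)]
    print("white residuals:", az_statistic(white, mask, ring))
    smooth = white + np.roll(white, 1, axis=0)      # inject temporal correlation
    print("temporally correlated:", az_statistic(smooth, mask, ring, lam=0.0))
```

Because the statistic only uses signs, it needs no assumption on the scale or the exact distribution of the residuals, and missing observations simply reduce the number of usable edges. A large absolute value, judged against the standard Gaussian, suggests residual correlation that the predictor has not captured.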

Figures (11)

  • Figure 1: Representation of spatio-temporal data $\boldsymbol{\mathbf{x}}$ as a set of time series with associated sequence of graphs $(\boldsymbol{\mathbf{g}}_1,\boldsymbol{\mathbf{g}}_2,\dots,\boldsymbol{\mathbf{g}}_T)$ encoding functional relations. Observation $\boldsymbol{\mathbf{x}}_{\tau,v}$ at time step $\tau$ and node/sensor $v$ is multivariate. Nodes need not be available at all times (light gray boxes) and the graph topology can vary. $w_{t,e}$ denotes the weight of edge $e$ at time step $t$.
  • Figure 2: A section view of the spatio-temporal graph $\boldsymbol{\mathbf{g}}^*$ from Sec. \ref{sec:multiplex}. Each node is associated with a residual vector $\boldsymbol{\mathbf{r}}_{t,v}$ and each edge, either spatial (red) or temporal (blue), is associated with a sign (Eq. \ref{eq:edge-sign}).
  • Figure 3: Comparison of the values of the statistic $C_\lambda(\cdot)$ defined in Eq. \ref{eq:az-test-statistic} (left) with those of the correlation scores $c_\lambda(\cdot)$ of Eq. \ref{eq:az-max-adjusted} (right). Different levels of residual correlation and numbers $|E|$ of graph edges are considered. Each color corresponds to a different number of edges. Solid lines represent the expected value of the score estimated over 100 repeated experiments, dashed lines the standard deviation, and shaded areas the interquartile interval.
  • Figure 4: Top) Generic subgraphs $\boldsymbol{\mathbf{s}}_v,\boldsymbol{\mathbf{s}}_u$ involved in the computation of node scores $c_\lambda(v)$ and $c_\lambda(u)$. Center) Generic subgraphs $\boldsymbol{\mathbf{s}}_t,\boldsymbol{\mathbf{s}}_\tau$ involved in the computation of temporal scores $c_\lambda(t)$ and $c_\lambda(\tau)$. Bottom) Generic subgraphs $\boldsymbol{\mathbf{s}}_{t,v},\boldsymbol{\mathbf{s}}_{\tau,u}$ involved in the computation of spatio-temporal scores $c_\lambda(t,v)$ and $c_\lambda(\tau,u)$.
  • Figure 5: Scores involved in the analysis of residuals on synthetic data. Scores associated with $\lambda=0$, $1/2$, and $1$ are depicted in blue, green, and red, respectively. Top left) Temporal scores $c_\lambda(t)$; a moving average is applied to improve readability. Center) Node-level scores $c_\lambda(v)$ as both line plots and heatmaps on the graph. Bottom left) Local spatio-temporal score $c_\lambda(t,v)$ for $\lambda=1/2$; see Figure \ref{fig:synth-extra-data} for $\lambda=0$ and $1$. Red and blue boxes (sets $A$ and $B$) highlight regions with spatial and temporal correlation, respectively. Values of the AZ-test statistics are reported at the top left of the figure.
  • ...and 6 more figures

Theorems & Definitions (1)

  • Theorem 1: zambon2022aztest