Assessment of Spatio-Temporal Predictors in the Presence of Missing and Heterogeneous Data
Daniele Zambon, Cesare Alippi
TL;DR
The paper tackles the challenge of evaluating spatio-temporal predictive models when data are missing or heterogeneous. It proposes AZ-analysis, a residual-correlation framework built on the AZ-whiteness test, to detect and localize residual dependencies on a multiplex spatio-temporal graph. Key contributions include a refined correlation scoring function, methods to localize dependencies at the level of nodes, time steps, and subgraphs, and demonstrations on synthetic data as well as real-world traffic and energy-prediction tasks. Requiring only minimal assumptions, the approach provides a robust, asymptotically distribution-free diagnostic tool that complements traditional error metrics and enables targeted model improvements in practical deployments.
Abstract
Deep learning approaches achieve outstanding predictive performance in modeling modern data, despite its increasing complexity and scale. However, evaluating the quality of predictive models becomes more challenging, as traditional statistical assumptions often no longer hold. In particular, spatio-temporal data exhibit dependencies across both time and space, often involving nonlinear dynamics, non-stationarities, and missing observations. As a result, advanced predictors such as spatio-temporal graph neural networks require novel evaluation methodologies. This paper introduces a residual correlation analysis framework designed to assess the optimality of spatio-temporal predictive neural models, particularly in scenarios with incomplete and heterogeneous data. By leveraging the principle that residual correlation indicates information not captured by the model, this framework serves as a powerful tool to identify and localize regions in space and time where model performance can be improved. A key advantage of the proposed approach is its ability to operate under minimal assumptions, enabling robust evaluation of deep learning models applied to multivariate time series, even in the presence of missing and heterogeneous data. The methodology employs tailored spatio-temporal graphs to encode sparse spatial and temporal dependencies within the data and utilizes asymptotically distribution-free summary statistics to pinpoint time intervals and spatial regions where the model underperforms. The effectiveness of the proposed residual analysis is demonstrated through validation on both synthetic and real-world scenarios involving state-of-the-art predictive models.
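The core idea above is that correlated residuals reveal information the model missed, measured with an asymptotically distribution-free statistic on a spatio-temporal graph. Below is a minimal sketch of that idea. It assumes the sign-based statistic of the original AZ-whiteness test: each edge contributes the sign of the inner product of the residuals at its two endpoints, and the weighted sum is normalized as C = sum_e w_e sgn(<r_u, r_v>) / sqrt(sum_e w_e^2). Several choices are illustrative assumptions rather than the paper's method: the graph construction (a spatial edge list replicated at every step plus one-step temporal self-edges), unit weights, the rule of skipping edges with an unobserved endpoint, and per-node localization by restricting the sum to incident edges. The function names are hypothetical. The paper's refined scoring function and multiplex graph may differ, for example in how spatial and temporal components are combined.

```python
import numpy as np

def observed_edge_signs(res, mask, spatial_edges, w_space=1.0, w_time=1.0):
    """Signs of residual inner products on a hypothetical spatio-temporal graph.

    res: (T, N, F) residuals; mask: (T, N) booleans, True where observed.
    spatial_edges: list of (u, v) node pairs, replicated at every time step.
    Temporal edges link each node to itself at the next step. Edges with an
    unobserved endpoint are skipped (an assumed missing-data rule).
    """
    T, N, _ = res.shape
    signs, weights, ends = [], [], []
    for t in range(T):
        for u, v in spatial_edges:
            if mask[t, u] and mask[t, v]:
                signs.append(np.sign(res[t, u] @ res[t, v]))
                weights.append(w_space)
                ends.append((u, v))
    for t in range(T - 1):
        for u in range(N):
            if mask[t, u] and mask[t + 1, u]:
                signs.append(np.sign(res[t, u] @ res[t + 1, u]))
                weights.append(w_time)
                ends.append((u, u))
    return np.asarray(signs, dtype=float), np.asarray(weights, dtype=float), ends

def sign_statistic(signs, weights):
    """Weighted sign sum normalized by sqrt(sum w^2); approximately N(0, 1)
    when residuals are independent and symmetric around zero."""
    denom = np.sqrt(np.sum(weights ** 2))
    return float(np.sum(weights * signs) / denom) if denom > 0 else 0.0

def node_scores(signs, weights, ends, n_nodes):
    """Localization sketch: recompute the statistic on edges touching each node."""
    scores = np.zeros(n_nodes)
    for u in range(n_nodes):
        idx = [k for k, (a, b) in enumerate(ends) if u in (a, b)]
        if idx:
            scores[u] = sign_statistic(signs[idx], weights[idx])
    return scores

# Usage on synthetic residuals with about 10% missing entries.
rng = np.random.default_rng(0)
T, N, F = 400, 5, 2
chain = [(0, 1), (1, 2), (2, 3), (3, 4)]
mask = rng.random((T, N)) > 0.1

white = rng.standard_normal((T, N, F))
C_white = sign_statistic(*observed_edge_signs(white, mask, chain)[:2])

# Nodes 0 and 1 share an unmodeled component, so their residuals correlate.
shared = white.copy()
common = rng.standard_normal((T, 1, F))
shared[:, :2] += 3.0 * common
signs, weights, ends = observed_edge_signs(shared, mask, chain)
C_shared = sign_statistic(signs, weights)
scores = node_scores(signs, weights, ends, N)
```

Under these assumptions, values of |C| far above 2 or 3 are unlikely for white residuals, so large positive scores flag residual correlation. In the demo, the per-node scores should single out nodes 0 and 1 while the remaining nodes stay near zero. The sign transform is what makes the statistic insensitive to the residual distribution, which is consistent with the minimal-assumption setting described above.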
