Graph Structure Learning for Spatial-Temporal Imputation: Adapting to Node and Feature Scales

Xinyu Yang, Yu Sun, Xinyang Chen, Ying Zhang, Xiaojie Yuan

TL;DR

This work tackles missing data in spatial-temporal sensors by introducing GSLI, a multi-scale graph structure learning framework. It jointly learns node-scale graphs per feature and a feature-scale graph across features, incorporating prominence modeling to weight influential nodes/features. Through cross-feature and cross-temporal representations, GSLI captures rich spatial-temporal dependencies and demonstrates consistent improvements over diverse real datasets across multiple missing-data scenarios. The approach yields robust imputations and shows favorable downstream forecasting performance, with comprehensive analysis of complexity, ablations, and resource use. Overall, GSLI offers a principled, adaptable mechanism to handle feature heterogeneity and varying spatial relationships in spatial-temporal imputation tasks.

Abstract

Spatial-temporal data collected across different geographic locations often suffer from missing values, posing challenges to data analysis. Existing methods primarily leverage fixed spatial graphs to impute missing values, which implicitly assume that the spatial relationship is roughly the same for all features across different locations. However, they may overlook the different spatial relationships of diverse features recorded by sensors in different locations. To address this, we introduce the multi-scale Graph Structure Learning framework for spatial-temporal Imputation (GSLI) that dynamically adapts to the heterogeneous spatial correlations. Our framework encompasses node-scale graph structure learning to cater to the distinct global spatial correlations of different features, and feature-scale graph structure learning to unveil common spatial correlation across features within all stations. Integrated with prominence modeling, our framework emphasizes nodes and features with greater significance in the imputation process. Furthermore, GSLI incorporates cross-feature and cross-temporal representation learning to capture spatial-temporal dependencies. Evaluated on six real incomplete spatial-temporal datasets, GSLI showcases the improvement in data imputation.
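The two graph scales described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the `softmax(ReLU(E1 E2^T))` parameterization of a learned adjacency is a common graph-structure-learning choice assumed here, the embeddings are random stand-ins for trained parameters, all sizes and names (`learned_adjacency`, `A_node`, `A_feat`) are hypothetical, and prominence modeling is omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def learned_adjacency(src, dst):
    # Assumed parameterization: softmax(ReLU(src @ dst^T)), row-normalized.
    scores = np.maximum(src @ np.swapaxes(dst, -1, -2), 0.0)
    return softmax(scores, axis=-1)

rng = np.random.default_rng(0)
N, F, C, d = 5, 4, 8, 3  # nodes, features, channels, embedding size (hypothetical)

# Node-scale: a separate N x N graph for each feature.
node_src = rng.normal(size=(F, N, d))
node_dst = rng.normal(size=(F, N, d))
A_node = learned_adjacency(node_src, node_dst)  # shape (F, N, N)

# Feature-scale: one F x F graph shared by all stations.
feat_src = rng.normal(size=(F, d))
feat_dst = rng.normal(size=(F, d))
A_feat = learned_adjacency(feat_src, feat_dst)  # shape (F, F)

# Representations at a single timestamp: node x feature x channel.
R = rng.normal(size=(N, F, C))

# Node-scale propagation: feature f is mixed across nodes using A_node[f] only.
H_node = np.einsum("fij,jfc->ifc", A_node, R)  # shape (N, F, C)

# Feature-scale propagation: within each node, features are mixed using A_feat.
H_feat = np.einsum("fg,igc->ifc", A_feat, R)   # shape (N, F, C)
```

In this sketch each feature keeps its own node graph, so propagating one feature never mixes in the representations of another feature; cross-feature interaction happens only through the shared feature-scale graph.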

Paper Structure

This paper contains 38 sections, 2 theorems, 25 equations, 9 figures, 10 tables.

Key Result

Proposition 1

The result of $\dot{\mathbf{A}}^{\mathbf{\Omega}} \mathbf{R}$ in the first term of the canonical graph diffusion convolution of channel $c$ for feature $f_2$ at timestamp $t$ for node $i$ is […], which is in conflict with the expected result $(a^{\mathbf{\Omega}}_{i1} r_{1,{f_2},c}+\dots + a^{\mathbf{\Omega}}_{ij} r_{j,{f_2},c} + \dots + a^{\mathbf{\Omega}}_{iN} r_{N,{f_2},c})$, where $\dot{\mathbf{A}}^{\mathbf{\Ome

Figures (9)

  • Figure 1: (a) Incomplete spatial-temporal data with four features recorded in different stations in the Netherlands. (b) Imputation examples at timestamps $\mathit{t}_{3}$ and $\mathit{t}_{17}$. (c) The extracted attention maps for features DD and FH. (d) The extracted attention maps for the four features in stations ELD and ELL.
  • Figure 2: The overview of multi-scale Graph Structure Learning framework for spatial-temporal Imputation (GSLI). GSLI incorporates node-scale spatial learning, which can adapt to feature heterogeneity, and feature-scale spatial learning, which can exploit correlations between features. With cross-feature representation learning and cross-temporal representation learning, GSLI can effectively capture spatio-temporal dependencies for imputation.
  • Figure 3: Varying the missing mechanism over the DutchWind dataset with 10% missing values
  • Figure 4: Average attention scores of different stations from the cross-feature self-attention mechanism
  • Figure 6: Varying the missing mechanism over the BeijingMEO dataset with 10% missing values
  • ...and 4 more figures

Theorems & Definitions (2)

  • Proposition 1
  • Proposition 2