Hybrid Ensemble Deep Graph Temporal Clustering for Spatiotemporal Data

Francis Ndikum Nji, Omar Faruque, Mostafa Cham, Janeja Vandana, Jianwu Wang

TL;DR

The paper proposes Hybrid Ensemble Deep Graph Temporal Clustering (HEDGTC), a novel algorithm that integrates homogeneous and heterogeneous ensemble clustering models and leverages both traditional and deep learning-based clustering approaches.

Abstract

Classifying subsets based on spatial and temporal features is crucial to the analysis of spatiotemporal data, given its inherent spatial and temporal variability. Since no single clustering algorithm ensures optimal results, researchers have increasingly explored the effectiveness of ensemble approaches. Ensemble clustering has attracted much attention due to increased diversity, better generalization, and overall improved clustering performance. While ensemble clustering may yield promising results on simple datasets, it has not been fully explored on complex multivariate spatiotemporal data. To address this gap, we propose a novel hybrid ensemble deep graph temporal clustering (HEDGTC) method for multivariate spatiotemporal data. HEDGTC integrates homogeneous and heterogeneous ensemble methods and adopts a dual consensus approach to address noise and misclassification from traditional clustering. It further applies a graph attention autoencoder network to improve clustering performance and stability. When evaluated on three real-world multivariate spatiotemporal datasets, HEDGTC outperforms state-of-the-art ensemble clustering models, showing improved performance and stability with consistent results. This indicates that HEDGTC can effectively capture implicit temporal patterns in complex spatiotemporal data.
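
As a rough illustration of the co-occurrence side of the consensus step, the sketch below builds a co-association matrix from several base partitions: each entry is the fraction of partitions in which two items share a cluster. This is a minimal sketch, not the authors' implementation. The function name `co_association` and the toy partitions are hypothetical, and the normalization by the number of partitions is an assumption.

```python
from typing import List, Sequence


def co_association(partitions: Sequence[Sequence[int]]) -> List[List[float]]:
    """Return the fraction of base partitions in which items i and j share a cluster."""
    if not partitions:
        raise ValueError("need at least one base partition")
    n = len(partitions[0])
    if any(len(p) != n for p in partitions):
        raise ValueError("all partitions must label the same items")

    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                # Only co-membership matters, so label names need not agree
                # across partitions.
                if labels[i] == labels[j]:
                    counts[i][j] += 1

    m = len(partitions)
    return [[c / m for c in row] for row in counts]


# Three hypothetical base clusterings of four time steps (illustrative data only).
base_partitions = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],  # same grouping as the first, with swapped label names
    [0, 0, 0, 1],
]
C = co_association(base_partitions)
```

Because the matrix depends only on co-membership, base models that disagree on label names but agree on groupings reinforce each other, which is what makes it usable as a consensus signal.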

Paper Structure (20 sections, 11 equations, 4 figures, 4 tables)

This paper contains 20 sections, 11 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Architecture of our proposed Hybrid Ensemble Deep Graph Temporal Clustering (HEDGTC) model. This end-to-end architectural flow diagram represents the phases of our proposed model. The process starts with the data preparation phase, where data is ingested and preprocessed. Homogeneous Ensemble Clustering, the second phase, individually executes all base clustering models in a homogeneous fashion. Heterogeneous Ensemble Clustering consolidates the clustering results from the previous phase through co-occurrence consensus and non-negative matrix factorization. The resulting matrices are then merged to yield one combined matrix. The Final Clustering phase applies a graph attention autoencoder to the combined matrix, providing our final partitions.
  • Figure 2: Matrix Concatenation: The co-association matrix has dimension [$a \times a$], where $a$ is the length of the time series, and the non-negative factorized matrix $Q$ has dimension [$a \times r$], where $r$ is the rank. $Q$ is padded and added to the co-association matrix, and the resulting matrix has dimension [$a \times a$].
  • Figure 3: Graph Attention AutoEncoder for clustering the final merged matrix: The input of the encoder is an adjacency matrix $A$ and node features $X$, and the output is the reconstructed $\hat{\mathbf{A}}$ and $\hat{\mathbf{X}}$. Input data is projected to a lower-dimensional dense layer $Z$ through stacked GATv2 and LSTM layers, and KMeans is applied to the extracted features to generate our final clusterings.
  • Figure 4: OTA Stability Assessment on ERA5 over 20 executions. HEDGTC (green line) outperforms all baseline ensemble models with the lowest OTA. The higher the OTA score, the less stable the algorithm. The Ensemble Spectral Clustering model displays a wave-like stability pattern as a result of the initial seed problem, noise introduced during perturbation, and dependence on graph construction. Parea is the least stable, partly due to the initial seed problem and the absence of consensus matrix post-processing.
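
The merge described in Figure 2 can be sketched as follows. The caption says only that $Q$ is "padded and added", so this sketch assumes zero-padding on the right, extending the $a \times r$ matrix to $a \times a$, followed by element-wise addition. The padding scheme, the function name `pad_and_add`, and the toy values are all assumptions made for illustration.

```python
from typing import List

Matrix = List[List[float]]


def pad_and_add(co_assoc: Matrix, q: Matrix) -> Matrix:
    """Zero-pad q (a x r) on the right to a x a, then add it to co_assoc (a x a)."""
    a = len(co_assoc)
    if any(len(row) != a for row in co_assoc):
        raise ValueError("co-association matrix must be square")
    if len(q) != a:
        raise ValueError("q must have one row per time step")
    r = len(q[0]) if q else 0
    if any(len(row) != r for row in q) or r > a:
        raise ValueError("q must be a x r with r <= a")

    # Columns j >= r of the padded q are zero, so they leave co_assoc unchanged.
    return [
        [co_assoc[i][j] + (q[i][j] if j < r else 0.0) for j in range(a)]
        for i in range(a)
    ]


# Toy inputs with a = 3 time steps and rank r = 2 (illustrative values only).
co_assoc = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
q = [
    [0.5, 0.25],
    [0.75, 0.0],
    [0.0, 1.0],
]
merged = pad_and_add(co_assoc, q)
```

With these inputs, `merged` keeps the [$a \times a$] shape stated in the caption, and only its first $r$ columns differ from the co-association matrix.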