Clustering Time Series Data with Gaussian Mixture Embeddings in a Graph Autoencoder Framework

Amirabbas Afzali, Hesam Hosseini, Mohmmadamin Mirzai, Arash Amini

TL;DR

By uncovering community structures in stock markets, the Variational Mixture Graph Autoencoder provides deeper insights into stock relationships, benefiting market prediction, portfolio optimization, and risk management.

Abstract

Time series analysis is prevalent across many domains, including finance, healthcare, and environmental monitoring. Traditional time series clustering methods often struggle to capture the complex temporal dependencies inherent in such data. In this paper, we propose the Variational Mixture Graph Autoencoder (VMGAE), a graph-based approach to time series clustering that leverages the structural advantages of graphs to capture richer relationships in the data and produces Gaussian mixture embeddings for improved separability. Experimental comparisons with baseline methods demonstrate that our method significantly outperforms state-of-the-art time series clustering techniques. We further validate our method on real-world financial data, highlighting its practical applications in finance. By uncovering community structures in stock markets, our method provides deeper insights into stock relationships, benefiting market prediction, portfolio optimization, and risk management.

Paper Structure

This paper contains 24 sections, 41 equations, 4 figures, 6 tables, 1 algorithm.

Figures (4)

  • Figure 1: The general architecture of the Variational Mixture Graph Autoencoder (VMGAE). The dataset $\mathcal{D}$ consists of multiple time series, and Weighted Dynamic Time Warping (WDTW) is used to compute the pairwise distances that form the adjacency matrix $\boldsymbol{A}$, representing connections in the graph $\boldsymbol{G}$. The mean $\boldsymbol{\mu}$ and log standard deviation $\log \boldsymbol{\sigma}$ are computed for the variational latent space, yielding the node embeddings $\boldsymbol{Z}$. These embeddings are transformed to reconstruct the adjacency matrix $\hat{\boldsymbol{A}}$, with the reconstruction loss $\mathcal{L}_{\text{recon}}$ enforcing fidelity to $\boldsymbol{A}$. The regularization loss $\mathcal{L}_{\text{reg}}$ applies to the mixture model parameters $\tilde{\boldsymbol{\sigma}}, \tilde{\boldsymbol{\mu}}, \boldsymbol{\pi}$, enhancing the latent space structure.
  • Figure 2: Graph visualizations of the Symbols dataset, illustrating effective data separation. Different colors correspond to distinct labels.
  • Figure 3: (a) Clustering results of the normalized closing prices for the top 50 U.S. stocks, grouped into five clusters. (b) The average normalized closing price for each cluster shows distinct patterns across the clusters.
  • Figure 4: t-SNE visualizations on the DiatomSizeReduction dataset. The colors of the points indicate the actual labels. (a) epoch 0, (b) epoch 10, (c) epoch 100.
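
The pipeline described in the caption of Figure 1 can be illustrated with a small NumPy sketch. It is not the authors' implementation and rests on several assumptions: the WDTW weight is taken to be a logistic function of the phase difference $|i - j|$ with hypothetical parameters `g` and `w_max`; the adjacency matrix is built with a symmetric k-nearest-neighbour rule (the caption does not say how distances become edges); the encoder outputs `mu` and `log_sigma` are random placeholders rather than the output of a trained graph network; and the decoder is a plain inner product followed by a sigmoid, standing in for the transformation mentioned in the caption. The mixture regularizer $\mathcal{L}_{\text{reg}}$ is omitted.

```python
import numpy as np


def wdtw_distance(a, b, g=0.05, w_max=1.0):
    """Weighted DTW: each squared difference is scaled by a logistic
    weight that grows with the phase difference |i - j| (assumed form)."""
    n, m = len(a), len(b)
    mid = max(n, m) / 2.0
    weights = w_max / (1.0 + np.exp(-g * (np.arange(max(n, m)) - mid)))
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = weights[abs(i - j)] * (a[i - 1] - b[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(np.sqrt(cost[n, m]))


def build_adjacency(series, k=1):
    """Pairwise WDTW distances, then a symmetric k-NN graph (assumed rule)."""
    n = len(series)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = wdtw_distance(series[i], series[j])
    adj = np.zeros((n, n))
    for i in range(n):
        neighbours = [j for j in np.argsort(dist[i]) if j != i][:k]
        adj[i, neighbours] = 1.0
    return np.maximum(adj, adj.T), dist  # symmetrise the edge set


def sample_embeddings(mu, log_sigma, rng):
    """Reparameterisation trick: Z = mu + sigma * eps, eps ~ N(0, I)."""
    return mu + np.exp(log_sigma) * rng.standard_normal(mu.shape)


def decode(z):
    """Inner-product decoder: A_hat = sigmoid(Z Z^T)."""
    return 1.0 / (1.0 + np.exp(-z @ z.T))


def reconstruction_loss(adj, adj_hat, eps=1e-9):
    """Mean binary cross-entropy between A and A_hat."""
    bce = adj * np.log(adj_hat + eps) + (1.0 - adj) * np.log(1.0 - adj_hat + eps)
    return float(-np.mean(bce))


# Toy data: two sine-like and two cosine-like series.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 30)
series = [np.sin(t), np.sin(t + 0.1), np.cos(t), np.cos(t + 0.1)]

A, D = build_adjacency(series, k=1)
mu = rng.standard_normal((4, 2))   # placeholder encoder mean
log_sigma = np.full((4, 2), -1.0)  # placeholder encoder log std
Z = sample_embeddings(mu, log_sigma, rng)
A_hat = decode(Z)
loss = reconstruction_loss(A, A_hat)
```

In this toy setting the two sine-like series lie closer to each other under WDTW than to either cosine-like series, so the k-NN rule links them. In a full model, `mu` and `log_sigma` would come from a graph encoder trained on `A`, and the loss would be combined with the mixture regularizer.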