Dynamic Multi-Network Mining of Tensor Time Series

Kohei Obata, Koki Kawabata, Yasuko Matsubara, Yasushi Sakurai

TL;DR

This work tackles subsequence clustering of tensor time series by proposing Dynamic Multi-network Mining (DMM), which represents each cluster with sparse, mode-specific dependency networks learned via a multimode graphical lasso and selected through an MDL-based cost that jointly optimizes segmentation, clustering, and model sparsity. The method scales linearly with data size and uses CutPointDetector and ClusterDetector to efficiently determine segment boundaries and cluster assignments. Experiments on synthetic and real-world data show improved clustering accuracy and highly interpretable networks, compared to state-of-the-art baselines such as TAGM and TICC. The approach provides actionable insights by revealing phase-specific relationships among variables across non-temporal modes, making it well-suited for exploratory analysis of complex tensor time series.

Abstract

Subsequence clustering of time series is an essential task in data mining, and interpreting the resulting clusters is also crucial since we generally do not have prior knowledge of the data. Thus, given a large collection of tensor time series consisting of multiple modes, including timestamps, how can we achieve subsequence clustering for tensor time series and provide interpretable insights? In this paper, we propose a new method, Dynamic Multi-network Mining (DMM), that converts a tensor time series into a set of segment groups of various lengths (i.e., clusters) characterized by a dependency network constrained with the $\ell_1$-norm. Our method has the following properties. (a) Interpretable: it characterizes the cluster with multiple networks, each of which is a sparse dependency network of a corresponding non-temporal mode, and thus provides visible and interpretable insights into the key relationships. (b) Accurate: it discovers the clusters with distinct networks from tensor time series according to the minimum description length (MDL). (c) Scalable: it scales linearly in terms of the input data size when solving a non-convex problem to optimize the number of segments and clusters, and thus it is applicable to long-range and high-dimensional tensors. Extensive experiments with synthetic datasets confirm that our method outperforms the state-of-the-art methods in terms of clustering accuracy. We then use real datasets to demonstrate that DMM is useful for providing interpretable insights from tensor time series.
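
To make the MDL idea more concrete, here is a minimal sketch. It is not the cost function used by DMM: it scores a univariate series with a fitted Gaussian and a generic two-part code (data cost plus $0.5 \log_2 n$ bits per parameter), then checks whether a single cut point shortens the total description. The names `gaussian_code_length` and `best_split`, the `min_len` parameter, and the toy series are illustrative assumptions; DMM itself operates on tensors and describes each cluster with sparse, mode-specific networks.

```python
import math

def gaussian_code_length(xs):
    """Two-part code length (in bits) of xs under a Gaussian fitted to xs.

    Simplified stand-in for an MDL cost, not the paper's formulation.
    """
    n = len(xs)
    mean = sum(xs) / n
    # Maximum-likelihood variance, floored to avoid log(0) on constant data
    var = max(sum((x - mean) ** 2 for x in xs) / n, 1e-6)
    # Negative log-likelihood at the MLE, converted from nats to bits
    data_bits = 0.5 * n * (math.log(2 * math.pi * var) + 1) / math.log(2)
    # Two parameters (mean, variance), each charged 0.5 * log2(n) bits
    model_bits = 2 * 0.5 * math.log2(n)
    return data_bits + model_bits

def best_split(xs, min_len=2):
    """Return (cut, cost) for the cheapest description of xs.

    cut is None when no single cut point beats the unsplit series.
    """
    best_cut, best_cost = None, gaussian_code_length(xs)
    for cut in range(min_len, len(xs) - min_len + 1):
        cost = gaussian_code_length(xs[:cut]) + gaussian_code_length(xs[cut:])
        if cost < best_cost:
            best_cut, best_cost = cut, cost
    return best_cut, best_cost

# Toy series with a level shift halfway through (assumed data)
series = [0.0, 1.0, -1.0, 2.0, -2.0, 0.0, 50.0, 51.0, 49.0, 52.0, 48.0, 50.0]
cut, cost = best_split(series)
print(cut, round(cost, 2))
```

On this toy series the cheapest description places the cut where the level shifts, because two low-variance segments cost fewer bits than one high-variance segment even after paying for the extra parameters. With very short segments this simple cost can overfit, which is one reason a practical method needs a more careful model cost.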

Paper Structure

This paper contains 39 sections, 1 theorem, 9 equations, 9 figures, 5 tables, 2 algorithms.

Key Result

Lemma 1

The time complexity of DMM is $O( T \prod_{m=1}^N D_{m})$, where $T$ is the data length, and $D_{m}$ is the number of variables at mode-$m$ in the $(N+1)^{th}$-order TTS $\mathcal{X} \in \mathbb{R}^{D_{1} \times \cdots \times D_{N} \times T}$.
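
The bound $T \prod_{m=1}^N D_{m}$ is exactly the number of scalar entries in $\mathcal{X}$, which is why the lemma can be read as "linear in the input size". The short sketch below only evaluates that product for given mode sizes; the function name `tts_num_entries` is a hypothetical helper, and the sizes are taken from the settings quoted in the Figure 4 caption.

```python
import math

def tts_num_entries(dims, T):
    """Number of scalar entries in a TTS with non-temporal mode sizes `dims`
    and length T, i.e. T * D_1 * ... * D_N (the term inside the O(.) bound)."""
    return T * math.prod(dims)

# Settings quoted in the Figure 4 caption: D1 = 5, D2 = 5, T from 800 to 80000
small = tts_num_entries([5, 5], 800)
large = tts_num_entries([5, 5], 80000)
print(small, large, large // small)
```

Increasing $T$ by a factor of 100 increases the entry count by the same factor, so under the lemma the running time is expected to grow proportionally, up to constants.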

Figures (9)

  • Figure 1: Effectiveness of DMM on Google Trends (#4 Covid) dataset: (a) DMM can split the tensor time series into meaningful subsequence clusters shown by colors (i.e., green $\rightarrow$ "Before Covid", pink $\rightarrow$ "Outbreak", gray $\rightarrow$ "Vaccine", blue $\rightarrow$ "Adaptation"), and (b) the important relationships between variables are summarized with country and query networks, where the nodes show individual variables, and the thickness and color of the edges are partial correlations showing the importance of each interaction.
  • Figure 2: Illustration of the three candidates. We compare the total description cost of each of these candidates.
  • Figure 3: DMM outperforms the state-of-the-art methods: Clustering accuracy for synthetic data, macro-F1 score vs. data size, i.e., (a) $2^{nd}$-order TTS $(D_{1}, T) = (5 \sim 50, 800)$, (b) $3^{rd}$-order TTS $(D_{1}, D_{2}, T) = (5 \sim 50, 5, 800)$.
  • Figure 4: DMM scales linearly: Computation time vs. data size, i.e., we vary (a) $D_{1}$ ($D_{1}=5 \sim 50, D_{2}=5, T=800$) and (b) $T$ ($D_{1}=5, D_{2}=5, T=800 \sim 80000$).
  • Figure 5: Computation time of DMM: our method surpasses its baselines. It is up to $300\times$ faster than TICC.
  • ...and 4 more figures
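
The Figure 1 caption describes network edges as partial correlations. As a hedged illustration, the sketch below uses the standard identity $\rho_{ij} = -\Theta_{ij} / \sqrt{\Theta_{ii}\Theta_{jj}}$ to turn a precision (inverse covariance) matrix $\Theta$ into an edge list. Whether the paper computes its edge weights in exactly this way is an assumption; the function names, the tolerance `tol`, and the example matrix are hypothetical.

```python
import math

def partial_correlations(precision):
    """Partial correlation matrix from a precision matrix (list of lists),
    using rho_ij = -theta_ij / sqrt(theta_ii * theta_jj) off the diagonal."""
    d = len(precision)
    rho = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    for i in range(d):
        for j in range(d):
            if i != j:
                scale = math.sqrt(precision[i][i] * precision[j][j])
                rho[i][j] = -precision[i][j] / scale
    return rho

def network_edges(precision, tol=1e-8):
    """Edges (i, j, weight) for variable pairs with a nonzero partial correlation."""
    rho = partial_correlations(precision)
    d = len(precision)
    return [(i, j, rho[i][j])
            for i in range(d) for j in range(i + 1, d)
            if abs(rho[i][j]) > tol]

# Assumed sparse precision matrix: variables 0 and 1 interact, variable 2 is isolated
theta = [[2.0, -1.0, 0.0],
         [-1.0, 2.0, 0.0],
         [0.0, 0.0, 1.0]]
print(network_edges(theta))
```

A zero entry in the precision matrix yields no edge, which is how an $\ell_1$-penalized (sparse) estimate translates into a readable network.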

Theorems & Definitions (2)

  • Definition 1: Reorder
  • Lemma 1