MOMENTI: Scalable Motif Mining in Multidimensional Time Series

Matteo Ceccarello, Francesco Pio Monaco, Francesco Silvestri

Abstract

Time series play a fundamental role in many domains, capturing a plethora of information about the underlying data-generating processes. When a process generates multiple synchronized signals, we are faced with multidimensional time series. In this context, a fundamental problem is that of motif mining, where we seek patterns that occur twice with minor variations, spanning some of the dimensions. State-of-the-art exact solutions for this problem run in time quadratic in the length of the input time series. We provide a scalable method to find the top-k motifs in multidimensional time series with probabilistic guarantees on the quality of the results. Our algorithm runs in time subquadratic in the length of the input and returns the exact solution with probability at least $1-\delta$, where $\delta$ is a user-defined parameter. The algorithm is designed to adapt to the input distribution, self-tuning its parameters while respecting user-defined limits on memory usage. Our theoretical analysis is complemented by an extensive experimental evaluation, showing that our algorithm is orders of magnitude faster than the state of the art.
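To make the problem statement concrete, the following is a minimal sketch of the quadratic brute-force search that exact approaches must in effect perform. It is not the MOMENTI algorithm. It assumes a z-normalized Euclidean distance per dimension and scores a pair of windows by the sum of its `d` smallest per-dimension distances; both choices, and the names `znorm` and `top_motif_bruteforce`, are illustrative assumptions rather than definitions taken from the paper.

```python
import numpy as np


def znorm(x):
    """Z-normalize a 1-D window; constant windows map to all zeros."""
    s = x.std()
    return (x - x.mean()) / s if s > 0 else np.zeros_like(x)


def top_motif_bruteforce(ts, w, d):
    """Exhaustive top-1 motif search (hypothetical helper, quadratic time).

    ts : array of shape (n_dims, n_points)
    w  : window (subsequence) length
    d  : number of dimensions the motif must span
    Returns (distance, a, b, dims) for the best non-overlapping pair.
    """
    n_dims, n = ts.shape
    m = n - w + 1  # number of subsequences per dimension
    # Pre-normalize every window: shape (n_dims, m, w).
    windows = np.array(
        [[znorm(ts[j, i:i + w]) for i in range(m)] for j in range(n_dims)]
    )
    best = (np.inf, -1, -1, ())
    for a in range(m):
        # Start at a + w so overlapping (trivial) matches are skipped.
        for b in range(a + w, m):
            per_dim = np.linalg.norm(windows[:, a] - windows[:, b], axis=1)
            dims = np.argsort(per_dim)[:d]  # the d closest dimensions
            dist = per_dim[dims].sum()      # assumed aggregation: sum
            if dist < best[0]:
                best = (dist, a, b, tuple(sorted(int(j) for j in dims)))
    return best
```

The double loop over all pairs of windows is what makes this baseline quadratic in the length of the series; the subquadratic running time claimed in the abstract refers to avoiding this exhaustive comparison.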

Paper Structure

This paper contains 29 sections, 9 theorems, 18 equations, 7 figures, 6 tables, 2 algorithms.

Key Result

Lemma 1

Given a pair of subsequences $\mathbf{T}_{a}, \mathbf{T}_{b}$ and parameter $d_m$, consider iteration $i$ of the outer loop of alg:emitaggr. Then, we have $W(a, b) \ge d$ with probability at least $P\left(\operatorname{dist}^{\max}_{d}\left(\mathbf{T}_{a}, \mathbf{T}_{b}\right)\right)^{i \cdot d}$ w

Figures (7)

  • Figure 1: Multidimensional time series from an industrial evaporator DaISyEVAP. The top-3 two dimensional motifs are highlighted.
  • Figure 2: Solid lines mark the time required by MOMENTI to find the top motif of each dimensionality for all datasets; dashed lines mark the time required by the Mstump baseline for the same task.
  • Figure 3: Scalability vs. input size (log scale).
  • Figure 4: Time and space requirements for motif discovery at different maximum allowed values of $K$.
  • Figure 5: Time and space requirements for motif discovery at different maximum allowed values of $L$.
  • ...and 2 more figures

Theorems & Definitions (21)

  • Definition 3.1
  • Definition 3.2
  • Definition 3.3
  • Definition 3.4
  • Definition 3.5
  • Definition 3.6
  • Definition 3.7
  • Definition 3.8
  • Definition 3.9
  • Definition 3.10
  • ...and 11 more