Quantitative Evaluation of Motif Sets in Time Series

Daan Van Wesenbeeck, Aras Yurtman, Wannes Meert, Hendrik Blockeel

TL;DR

This work addresses the lack of broadly applicable quantitative evaluation in time series motif discovery (TSMD) by introducing PROM, a precision–recall metric under optimal matching, and TSMD-Bench, a benchmark whose ground-truth (GT) motif sets are derived from real data. PROM matches discovered motifs to GT motifs using overlap-based criteria, aligns discovered motif sets with GT motif sets using the Hungarian method, and reports micro-averaged precision, recall, and F1, without restrictive assumptions about motif length or the number of motif sets. TSMD-Bench constructs 14 benchmark datasets from classification archives, using a principled concatenation scheme and ARI-guided dataset selection to yield realistic TSMD tasks; it also provides validation/test splits to support hyperparameter tuning. Experiments show that PROM offers a more balanced evaluation than existing metrics, that LoCoMotif often achieves the best F1, and that random-walk–based benchmarks are too easy, which highlights the value of the proposed benchmark for fair, large-scale TSMD comparisons.
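The three steps named in the summary (overlap-based matching, optimal alignment of motif sets, micro-averaged scores) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made for illustration: segments are half-open `(start, end)` intervals, two motifs match when their intersection-over-union exceeds 0.5, every unmatched discovered motif counts as a false positive, and the function names (`iou`, `contingency`, `best_assignment`, `prom_sketch`) are hypothetical. To stay dependency-free, the optimal alignment is found by brute force over permutations, which gives the same optimum as the Hungarian method on small inputs; for realistic sizes a Hungarian solver such as `scipy.optimize.linear_sum_assignment` would be the natural replacement.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two half-open intervals (start, end)."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def contingency(gt_sets, disc_sets, threshold=0.5):
    """M[i][j] = number of motifs in GT set i matched by a motif in discovered set j.

    Assumption: a match requires IoU > threshold, and each discovered motif
    is used for at most one GT motif.
    """
    M = [[0] * len(disc_sets) for _ in gt_sets]
    for i, gt in enumerate(gt_sets):
        for j, disc in enumerate(disc_sets):
            used = set()
            for g in gt:
                for k, d in enumerate(disc):
                    if k not in used and iou(g, d) > threshold:
                        used.add(k)
                        M[i][j] += 1
                        break
    return M


def best_assignment(M, g, d):
    """One-to-one alignment of GT sets to discovered sets maximizing matches.

    Brute force over permutations of a zero-padded square matrix; a stand-in
    for the Hungarian method that is only practical for small g and d.
    """
    n = max(g, d)
    padded = [[M[i][j] if i < g and j < d else 0 for j in range(n)]
              for i in range(n)]
    best = max(permutations(range(n)),
               key=lambda p: sum(padded[i][p[i]] for i in range(n)))
    # Drop pairs that involve a padding row or column.
    return [(i, best[i]) for i in range(n) if i < g and best[i] < d]


def prom_sketch(gt_sets, disc_sets, threshold=0.5):
    """Micro-averaged precision, recall, and F1 under the best alignment."""
    M = contingency(gt_sets, disc_sets, threshold)
    pairs = best_assignment(M, len(gt_sets), len(disc_sets))
    tp = sum(M[i][j] for i, j in pairs)
    fn = sum(len(s) for s in gt_sets) - tp
    # Assumption: every unmatched discovered motif is a false positive.
    fp = sum(len(s) for s in disc_sets) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy example: two GT motif sets, three discovered motif sets.
gt_sets = [[(0, 10), (30, 40), (60, 70)], [(15, 25), (45, 55)]]
disc_sets = [[(16, 25), (44, 55)], [(0, 10), (31, 40)], [(80, 90)]]
precision, recall, f1 = prom_sketch(gt_sets, disc_sets)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

In this toy input, the second discovered set aligns with the first GT set (two of three motifs found) and the first discovered set aligns with the second GT set (both motifs found), while the third discovered set matches nothing. Because the counts are pooled over all motif sets before dividing (micro-averaging), larger motif sets weigh more in the final scores.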

Abstract

Time Series Motif Discovery (TSMD), which aims at finding recurring patterns in time series, is an important task in numerous application domains, and many methods for this task exist. These methods are usually evaluated qualitatively. A few metrics for quantitative evaluation, where discovered motifs are compared to some ground truth, have been proposed, but they typically make implicit assumptions that limit their applicability. This paper introduces PROM, a broadly applicable metric that overcomes those limitations, and TSMD-Bench, a benchmark for quantitative evaluation of time series motif discovery. Experiments with PROM and TSMD-Bench show that PROM provides a more comprehensive evaluation than existing metrics, that TSMD-Bench is a more challenging benchmark than earlier ones, and that the combination can help understand the relative performance of TSMD methods. More generally, the proposed approach enables large-scale, systematic performance comparisons in this field.

Paper Structure

This paper contains 25 sections, 1 theorem, 9 equations, 14 figures, 3 tables.

Key Result

Lemma 1

Each discovered segment is matchable with at most one ground-truth segment.
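
The page does not reproduce the matching criterion or the proof. As a hedged illustration only, the following LaTeX sketch shows why a statement of this kind holds under two assumptions that are not taken from the source: ground-truth segments do not overlap each other, and a discovered segment $\delta$ is matchable with a ground-truth segment $\gamma$ when their overlap rate $|\delta \cap \gamma| / |\delta \cup \gamma|$ exceeds $0.5$.

```latex
Suppose a discovered segment $\delta$ is matchable with two distinct
ground-truth segments $\gamma_1$ and $\gamma_2$. For $k \in \{1, 2\}$,
\[
  |\delta \cap \gamma_k| \;>\; \tfrac{1}{2}\,|\delta \cup \gamma_k|
  \;\ge\; \tfrac{1}{2}\,|\delta| .
\]
Since $\gamma_1 \cap \gamma_2 = \emptyset$, the sets $\delta \cap \gamma_1$
and $\delta \cap \gamma_2$ are disjoint subsets of $\delta$, so
\[
  |\delta| \;\ge\; |\delta \cap \gamma_1| + |\delta \cap \gamma_2|
  \;>\; \tfrac{1}{2}\,|\delta| + \tfrac{1}{2}\,|\delta| \;=\; |\delta| ,
\]
which is a contradiction. Hence $\delta$ is matchable with at most one
ground-truth segment.
```

The argument relies on the threshold being at least one half; with a lower threshold, a long discovered segment could overlap two ground-truth segments enough to be matchable with both.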

Figures (14)

  • Figure 1: The ground-truth (GT) and discovered (D) motif sets in an example time series that represents a sequence of characters written with a digital pen [misc_character_trajectories_175], where the goal is to retrieve the occurrences of each character that is repeated. A quantitative evaluation scores how well the discovered motif sets correspond to the GT motif sets. In this example, the orange discovered motif set is incomplete with respect to the orange GT motif set, the blue discovered motif set has excess motifs, and the green one is redundant given the blue one.
  • Figure 2: The evaluation process of PROM
  • Figure 3: The matching matrix $\mathbf{M}^{*}$ is obtained by permuting the columns of $\mathbf{M}$ with the optimal permutation $\pi$, and adding an extra column and row for unmatched GT and discovered motifs, respectively. In this specific example, $\pi(1) = 1, \pi(2) = 3$, and $\pi(3)=2$.
  • Figure 4: An example matching matrix $\mathbf{M}^{*}$ when $d>g$, and $\text{TP}_{i}$, $\text{FN}_{i}$, and $\text{FP}_{i}$ for $i=1$ (Left) and $i=2$ (Center). Right: The total number of $\text{TP}$, $\text{FN}$, and $\text{FP}$.
  • Figure 5: An example matching matrix $\mathbf{M}^{*}$ when $g>d$; and the definitions of $\text{TP}$, $\text{FN}$, and $\text{FP}$.
  • ...and 9 more figures

Theorems & Definitions (2)

  • Lemma 1
  • proof