DTW+S: Shape-based Comparison of Time-series with Ordered Local Trend

Ajitesh Srivastava

TL;DR

DTW+S addresses a limitation of traditional time-series similarity measures, which can miss similarities in shape, by comparing ordered local trends and their timing. It builds a shapelet-space representation (SSR) of a time-series as a matrix of local-trend descriptors and computes distances by applying Dynamic Time Warping (DTW) to the SSR columns, with an additional flat dimension to handle near-constant segments. The author proves that $w-1$ linearly independent shapelets plus a flat shapelet yield a closeness-preserving mapping (an if-and-only-if condition, Theorem 1), proposes a DTW+S-based ensemble via barycenter averaging, and demonstrates improved clustering of epidemic curves and improved classification on datasets where local trends dominate. The approach offers interpretability, preserves event timing and magnitude, and shows practical benefits for epidemic forecasting and related domains, with smoothing providing additional gains on noisy data.

Abstract

Measuring distance or similarity between time-series data is a fundamental aspect of many applications including classification, clustering, and ensembling/alignment. Existing measures may fail to capture similarities among local trends (shapes) and may even produce misleading results. Our goal is to develop a measure that looks for similar trends occurring around similar times and is easily interpretable for researchers in applied domains. This is particularly useful for applications where time-series have a sequence of meaningful local trends that are ordered, such as in epidemics (a surge to an increase to a peak to a decrease). We propose a novel measure, DTW+S, which creates an interpretable "closeness-preserving" matrix representation of the time-series, where each column represents local trends, and then it applies Dynamic Time Warping to compute distances between these matrices. We present a theoretical analysis that supports the choice of this representation. We demonstrate the utility of DTW+S in several tasks. For the clustering of epidemic curves, we show that DTW+S is the only measure able to produce good clustering compared to the baselines. For ensemble building, we propose a combination of DTW+S and barycenter averaging that results in the best preservation of characteristics of the underlying trajectories. We also demonstrate that our approach results in better classification compared to Dynamic Time Warping for a class of datasets, particularly when local trends rather than scale play a decisive role.
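The two-step pipeline described above (map each series to a matrix whose columns describe local trends, then run DTW over those columns) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the three shapelets, the use of Pearson correlation as the trend score, the `flat_scale` parameter, and the exponential flatness score are all hypothetical choices made for this example.

```python
import numpy as np

# Hypothetical shapelets of length w = 4 (illustrative, not from the paper):
# steady increase, peak, and accelerating surge.
SHAPELETS = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [1.0, 2.0, 2.0, 1.0],
    [1.0, 2.0, 4.0, 8.0],
])

def ssr(x, shapelets=SHAPELETS, flat_scale=1.0):
    """Map a 1-D series of length T to a (d + 1) x (T - w + 1) matrix.

    Rows 0..d-1: Pearson correlation of each sliding window with each shapelet.
    Last row: an assumed flatness score, 1 for a constant window and
    approaching -1 as the variation inside the window grows.
    """
    x = np.asarray(x, dtype=float)
    w = shapelets.shape[1]
    windows = np.lib.stride_tricks.sliding_window_view(x, w)   # (n, w)
    wc = windows - windows.mean(axis=1, keepdims=True)         # centered windows
    sc = shapelets - shapelets.mean(axis=1, keepdims=True)     # centered shapelets
    wn = np.linalg.norm(wc, axis=1)
    sn = np.linalg.norm(sc, axis=1)
    denom = np.outer(sn, wn)                                   # (d, n)
    # Correlation is set to 0 where a window is exactly constant.
    corr = np.divide(sc @ wc.T, denom, out=np.zeros_like(denom), where=denom > 0)
    flat = 2.0 * np.exp(-wn / flat_scale) - 1.0
    return np.vstack([corr, flat])

def dtw_columns(A, B, window=None):
    """DTW between the columns of A and B with Euclidean column cost.

    `window` is an optional band limiting how far in time columns may be warped.
    """
    n, m = A.shape[1], B.shape[1]
    cost = np.linalg.norm(A[:, :, None] - B[:, None, :], axis=0)  # (n, m)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        lo = 1 if window is None else max(1, i - window)
        hi = m if window is None else min(m, i + window)
        for j in range(lo, hi + 1):
            D[i, j] = cost[i - 1, j - 1] + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def dtw_plus_s(x, y, shapelets=SHAPELETS, window=None):
    """Shape-based distance: DTW applied to the two shapelet-space matrices."""
    return dtw_columns(ssr(x, shapelets), ssr(y, shapelets), window)

# Toy data: a wave, the same wave at three times the scale, and a straight line.
t = np.arange(50)
a = np.exp(-0.5 * ((t - 20) / 5.0) ** 2)
b = 3.0 * a
c = np.linspace(0.0, 1.0, 50)

print("shape-based:", dtw_plus_s(a, b), dtw_plus_s(a, c))
print("Euclidean:  ", np.linalg.norm(a - b), np.linalg.norm(a - c))
```

In this toy setup the shape-based distance places the rescaled wave `b` closer to `a` than the straight line `c`, while plain Euclidean distance ranks them the other way round, which is the kind of misleading result the abstract attributes to scale-driven measures. The band argument `window` is one common way to keep matched trends "around similar times"; its value would need tuning per application.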

Paper Structure (29 sections, 5 theorems, 12 equations, 12 figures, 1 table, 2 algorithms)

This paper contains 29 sections, 5 theorems, 12 equations, 12 figures, 1 table, 2 algorithms.

Key Result

Theorem 1

Property 2 is satisfied with any set of $w-1$ linearly independent shapelets and the "flat" shapelet, i.e., with this choice $\|f(\mathbf{x}) - f(\mathbf{y})\| \leq \epsilon$ iff (i) both $\mathbf{x}$ and $\mathbf{y}$ are "almost" flat, or (ii) $\|\mathbf{x'} - \mathbf{y'}\| \leq \delta$, for s

Figures (12)

  • Figure 1: Simple measures like Mean Absolute Error can be deceiving. In all these examples, Model 1 seems to be closer to the Ground truth, but receives a higher distance compared to a straight line.
  • Figure 2: Failure of the mean ensemble in capturing the properties of individual time-series -- much lower peak.
  • Figure 3: Shapelet-space Representation of a time-series.
  • Figure 4: Applying mean, DTW, and DTW+S to develop ensemble of two time-series.
  • Figure 5: Ensembling results: (a) Different ensembling approaches on Set 7. The dark yellow lines represent the individual trajectories. (b) An instance of alignment on Set 1. Pink 'x' on the individual time-series are aligned to get the ensemble point (pink circle). Previous circles represent the ensemble points obtained from previous alignments.
  • ...and 7 more figures
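
The effect named in the Figure 2 caption, a mean ensemble whose peak is much lower than that of any individual trajectory, is easy to reproduce on synthetic data. The sketch below assumes two hypothetical Gaussian-shaped curves that differ only in peak timing; the function `curve` and its parameters are made up for this illustration and are not data from the paper.

```python
import numpy as np

t = np.arange(60)

def curve(peak_time, height=1.0, width=4.0):
    """Hypothetical epidemic-like wave: a Gaussian bump centered at peak_time."""
    return height * np.exp(-0.5 * ((t - peak_time) / width) ** 2)

# Two trajectories with the same shape and height but different peak timing.
x1 = curve(20)
x2 = curve(40)

# Pointwise mean ensemble.
mean_ensemble = (x1 + x2) / 2.0

print("individual peaks:", x1.max(), x2.max())
print("mean-ensemble peak:", mean_ensemble.max())
```

Because the two peaks do not coincide in time, the pointwise mean flattens both of them and its maximum is roughly half the individual peak height. Aligning the trajectories before averaging is the motivation for the barycenter-based ensemble the paper proposes.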

Theorems & Definitions (15)

  • Definition 1: Shapelet
  • Definition 2: Shapelet-space Representation
  • Definition 3: Trend Descriptor
  • Definition 4: Local Trend
  • Definition 5: Interpretable
  • Definition 6: Ordered Local Trend
  • Definition 7: Shapelet-space Representation - Time-series
  • Theorem 1
  • Theorem 2
  • Theorem 3: Expressive power
  • ...and 5 more