MS-Index: Fast Top-k Subsequence Search for Multivariate Time Series under Euclidean Distance

Jens E. d'Hondt; Teun Kortekaas; Odysseas Papapetrou; Themis Palpanas

TL;DR

This work tackles exact k-nearest-neighbor subsequence search on multivariate time series with ad-hoc channel selection. It introduces MS-Index, which combines per-channel DFT-based summarization with an R-tree index and the MASS convolution-based exact distance computation to prune the search space aggressively while guaranteeing correctness. The authors propose optimizations for tighter bounds and more efficient indexing, and demonstrate speedups of up to two orders of magnitude over state-of-the-art baselines across 34 datasets, including long, high-channel-count series. The approach supports fixed-length subsequences and adversarial channel choices at query time, making it robust and practical for real-world multivariate sensor data analysis.
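To illustrate why DFT-based summaries can prune candidates without sacrificing exactness, the sketch below computes a lower bound on the squared Euclidean distance from only the first few DFT coefficients of each channel. It relies on Parseval's theorem and on the fact that the distance over several channels is the sum of the per-channel squared distances. The function name `dft_lower_bound_sq` and the parameter `n_coeffs` are illustrative assumptions, not names from the paper, and the sketch does not reproduce the R-tree index or the paper's tighter bounds.

```python
import numpy as np


def dft_lower_bound_sq(x, y, n_coeffs):
    """Lower bound on the squared Euclidean distance between x and y.

    x, y: arrays of shape (channels, m), or (m,) for a single channel.
    n_coeffs: number of leading DFT coefficients kept per channel
              (hypothetical parameter, chosen for illustration).
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    y = np.atleast_2d(np.asarray(y, dtype=float))
    m = x.shape[-1]
    # NumPy's forward FFT is unnormalized, so Parseval's theorem reads
    # sum |x - y|^2 = (1 / m) * sum |X - Y|^2 for each channel.
    diff = np.fft.fft(x, axis=-1)[:, :n_coeffs] - np.fft.fft(y, axis=-1)[:, :n_coeffs]
    # Dropping coefficients removes only non-negative terms, so the
    # truncated sum can never exceed the true squared distance.
    return float(np.sum(np.abs(diff) ** 2) / m)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(size=(3, 64))
    b = rng.normal(size=(3, 64))
    exact = float(np.sum((a - b) ** 2))
    bound = dft_lower_bound_sq(a, b, n_coeffs=4)
    print(f"lower bound {bound:.3f} <= exact {exact:.3f}")
```

In a filter-and-refine search, any candidate whose lower bound already exceeds the current k-th best distance can be discarded without computing its exact distance, which is what keeps the result exact.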

Abstract

Modern applications frequently collect and analyze temporal data in the form of multivariate time series (MTS), i.e., time series that contain multiple channels. A common task in this context is subsequence search, which involves identifying all MTS that contain subsequences highly similar to a query time series. In practical scenarios, not all channels of an MTS are relevant to every query. For instance, airplane sensors may gather data on a plethora of components and subsystems, but only a few of these are relevant to a specific query, such as identifying the cause of a malfunctioning landing gear, or a specific flight maneuver. Consequently, the relevant query channels are often specified at query time. In this work, we introduce the Multivariate Subsequence Index (MS-Index), a novel algorithm for nearest neighbor MTS subsequence search under Euclidean distance that supports ad-hoc selection of query channels. The algorithm is exact and demonstrates query performance that scales sublinearly with the number of query channels. We examine the properties of MS-Index with a thorough experimental evaluation over 34 datasets, and show that it outperforms the state of the art by one to two orders of magnitude for both raw and normalized subsequences.
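The search problem described above can be made concrete with a small index-free baseline. The sketch below is an assumption-laden illustration rather than the MS-Index algorithm: it handles a single MTS, raw (non-normalized) subsequences only, and does not exclude overlapping matches. It computes the exact distance from the query to every window on the channels chosen at query time, using FFT-based sliding dot products in the spirit of convolution-based distance computation. The names `sliding_sq_dist` and `topk_subsequences` are hypothetical.

```python
import numpy as np


def sliding_sq_dist(t, q):
    """Squared Euclidean distance from q to every length-m window of t."""
    t = np.asarray(t, dtype=float)
    q = np.asarray(q, dtype=float)
    n, m = len(t), len(q)
    if not 1 <= m <= n:
        raise ValueError("query length must be between 1 and len(t)")
    # Sliding dot products via FFT: convolve t with the reversed query.
    fft_len = 1 << (n + m - 1).bit_length()
    conv = np.fft.irfft(np.fft.rfft(t, fft_len) * np.fft.rfft(q[::-1], fft_len), fft_len)
    dots = conv[m - 1 : n]  # dots[i] = dot(t[i:i+m], q)
    # Sum of squares of each window from a prefix sum.
    csum = np.concatenate(([0.0], np.cumsum(t * t)))
    win_sq = csum[m:] - csum[:-m]
    # ||w - q||^2 = ||w||^2 - 2 <w, q> + ||q||^2; clamp rounding noise at 0.
    return np.maximum(win_sq - 2.0 * dots + np.dot(q, q), 0.0)


def topk_subsequences(series, query, channels, k):
    """Top-k closest windows of `series` on the channels picked at query time.

    series:   array of shape (num_channels, n)
    query:    array of shape (len(channels), m); row i matches channels[i]
    channels: channel indices selected at query time
    Returns a list of (start_offset, distance), closest first.
    """
    series = np.asarray(series, dtype=float)
    query = np.asarray(query, dtype=float)
    # The multivariate squared distance is the sum over selected channels.
    d2 = sum(sliding_sq_dist(series[c], query[i]) for i, c in enumerate(channels))
    order = np.argsort(d2, kind="stable")[:k]
    return [(int(i), float(np.sqrt(d2[i]))) for i in order]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    data = rng.normal(size=(4, 200))
    picked = [0, 2]                       # channels chosen at query time
    q = data[picked, 50:70].copy()        # query cut from the data itself
    print(topk_subsequences(data, q, picked, k=3))
```

This baseline touches every window of every selected channel, so its cost grows linearly with the number of query channels; an index such as the one proposed in the paper aims to avoid most of these exact computations.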
