SEAnet: A Deep Learning Architecture for Data Series Similarity Search

Qitong Wang; Themis Palpanas

TL;DR

This work proposes Deep Embedding Approximation (DEA), a novel family of data series summarization techniques based on deep neural networks, and describes SEAnet, a novel architecture especially designed for learning DEA, that introduces the Sum of Squares preservation property into the deep network design.

Abstract

Similarity search is a key operation in the analysis of massive data series collections. According to recent studies, SAX-based indexes offer state-of-the-art performance for similarity search tasks. However, their performance degrades on datasets that are high-frequency, weakly correlated, or excessively noisy, or that exhibit other dataset-specific properties. In this work, we propose Deep Embedding Approximation (DEA), a novel family of data series summarization techniques based on deep neural networks. Moreover, we describe SEAnet, a novel architecture especially designed for learning DEA, that introduces the Sum of Squares (SoS) preservation property into the deep network design. We further enhance SEAnet with the SEAtrans encoder. Finally, we propose novel sampling strategies, SEAsam and SEAsamE, that allow SEAnet to train effectively on massive datasets. Comprehensive experiments on 7 diverse synthetic and real datasets verify the advantages of DEA learned using SEAnet in providing high-quality data series summarizations and similarity search results.

Paper Structure

This paper contains 16 sections, 2 theorems, 10 equations, 19 figures, 3 tables, 2 algorithms.

Introduction
Related Work
Background
DEA-based Similarity Search
SEAnet Architecture
Sum of Squares Preservation
Sampling with SEAsam and SEAsamE
Experimental Evaluation
SoS Preservation and SEAsam
DEA Quality
DEA for Approximate Search
DEA for Downstream Applications
Time and Convergence
SEAsamE
SEAtrans
...and 1 more sections

Key Result

Lemma 1

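Given a z-normalized series dataset $\mathcal{S}$ of size $n$ and its DEAs $\mathcal{E}$, let $\mathcal{E}'$ be derived by z-normalizing $\mathcal{E}$ and then multiplying it by $\frac{\sqrt{m}}{\sqrt{l}}$. Then the SoS of $\mathcal{E}'$ is the same as that of $\mathcal{S}$, that is,

$$\sum_{i=1}^{n} \left\lVert \frac{\sqrt{m}}{\sqrt{l}} \cdot \frac{e^i - \overline{e^i}}{\sigma_{e^i}} \right\rVert_2^2 = \sum_{i=1}^{n} \left\lVert s^i \right\rVert_2^2,$$

where $s^i$ is the $i$-th series in $\mathcal{S}$, and $\overline{e^i}$ and $\sigma_{e^i}$ are the mean and standard deviation of DEA $e^i$ (without loss of generality, we assume $\sigma_{e^i}$ ...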
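
The rescaling in Lemma 1 can be checked numerically. The following is a minimal sketch, not the authors' implementation. It assumes that $m$ is the series length and $l$ is the DEA length, and that z-normalization uses the population standard deviation (`ddof=0`); with the sample standard deviation the totals would differ. The helper names (`znorm`, `rescale_embeddings`, `sum_of_squares`) are hypothetical, and the embeddings are random stand-ins rather than outputs of a trained SEAnet.

```python
import numpy as np

def znorm(x):
    """Z-normalize each row, using the population standard deviation (assumption)."""
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)  # ddof=0 by default
    return (x - mean) / std

def rescale_embeddings(emb, m):
    """Z-normalize each embedding of length l, then multiply by sqrt(m)/sqrt(l)."""
    l = emb.shape[-1]
    return np.sqrt(m / l) * znorm(emb)

def sum_of_squares(x):
    """Total sum of squared values over the whole collection."""
    return float(np.sum(x ** 2))

rng = np.random.default_rng(0)
n, m, l = 100, 256, 16  # collection size, series length, embedding length (illustrative)

# Random-walk series, z-normalized row by row.
series = znorm(rng.standard_normal((n, m)).cumsum(axis=1))

# Random stand-in for the DEAs; a real DEA would come from a trained encoder.
emb = rng.standard_normal((n, l))
emb_scaled = rescale_embeddings(emb, m)

print(sum_of_squares(series), sum_of_squares(emb_scaled))
```

Under these assumptions, each z-normalized series of length $m$ has a sum of squares of $m$, and each rescaled embedding has $\frac{m}{l} \cdot l = m$, so the two totals agree up to floating-point error.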

Figures (19)

Figure 1: Case studies where PAA and DFT work or fail to approximate and reconstruct series from RandWalk and Deep1B datasets. In both cases, DEA works to approximate and reconstruct series. All summarizations use the same memory budget.
Figure 2: Replace PAA by DEA for SAX symbolization.
Figure 3: Workflow of DEA-based approximate similarity search.
Figure 4: The SEAnet architecture and the details of a dilated full-preactivation ResBlock.
Figure 5: The SEAtrans encoder architecture.
...and 14 more figures

Theorems & Definitions (4)

Lemma 1
proof
Lemma 2
proof
