Deep Dict: Deep Learning-based Lossy Time Series Compressor for IoT Data

Jinxin Liu, Petar Djukic, Michel Kulhandjian, Burak Kantarci

TL;DR

Deep Dict tackles lossy time-series compression for IoT data by learning Bernoulli latent representations with a Bernoulli transformer autoencoder (BTAE) and enforcing a distortion bound through uniform quantization of the residual. It introduces a quantized entropy loss (QEL) that minimizes the entropy of the quantized residual, and therefore the encoded size, and it passes the result through an entropy coder for efficient transmission. The architecture uses a transformer-based decoder with relative positional encoding and supports transfer learning to accelerate deployment. Across ten datasets, Deep Dict improves the compression ratio by up to 53.66% over state-of-the-art lossy compressors, with the largest gains on longer time series, which suggests practical potential for reducing IoT bandwidth and storage requirements.
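
As a rough illustration of how uniform residual quantization can enforce a distortion bound, the sketch below quantizes the gap between a signal and a prediction with step size 2*eps, so the reconstruction error cannot exceed eps (up to floating-point rounding). The function names, the step-size choice, and the sample values are assumptions made for illustration; the paper's exact quantizer may differ.

```python
def quantize_residual(x, x_hat, eps):
    """Map each residual x - x_hat to an integer bin index of width 2*eps.

    Hypothetical helper, not the authors' code.
    """
    step = 2.0 * eps
    return [round((a - b) / step) for a, b in zip(x, x_hat)]


def reconstruct(x_hat, q, eps):
    """Add the dequantized residual back onto the prediction."""
    step = 2.0 * eps
    return [b + k * step for b, k in zip(x_hat, q)]


# Made-up sample values: x is the original series, x_hat stands in for
# the autoencoder's prediction.
x = [0.01, 0.12, 0.50, 0.93]
x_hat = [0.05, 0.10, 0.40, 1.00]
eps = 0.05

q = quantize_residual(x, x_hat, eps)
x_rec = reconstruct(x_hat, q, eps)
max_err = max(abs(a - b) for a, b in zip(x, x_rec))
print(q, max_err <= eps + 1e-9)
```

Only the integer indices in `q` need to be transmitted alongside the latent representation. The better the prediction, the more of those indices are zero, and the cheaper they are to entropy-code.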

Abstract

We propose Deep Dict, a deep learning-based lossy time series compressor designed to achieve a high compression ratio while keeping the decompression error within a predefined range. Deep Dict incorporates two essential components: the Bernoulli transformer autoencoder (BTAE) and a distortion constraint. BTAE extracts Bernoulli representations from time series data, which are smaller than the representations produced by conventional autoencoders. The distortion constraint limits the prediction error of BTAE to the desired range. Moreover, to address the limitations of common regression losses such as L1 and L2, we introduce a novel loss function called quantized entropy loss (QEL). QEL takes into account the specific characteristics of the problem, enhancing robustness to outliers and alleviating optimization challenges. Our evaluation of Deep Dict across ten diverse time series datasets from various domains shows that Deep Dict outperforms state-of-the-art lossy compressors in compression ratio by up to 53.66%.
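
To make the link between residual entropy and encoded size concrete, the sketch below computes the empirical Shannon entropy of a list of quantized residual indices, which approximates the bits per symbol an ideal entropy coder would need. This is not the paper's QEL, which has to be usable as a training loss; it is only the plain empirical entropy, and the function name and sample index lists are assumptions for illustration.

```python
import math
from collections import Counter


def empirical_entropy_bits(symbols):
    """Shannon entropy (bits per symbol) of the empirical distribution."""
    n = len(symbols)
    counts = Counter(symbols)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical quantized residual indices from two predictors.
spread_out = [0, 0, 1, -1]      # prediction often misses the zero bin
concentrated = [0, 0, 0, 0]     # prediction always lands in the zero bin

h_spread = empirical_entropy_bits(spread_out)
h_conc = empirical_entropy_bits(concentrated)
print(h_spread, h_conc)
```

A residual that is concentrated on a few bins has lower entropy and so compresses to fewer bits. This is why a loss that targets the entropy of the quantized residual can serve compression better than L1 or L2, which penalize error magnitude rather than coded size.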

Paper Structure (15 sections, 4 equations, 13 figures, 3 tables)

This paper contains 15 sections, 4 equations, 13 figures, 3 tables.

Figures (13)

  • Figure 1: Overview of Deep Dict.
  • Figure 2: Detailed Architecture of the Decoder of Deep Dict.
  • Figure 3: Detailed Architecture of Multihead Attention with RPE.
  • Figure 4: Intuitive Example of Uniformed Quantization.
  • Figure 5: Comparison of L1, L2/MSE, and QEL under bar_crawl univariate dataset.
  • ...and 8 more figures