Deep Dict: Deep Learning-based Lossy Time Series Compressor for IoT Data
Jinxin Liu, Petar Djukic, Michel Kulhandjian, Burak Kantarci
TL;DR
Deep Dict tackles lossy time-series compression for IoT data by learning Bernoulli latent representations with a Bernoulli transformer autoencoder (BTAE) and enforcing a distortion bound via uniform residual quantization. It introduces quantized entropy loss (QEL) to minimize the entropy of the quantized residual, and thus the encoded size, and uses an entropy coder to encode the result for efficient transmission. The architecture uses a transformer-based decoder with relative positional encoding and supports transfer learning to accelerate deployment. Across ten datasets, Deep Dict improves compression ratio by up to 53.66% over state-of-the-art lossy compressors, with pronounced gains on longer time series, indicating strong practical potential for reducing IoT data bandwidth and storage requirements.
Abstract
We propose Deep Dict, a deep learning-based lossy time series compressor designed to achieve a high compression ratio while maintaining decompression error within a predefined range. Deep Dict incorporates two essential components: the Bernoulli transformer autoencoder (BTAE) and a distortion constraint. BTAE extracts Bernoulli representations from time series data, reducing the size of the representations compared to conventional autoencoders. The distortion constraint limits the prediction error of BTAE to the desired range. Moreover, to address the limitations of common regression losses such as L1/L2, we introduce a novel loss function called quantized entropy loss (QEL). QEL takes into account the specific characteristics of the problem, enhancing robustness to outliers and alleviating optimization challenges. Our evaluation of Deep Dict across ten diverse time series datasets from various domains shows that Deep Dict outperforms state-of-the-art lossy compressors in compression ratio by up to 53.66%.
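To make the distortion constraint concrete, the following is a minimal sketch of uniform residual quantization with a guaranteed error bound, together with the empirical entropy of the quantized residual (the quantity an entropy coder's output size tracks). It is an illustration under stated assumptions, not the authors' implementation: the bin width of 2 * eps is a common choice assumed here, the function names are hypothetical, and `x_hat` stands in for the autoencoder's prediction.

```python
import math
from collections import Counter

def quantize_residual(x, x_hat, eps):
    """Map each residual x - x_hat to an integer bin index.

    Assumption: uniform bins of width 2 * eps, so rounding to the
    nearest bin keeps the reconstruction error within eps.
    """
    step = 2.0 * eps
    return [round((a - b) / step) for a, b in zip(x, x_hat)]

def reconstruct(x_hat, q, eps):
    """Add the dequantized residual back onto the prediction."""
    step = 2.0 * eps
    return [b + k * step for b, k in zip(x_hat, q)]

def empirical_entropy_bits(symbols):
    """Shannon entropy (bits per symbol) of the symbol histogram."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Toy data: the last prediction is badly off (an outlier residual).
x = [0.0, 0.5, 1.0, 1.5, 9.0]
x_hat = [0.1, 0.4, 1.2, 1.4, 1.0]
eps = 0.25

q = quantize_residual(x, x_hat, eps)
x_rec = reconstruct(x_hat, q, eps)
max_err = max(abs(a - b) for a, b in zip(x, x_rec))
entropy = empirical_entropy_bits(q)
```

In this toy case the four small residuals all fall into bin 0 and only the outlier needs a nonzero symbol, so the entropy stays below one bit per sample while every reconstructed value remains within `eps` of the original. The entropy depends on how many distinct symbols occur and how often, not on the magnitude of the outlier, which gives some intuition for why an entropy-oriented objective can be less sensitive to outliers than L1/L2.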
