Lossless Compression of Time Series Data: A Comparative Study

Jonas G. Matt, Pengcheng Huang, Balz Maag

TL;DR

This work tackles the challenge of lossless time series compression at scale by proposing a two-stage framework that combines compression-aiding transformations with entropy coding. The authors perform the largest-to-date comparative study across synthetic and real-world datasets, using ablations to reveal how delta coding and QuaRs reshuffling affect compressibility, and how hybrid and time-series specialized methods fare in practice. Key findings show that no single method dominates all data, but holistic pipelines with delta coding and tailored transforms achieve the best performance, with Sprintz and Pcodec offering favorable speed–compression trade-offs. The results provide practical guidance for selecting and composing compression components to tailor pipelines to specific time series characteristics.

Abstract

Our increasingly digital and connected world has led to the generation of unprecedented amounts of data. This data must be efficiently managed, transmitted, and stored to preserve resources and allow scalability. Data compression has long been a key technology for this purpose, resulting in a vast landscape of available techniques. This largest-to-date study analyzes and compares various lossless data compression methods for time series data. We present a unified framework encompassing two stages: data transformation and entropy encoding. We evaluate compression algorithms across both synthetic and real-world datasets with varying characteristics. Through ablation studies at each compression stage, we isolate the impact of individual components on overall compression performance, revealing the strengths and weaknesses of different algorithms when facing diverse time series properties. Our study underscores the importance of well-configured and complete compression pipelines beyond individual components or algorithms; it offers a comprehensive guide for selecting and composing the most appropriate compression algorithms tailored to specific datasets.
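The two-stage idea described in the abstract (a transformation followed by a coding stage) can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names (`delta_encode`, `pack`, `compress`) are hypothetical, the input series is made up, and zlib (DEFLATE, a general-purpose compressor) merely stands in for the entropy coding stage.

```python
import struct
import zlib


def delta_encode(values):
    """Stage 1 (transformation): keep the first value, then store differences."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    out = []
    total = 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def pack(values):
    """Serialize integers as little-endian signed 64-bit words."""
    return struct.pack(f"<{len(values)}q", *values)


def unpack(blob):
    """Inverse of pack."""
    return list(struct.unpack(f"<{len(blob) // 8}q", blob))


def compress(values):
    """Stage 1 followed by stage 2 (zlib as a stand-in for the coding stage)."""
    return zlib.compress(pack(delta_encode(values)))


def decompress(blob):
    """Undo both stages; the round trip must be exact for lossless compression."""
    return delta_decode(unpack(zlib.decompress(blob)))


# Hypothetical input: a linear ramp, where every delta after the first is 3.
series = [1000 + 3 * i for i in range(10_000)]

raw_only = zlib.compress(pack(series))   # coding stage alone
with_delta = compress(series)            # transformation + coding stage

assert decompress(with_delta) == series
print(len(pack(series)), len(raw_only), len(with_delta))
```

On a ramp like this, the delta stage turns the input into a nearly constant sequence, so the coding stage has far more redundancy to exploit. Noisier or non-smooth series may benefit much less, or not at all.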

Paper Structure

This paper contains 54 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Overview of the main compared methods categorized by the unified compression framework, detailing their transformation and entropy coding stages.
  • Figure 2: The synthetic test cases, before and after progressively applying the different transformations (delta coding and two further transformations). This demonstrates the impact of each transformation stage on data representation.
  • Figure 3: Compression scores achieved by the different methods on the synthetic test cases. The markers along a line correspond to the progressive application of delta coding and the two further transformations. Compression scores below $0$ (compressed data is larger than the original data) are shown as $0$.
  • Figure 4: Compression scores and speeds achieved by the different methods on the real-world datasets. Markers with no outline indicate results using compression level settings different from the default ones stated in the experimental setup section. They indicate the speed-compression trade-off achievable by the methods Zstd, Brotli, and zlib.
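
The captions for Figures 3 and 4 refer to compression scores, which can fall below 0, and to a speed-compression trade-off controlled by compression level. The sketch below shows one way such numbers could be measured for zlib. The score formula used here (1 minus compressed size over original size) is an assumption chosen to match the caption's remark that a score below 0 means the output is larger than the input; the test signal and the function name `compression_score` are hypothetical.

```python
import random
import time
import zlib


def compression_score(original: bytes, compressed: bytes) -> float:
    """Assumed definition: fraction of space saved; negative if the output grew."""
    return 1.0 - len(compressed) / len(original)


rng = random.Random(0)

# Hypothetical test signal: a slow random walk stored as 16-bit samples.
value = 0
samples = []
for _ in range(50_000):
    value = max(-32768, min(32767, value + rng.randint(-2, 2)))
    samples.append(value)
data = b"".join(s.to_bytes(2, "little", signed=True) for s in samples)

# Sweep zlib compression levels to observe the speed-compression trade-off.
results = {}
for level in (1, 6, 9):
    start = time.perf_counter()
    blob = zlib.compress(data, level)
    elapsed = max(time.perf_counter() - start, 1e-9)
    assert zlib.decompress(blob) == data  # lossless round trip
    score = compression_score(data, blob)
    speed_mb_s = len(data) / elapsed / 1e6
    results[level] = (score, speed_mb_s)
    print(f"level={level} score={score:.3f} speed={speed_mb_s:.1f} MB/s")

# Random bytes are incompressible, so container overhead makes the score negative.
noise = bytes(rng.getrandbits(8) for _ in range(1000))
noise_score = compression_score(noise, zlib.compress(noise))
print(f"noise score={noise_score:.4f}")
```

Timings from a single run like this are noisy; a careful benchmark would repeat each measurement and report a median.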