DASPack: Controlled Data Compression for Distributed Acoustic Sensing

Aleix Segui, Arantza Ugalde, Andreas Fichtner, Sergi Ventosa, Josep Ramon Morros

TL;DR

DASPack tackles the DAS data deluge by delivering a fixed-accuracy, fully reversible compression pipeline tailored for DAS signals. It combines a controlled degradation step with a lossless compression pipeline based on 2-D discrete wavelet transforms, linear predictive coding, and entropy coding, ensuring deterministic error bounds for a user-defined step $\Delta$. On 15 real-world datasets, it achieves up to about 3x lossless compression and up to about 10x with acceptable degradation, with real-time CPU throughput around 100–200 MB/s per thread and up to 750 MB/s on 8 threads. It consistently outperforms general-purpose compressors and several DAS-specific methods, enabling edge deployment and scalable long-term DAS data management in seismology, infrastructure monitoring, and environmental sensing.

Abstract

We present DASPack, a high-performance, open-source compression tool specifically designed for distributed acoustic sensing (DAS) data. As DAS becomes a key technology for real-time, high-density, and long-range monitoring in fields such as geophysics, infrastructure surveillance, and environmental sensing, the volume of collected data is rapidly increasing. Large-scale DAS deployments already generate hundreds of terabytes, and these volumes are expected to grow in the coming years, making long-term storage a major challenge. Despite this urgent need, few compression methods have proven to be both practical and scalable in real-world scenarios. DASPack is a fully operational solution that consistently outperforms existing techniques for DAS data. It enables both controlled lossy and lossless compression by allowing users to choose the maximum absolute difference per datum between the original and compressed data. The compression pipeline combines wavelet transforms, linear predictive coding, and entropy coding to optimise efficiency. Our method achieves up to 3x file size reductions for strain and strain rate data in lossless mode across diverse datasets. In lossy mode, compression improves to 6x with near-perfect signal fidelity, and up to 10x is reached with acceptable signal degradation. It delivers fast throughput (100–200 MB/s using a single thread and up to 750 MB/s using 8 threads), enabling real-time deployment even under high data rates. We validated its performance on 15 datasets from a variety of acquisition environments, demonstrating its speed, robustness, and broad applicability. DASPack provides a practical foundation for long-term, sustainable DAS data management in large-scale monitoring networks.
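The controlled degradation step can be illustrated with a uniform round-to-nearest quantiser on integer samples. This is a minimal sketch, not DASPack's implementation: the function names `quantise`, `dequantise`, and `max_abs_error` are hypothetical, and the relation between the step and the error bound (here, at most half the step, rounded down) is an assumption about how such a quantiser typically behaves.

```python
def quantise(samples, step):
    """Map integer samples to quantisation indices (round to nearest)."""
    if step < 1:
        raise ValueError("step must be a positive integer")
    half = step // 2
    # Floor division also rounds correctly for negative samples.
    return [(x + half) // step for x in samples]


def dequantise(indices, step):
    """Reconstruct approximate samples from quantisation indices."""
    return [q * step for q in indices]


def max_abs_error(original, restored):
    """Largest absolute difference per datum."""
    return max(abs(a - b) for a, b in zip(original, restored))


# Hypothetical integer samples; a step of 1 leaves the data unchanged.
samples = [-1037, -512, -3, 0, 4, 255, 1024, 9999]
for step in (1, 4, 25):
    restored = dequantise(quantise(samples, step), step)
    # Assumed bound: the error never exceeds half the step (rounded down).
    assert max_abs_error(samples, restored) <= step // 2
```

With a step of 1 the quantiser is the identity, which is how a single parameter could cover both the lossless and the controlled lossy mode in a design of this kind.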

Paper Structure

This paper contains 21 sections, 3 equations, 13 figures, 1 table.

Figures (13)

  • Figure 1: Controlled degradation of the amplitude in nanostrain ($\text{n}\varepsilon$) at different quantisation steps. The data points are shown as coloured dots joined by a black line. (a) shows the continuous, original signal, (b) a quantisation step of $0.1~\text{n}\varepsilon$, and (c) a quantisation step of $0.25~\text{n}\varepsilon$. The waveform is extracted from dataset 10 (Alboran Sea).
  • Figure 2: Compression and decompression pipelines. (a) Data is transformed to integers and optionally quantised via a user-defined quantisation step. (b) Data is split into non-overlapping tiles that are processed separately, and the temporal mean is removed per channel. (c) A wavelet transform is optionally applied in both dimensions (time and space). (d) A linear prediction filter with a fixed number of parameters is optionally applied in both dimensions in every wavelet subband. (e) Data is entropy coded to obtain a bit sequence. Step (a) is irreversible but tuneable; steps (b), (c), (d) and (e) are all reversible, ensuring lossless compression.
  • Figure 3: (i) Raw original and (ii) LPC-filtered data example, showing strain from dataset 13 (Vinaroz; see Table 1). Column (a) shows the strain values, (b) the power spectrum in the $f$-$k$ transform and (c) the 2-D autocorrelation function.
  • Figure 4: Example of wavelet-transformed and LPC-filtered strain data from dataset 13 (Vinaroz). For every subfigure, the four subbands obtained after applying the wavelet transform in both dimensions are shown. Left plots correspond to frequency low-pass and right plots to frequency high-pass. Top plots are wavenumber low-pass and bottom plots wavenumber high-pass. (a) Strain values. (b) $f$-$k$ power spectrum. (c) 2-D autocorrelation function.
  • Figure 5: Density histograms of raw original and predicted (wavelet-transformed and LPC-filtered) data for datasets 1, 2, 5 and 8 (see Table 1). The variance is reduced when the models are able to predict the signals, and the lower the variance, the more compressible the data source is. The lower variance of the predicted data therefore shows how the processing contributes to better compression.
  • ...and 8 more figures
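
The reversible stages named in the Figure 2 caption, and the variance reduction described for Figure 5, can be sketched on a single synthetic channel. This is an illustrative sketch under stated assumptions, not the authors' code: it uses a one-level integer Haar lifting step as a stand-in for the wavelet transform and a fixed previous-sample predictor as a stand-in for the fitted linear prediction filter, works in one dimension only, and omits tiling and entropy coding. All function names are hypothetical.

```python
import math
from statistics import pvariance


def remove_mean(channel):
    """Subtract the rounded temporal mean; return (mean, centred samples)."""
    mean = round(sum(channel) / len(channel))
    return mean, [x - mean for x in channel]


def haar_forward(x):
    """One level of integer Haar lifting on an even-length list."""
    low, high = [], []
    for a, b in zip(x[0::2], x[1::2]):
        d = b - a                 # detail (high-pass) coefficient
        low.append(a + (d >> 1))  # floor of the pair average (low-pass)
        high.append(d)
    return low, high


def haar_inverse(low, high):
    """Exactly undo haar_forward using integer arithmetic only."""
    out = []
    for s, d in zip(low, high):
        a = s - (d >> 1)
        out.extend((a, a + d))
    return out


def predict_forward(x):
    """Residuals of a fixed first-order predictor (the previous sample)."""
    return [x[0]] + [x[n] - x[n - 1] for n in range(1, len(x))]


def predict_inverse(residuals):
    """Undo predict_forward with a running sum."""
    out, acc = [], 0
    for r in residuals:
        acc += r
        out.append(acc)
    return out


# Synthetic integer channel (an assumption): offset sinusoid, 256 samples.
channel = [500 + round(1000 * math.sin(2 * math.pi * n / 64)) for n in range(256)]

mean, centred = remove_mean(channel)
low, high = haar_forward(centred)
res_low, res_high = predict_forward(low), predict_forward(high)

# Decompression applies the inverse steps in reverse order.
rebuilt = haar_inverse(predict_inverse(res_low), predict_inverse(res_high))
rebuilt = [v + mean for v in rebuilt]
assert rebuilt == channel  # the round trip is exact

# Prediction residuals have a much lower variance than the input.
assert pvariance(res_low) < pvariance(centred)
```

Because every step uses integer arithmetic with an exact inverse, the round trip reproduces the input bit for bit, which is the property that lets the only loss come from the optional quantisation step.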