SHRINK: Data Compression by Semantic Extraction and Residuals Encoding
Guoyou Sun, Panagiotis Karras, Qi Zhang
TL;DR
SHRINK addresses the challenge of ultra-accurate compression of IoT edge data with a semantic-aware, two-phase approach. It first extracts enduring data patterns as adaptive linear-segment semantics (shrinking cones) and stores them in a compact knowledge base. It then encodes the remaining detail as residuals, enabling lossy reconstruction at multiple $L_{\infty}$ error levels as well as lossless reconstruction. The core contributions are the adaptive base error threshold, the merging of similar semantics in the knowledge base, and an entropy-coded residual bitstream, which together yield high compression ratios and throughput. Experiments across diverse datasets show SHRINK outperforming state-of-the-art lossy and lossless compressors, with particularly strong performance at stringent accuracy levels and on larger datasets, making it well suited for edge IoT storage and analytics.
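The first phase can be illustrated with a textbook shrinking-cone segmentation under an $L_{\infty}$ bound $\epsilon$. Each segment is anchored at its first point and keeps a range of feasible slopes (the "cone"). Every new point narrows that range to the slopes that keep it within $\pm\epsilon$ of the line. When the range becomes empty, a new segment starts. This is a minimal sketch under stated assumptions: integer timestamps, an exactly stored anchor value, and the midpoint of the final cone as the slope. The names `Segment`, `shrinking_cone`, and `reconstruct` are hypothetical. SHRINK's adaptive base threshold and knowledge-base merging are not modeled here.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Segment:
    start: int        # index of the first point the segment covers
    intercept: float  # exact value of the anchor point at `start`
    slope: float      # a slope chosen from inside the final cone


def shrinking_cone(values: Sequence[float], eps: float) -> List[Segment]:
    """Greedy L-infinity segmentation (hypothetical helper): every covered point stays within eps."""
    segments: List[Segment] = []
    n = len(values)
    i = 0
    while i < n:
        y0 = values[i]
        lo, hi = float("-inf"), float("inf")  # feasible slope range: the cone
        j = i + 1
        while j < n:
            dx = j - i
            # Slopes that keep point j within +/- eps of the line through (i, y0).
            new_lo = max(lo, (values[j] - eps - y0) / dx)
            new_hi = min(hi, (values[j] + eps - y0) / dx)
            if new_lo > new_hi:
                break  # the cone is empty: point j must start a new segment
            lo, hi = new_lo, new_hi  # shrink the cone
            j += 1
        # A single-point segment (only possible at the end) has an unbounded cone; use slope 0.
        slope = 0.0 if j == i + 1 else (lo + hi) / 2
        segments.append(Segment(i, y0, slope))
        i = j
    return segments


def reconstruct(segments: List[Segment], n: int) -> List[float]:
    """Evaluate each segment from its start up to the start of the next segment."""
    out: List[float] = []
    for k, seg in enumerate(segments):
        end = segments[k + 1].start if k + 1 < len(segments) else n
        out.extend(seg.intercept + seg.slope * (x - seg.start) for x in range(seg.start, end))
    return out


if __name__ == "__main__":
    data = [0, 1, 2, 3, 10, 10, 10]
    segs = shrinking_cone(data, eps=0.5)
    approx = reconstruct(segs, len(data))
    residuals = [v - a for v, a in zip(data, approx)]  # what a residual phase would encode
    print(segs, residuals)
```

Any slope inside the final cone satisfies the bound for every point in the segment, because the cone is the intersection of the per-point constraints. The residuals left over are therefore bounded by $\epsilon$ and are what a second phase would refine.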
Abstract
The distributed data infrastructure in Internet of Things (IoT) ecosystems requires efficient data-series compression methods that can also serve different accuracy demands. However, the performance of existing compression methods degrades sharply when ultra-accurate data recovery is required. In this paper, we introduce SHRINK, a novel, highly accurate data compression method that offers a higher compression ratio and lower runtime than prior compressors. SHRINK extracts data semantics in the form of linear segments to construct a compact knowledge base, using a dynamic error threshold that adapts to the data characteristics. It then captures the remaining data details as residuals to support lossy compression at diverse resolutions as well as lossless compression. Because SHRINK identifies repeated semantics, its compression ratio increases with data size. Our experimental evaluation demonstrates that SHRINK outperforms state-of-the-art methods, with up to a threefold improvement in compression ratio.
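One way to see how residuals support several resolutions on top of a single base approximation is to quantize them uniformly. With step $2\epsilon$, the reconstruction satisfies $|y - \hat{y}| \le \epsilon$. For lossless recovery, the residuals can be taken as exact integers at the data's decimal precision. This is a sketch under assumptions that do not come from the source: a uniform quantizer, values carrying at most `decimals` decimal digits, and no entropy coding (which would follow on the integer codes). The function names are hypothetical and are not SHRINK's actual encoder.

```python
from typing import List, Sequence


def encode_residuals(values: Sequence[float], base: Sequence[float], eps: float) -> List[int]:
    """Uniformly quantize residuals with step 2*eps, so each value is recovered within eps."""
    step = 2 * eps
    return [round((v - b) / step) for v, b in zip(values, base)]


def decode_residuals(base: Sequence[float], codes: Sequence[int], eps: float) -> List[float]:
    """Add the dequantized residual back onto the base approximation."""
    step = 2 * eps
    return [b + q * step for b, q in zip(base, codes)]


def encode_lossless(values: Sequence[float], base: Sequence[float], decimals: int) -> List[int]:
    """Exact integer residuals, assuming values carry at most `decimals` decimal digits."""
    scale = 10 ** decimals
    return [round(v * scale) - round(b * scale) for v, b in zip(values, base)]


def decode_lossless(base: Sequence[float], codes: Sequence[int], decimals: int) -> List[float]:
    """Rebuild the scaled integer and divide back to recover the original decimal value."""
    scale = 10 ** decimals
    return [(round(b * scale) + q) / scale for b, q in zip(base, codes)]


if __name__ == "__main__":
    values = [1.23, 4.56, 7.89, 2.5]
    base = [1.0, 4.0, 8.0, 3.0]  # e.g. the output of a linear-segment phase
    for eps in (0.5, 0.05):
        print(eps, decode_residuals(base, encode_residuals(values, base, eps), eps))
    print(decode_lossless(base, encode_lossless(values, base, 2), 2))
```

A smaller $\epsilon$ produces larger integer codes and hence more bits after entropy coding. This mirrors the trade-off between accuracy level and compression ratio described above.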
