BOA Constrictor: A Mamba-based lossless compressor for High Energy Physics data

Akshat Gupta; Caterina Doglioni; Thomas Joseph Elliott

TL;DR

This work presents the Bytewise Online Autoregressive (BOA) Constrictor, a novel, streaming-capable lossless compressor built upon the Mamba architecture. It concludes that this Mamba-based approach is a highly promising proof-of-principle, but that significant future work on performance optimisation and hardware portability is required to develop it into a production-ready tool for the HEP community.

Abstract

The petabyte-scale data generated annually by High Energy Physics (HEP) experiments, such as those at the Large Hadron Collider, present a significant storage challenge. Whilst traditional algorithms like LZMA and ZLIB are widely used, they often fail to exploit the deep structure inherent in scientific data. We investigate whether modern state space models (SSMs), which have shown promise for capturing long-range dependencies in sequences, can address this problem. We present the Bytewise Online Autoregressive (BOA) Constrictor, a novel, streaming-capable lossless compressor built upon the Mamba architecture. BOA combines an autoregressive Mamba model for next-byte prediction with a parallelised streaming range coder. We evaluate our method on three distinct structured HEP datasets, where it achieves state-of-the-art compression ratios and improves upon LZMA-9 on every dataset. BOA's compression ratio ranges from 2.21$\times$ (vs. 1.69$\times$ for LZMA-9) on the ATLAS dataset to a substantial 44.14$\times$ (vs. 27.14$\times$) on the highly structured CMS dataset, with a modest model size of $\sim 4.5$ MB. However, this gain in compression ratio comes at a cost in throughput: the Storage-Saving Rate ($\sigma_{SSR}$) of our prototype currently lags behind highly optimised CPU-based algorithms like ZLIB. We conclude that while this Mamba-based approach is a highly promising proof-of-principle, significant future work on performance optimisation and hardware portability is required to develop it into a production-ready tool for the HEP community.
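
To illustrate why pairing a next-byte predictor with a range coder yields compression, the following minimal Python sketch computes the ideal code length $-\sum_i \log_2 p(x_i \mid x_{<i})$ that an entropy coder approaches to within a small overhead. It is not the BOA implementation: a simple adaptive byte-frequency model stands in for the Mamba predictor, no actual range coder is run, and the function names `ideal_compressed_bits` and `compression_ratio` are hypothetical.

```python
import math


def ideal_compressed_bits(data: bytes) -> float:
    """Ideal code length in bits under an adaptive order-0 byte model.

    Assumption: this frequency-count model is only a stand-in for a learned
    autoregressive predictor such as the Mamba model described in the abstract.
    """
    counts = [1] * 256  # Laplace smoothing: every byte value starts with count 1
    total = 256
    bits = 0.0
    for b in data:
        # Cost of coding byte b given the prediction made before seeing it.
        bits -= math.log2(counts[b] / total)
        # Update after coding, so a decoder could mirror the same state.
        counts[b] += 1
        total += 1
    return bits


def compression_ratio(data: bytes) -> float:
    """Uncompressed size in bits divided by the ideal compressed size in bits."""
    return (8 * len(data)) / ideal_compressed_bits(data)


if __name__ == "__main__":
    structured = b"\x00\x01" * 4096   # highly predictable byte stream
    varied = bytes(range(256)) * 32   # every byte value equally frequent
    print(f"structured: {compression_ratio(structured):.2f}x")
    print(f"varied:     {compression_ratio(varied):.2f}x")
```

The better the predictor assigns high probability to the byte that actually occurs, the fewer bits the coder spends, which is why a model that captures long-range structure can outperform dictionary-based methods in compression ratio.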
