RAS: A Bit-Exact rANS Accelerator For High-Performance Neural Lossless Compression
Yuchao Qin, Anjunyi Fan, Bonan Yan
TL;DR
Lossless data-center compression demands high throughput with bit-exact recovery. RAS is a hardware–software codesign that fuses an rANS core with a neural probability generator, employing a mixed-precision probability path, a two-stage update, and a prediction-guided decoder to prune CDF searches while preserving exactness. Key contributions include a BF16-to-fixed-point probability interface with mass correction, a two-stage update enabling sustained throughput, and a decoder that uses prediction to reduce average search depth, all scalable via a multi-lane architecture. The approach demonstrates substantial RTL speedups (encode ≈121.2×, decode ≈70.9×) while maintaining competitive compression when paired with learned priors, offering a practical path to fast neural lossless compression in data-center workloads. The work generalizes to other ANS variants and highlights a viable route to integrate on-chip probability generation for energy- and latency-aware lossless coding.
Abstract
Data centers handle vast volumes of data that require efficient lossless compression, yet emerging probabilistic models based methods are often computationally slow. To address this, we introduce RAS, the Range Asymmetric Numeral System Acceleration System, a hardware architecture that integrates the rANS algorithm into a lossless compression pipeline and eliminates key bottlenecks. RAS couples an rANS core with a probabilistic generator, storing distributions in BF16 format and converting them once into a fixed-point domain shared by a unified division/modulo datapath. A two-stage rANS update with byte-level re-normalization reduces logic cost and memory traffic, while a prediction-guided decoding path speculatively narrows the cumulative distribution function (CDF) search window and safely falls back to maintain bit-exactness. A multi-lane organization scales throughput and enables fine-grained clock gating for efficient scheduling. On image workloads, our RTL-simulated prototype achieves 121.2x encode and 70.9x decode speedups over a Python rANS baseline, reducing average decoder binary-search steps from 7.00 to 3.15 (approximately 55% fewer). When paired with neural probability models, RAS sustains higher compression ratios than classical codecs and outperforms CPU/GPU rANS implementations, offering a practical approach to fast neural lossless compression.
