LoPace: A Lossless Optimized Prompt Accurate Compression Engine for Large Language Model Applications
Aman Ulla
TL;DR
LoPace tackles the challenge of storing prompts for production-grade LLM systems by introducing a lossless, three-track compression engine. It combines a Zstandard-based codec, a token-based BPE approach, and a novel hybrid that pipelines tokenization with Zstd, achieving a mean space savings of 72.2% and a mean compression ratio of 4.89x while maintaining 100% lossless reconstruction across 1,158 compression–decompression cycles. The work provides rigorous empirical validation on a diverse 386-prompt dataset, detailed methodological reasoning, and practical deployment guidance, demonstrating production-ready throughput and sub-linear memory scaling. Its results offer a concrete path to substantial storage cost reductions and improved system scalability for real-time and large-scale LLM applications, with open-source accessibility for adoption and extension.
Abstract
Large Language Models (LLMs) have transformed natural language processing, but storing and managing prompts efficiently in production environments remains difficult. This paper presents LoPace (Lossless Optimized Prompt Accurate Compression Engine), a novel compression framework designed specifically for prompt storage in LLM applications. LoPace offers three compression methods: Zstandard-based compression, Byte-Pair Encoding (BPE) tokenization with binary packing, and a hybrid method that combines the two. Evaluated on 386 diverse prompts, including code snippets, markdown documentation, and structured content, LoPace achieves a mean space savings of 72.2% with 100% lossless reconstruction. The hybrid method consistently outperforms each technique on its own, reaching a mean compression ratio of 4.89x (range: 1.22x to 19.09x) and speeds of 3.3 to 10.7 MB/s. These findings indicate that LoPace is ready for production use, with a small memory footprint (0.35 MB on average) and scalability suited to large databases and real-time LLM applications.
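The two headline metrics above, compression ratio and space savings, can be illustrated with a minimal sketch. It assumes the standard definitions (ratio = original size / compressed size, savings = 1 - compressed size / original size), which the abstract does not spell out. It uses Python's built-in `zlib` purely as a stand-in codec, not Zstandard or the LoPace hybrid method, and the sample prompts are hypothetical rather than drawn from the 386-prompt dataset.

```python
import zlib
from statistics import mean
from typing import Tuple


def measure(prompt: str) -> Tuple[float, float]:
    """Return (compression_ratio, space_savings) for a single prompt."""
    raw = prompt.encode("utf-8")
    # Stand-in codec: zlib from the standard library, NOT Zstandard.
    packed = zlib.compress(raw, 9)
    # Lossless check: decompression must reproduce the input byte for byte.
    assert zlib.decompress(packed) == raw
    ratio = len(raw) / len(packed)
    savings = 1.0 - len(packed) / len(raw)
    return ratio, savings


# Hypothetical sample prompts (assumption, not the paper's data).
PROMPTS = [
    "You are a helpful assistant. " * 40,
    "def add(a, b):\n    return a + b\n" * 10,
    "Summarize the following text in one sentence.",
]

results = [measure(p) for p in PROMPTS]
mean_ratio = mean(r for r, _ in results)
mean_savings = mean(s for _, s in results)

# Savings implied by the mean ratio alone; generally NOT equal to mean_savings.
implied_savings = 1.0 - 1.0 / mean_ratio

print(f"mean compression ratio: {mean_ratio:.2f}x")
print(f"mean space savings:     {mean_savings:.1%}")
print(f"1 - 1/mean_ratio:       {implied_savings:.1%}")
```

Because both metrics are averaged per prompt, the mean savings need not equal 1 - 1/(mean ratio). Under the definitions assumed here, that may explain why the reported 72.2% mean savings is lower than the roughly 79.6% that a 4.89x ratio would imply for a single prompt.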
