AlphaZip: Neural Network-Enhanced Lossless Text Compression

Swathi Shree Narashiman; Nitin Chandrachoodan

TL;DR

AlphaZip is a lossless text compression approach built on a Large Language Model (LLM). It works in two steps: first, a dense neural network architecture, such as a transformer block, performs prediction; second, the predicted ranks are compressed with a standard compression algorithm such as Adaptive Huffman, LZ77, or Gzip.

Abstract

Data compression continues to evolve, with traditional information theory methods being widely used for compressing text, images, and videos. Recently, there has been growing interest in leveraging Generative AI for predictive compression techniques. This paper introduces a lossless text compression approach using a Large Language Model (LLM). The method involves two key steps: first, prediction using a dense neural network architecture, such as a transformer block; second, compressing the predicted ranks with standard compression algorithms like Adaptive Huffman, LZ77, or Gzip. Extensive analysis and benchmarking against conventional information-theoretic baselines demonstrate that neural compression offers improved performance.
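
The two-step pipeline described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation. It assumes that a "rank" is the position of the actual next symbol in the model's predictions sorted from most to least likely. It also swaps the transformer for a toy adaptive byte-frequency model (the hypothetical `ToyPredictor`) so that the example runs without any model weights. The second stage uses Python's `zlib`, which implements the DEFLATE codec that Gzip is built on.

```python
import zlib
from collections import defaultdict


class ToyPredictor:
    """Adaptive order-1 byte model. A stand-in for the LLM (assumption)."""

    def __init__(self):
        # counts[context][symbol] = times `symbol` followed `context` so far
        self.counts = defaultdict(lambda: [0] * 256)

    def ranking(self, context):
        c = self.counts[context]
        # Most frequent symbol first; ties broken by byte value so that the
        # encoder and decoder always agree on the ordering.
        return sorted(range(256), key=lambda s: (-c[s], s))

    def update(self, context, symbol):
        self.counts[context][symbol] += 1


def encode_ranks(data: bytes) -> bytes:
    """Step 1: replace each byte with its rank under the predictor."""
    model, context, ranks = ToyPredictor(), 0, bytearray()
    for symbol in data:
        ranks.append(model.ranking(context).index(symbol))
        model.update(context, symbol)
        context = symbol
    return bytes(ranks)


def decode_ranks(ranks: bytes) -> bytes:
    """Invert step 1 by replaying the same predictor on the decoder side."""
    model, context, out = ToyPredictor(), 0, bytearray()
    for rank in ranks:
        symbol = model.ranking(context)[rank]
        out.append(symbol)
        model.update(context, symbol)
        context = symbol
    return bytes(out)


def compress(data: bytes) -> bytes:
    """Step 2: hand the rank stream to a standard compressor."""
    return zlib.compress(encode_ranks(data), 9)


def decompress(blob: bytes) -> bytes:
    return decode_ranks(zlib.decompress(blob))


if __name__ == "__main__":
    text = b"the quick brown fox jumps over the lazy dog. " * 20
    blob = compress(text)
    assert decompress(blob) == text  # lossless round trip
    print(len(text), "bytes ->", len(blob), "bytes")
```

The better the predictor, the more often the true symbol lands at rank 0, which produces a rank stream dominated by small values that a general-purpose compressor can exploit. Losslessness holds because the decoder rebuilds exactly the same predictor state from the symbols it has already recovered.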

Paper Structure (23 sections, 3 equations, 5 figures, 13 tables)

Introduction
Background
Information Theoretic Baselines
Arithmetic Encoding
Huffman and Adaptive Huffman Encoding
Lempel Ziv 77 (LZ77)
Gzip compression algorithm
Brotli compression algorithm
Neural Compression Model Architecture
Tokenization
Rank Prediction
Individual Inference vs Batch Inference
Accelerating inferencing using TensorFlow XLA
Results and Analysis
Compression Performance Quantification
...and 8 more sections

Figures (5)

Figure 1: Block diagram representing a standard compression pipeline
Figure 2: Block diagram representing the high-level architecture of AlphaZip
Figure 3: Distribution of ranks vs order of ranks for two different test cases
Figure 4: Compression Ratio vs Model Size
Figure 5: Time taken vs Model Size
