Large Language Models for Lossless Image Compression: Next-Pixel Prediction in Language Space is All You Need

Kecheng Chen, Pingping Zhang, Hui Liu, Jie Liu, Yibing Liu, Jiaxin Huang, Shiqi Wang, Hong Yan, Haoliang Li

TL;DR

This work proposes P$^{2}$-LLM, a next-pixel prediction-based LLM that integrates several insights and methodologies, e.g., pixel-level priors, the in-context ability of LLMs, and a pixel-level semantic preservation strategy, to improve its understanding of pixel sequences for better next-pixel prediction.

Abstract

We have recently witnessed that "intelligence" and "compression" are two sides of the same coin: a large language model (LLM) with unprecedented intelligence is also a general-purpose lossless compressor for various data modalities. This property is particularly appealing to the lossless image compression community, given the growing need to compress high-resolution images in the current streaming media era. A natural question follows: can the compression performance of LLMs elevate lossless image compression to new heights? However, our findings indicate that a naive application of LLM-based lossless image compressors suffers from a considerable performance gap compared with existing state-of-the-art (SOTA) codecs on common benchmark datasets. In light of this, we aim to unlock the intelligence (compression) capacity of the LLM for lossless image compression, thereby bridging the gap between theoretical and practical compression performance. Specifically, we propose P$^{2}$-LLM, a next-pixel prediction-based LLM that integrates several insights and methodologies, e.g., pixel-level priors, the in-context ability of LLMs, and a pixel-level semantic preservation strategy, to improve its understanding of pixel sequences for better next-pixel prediction. Extensive experiments on benchmark datasets demonstrate that P$^{2}$-LLM can beat SOTA classical and learned codecs.

Paper Structure

This paper contains 13 sections, 2 theorems, 14 equations, 6 figures, 5 tables.

Key Result

Theorem 1

[li2024understanding, grau2024learning] For any parametric meta-learning model $f_{\theta}$ like the decoder-only large models, if $f_{\theta}$ is fully trained by log-loss function and consider an infinite sequence $\omega$ of events over a finite alphabet, the optimum posterior distribution $\mu$ of $

Figures (6)

  • Figure 1: Comparison of different lossless image compressors in bits per subpixel (bpsp, lower is better) on the CLIC.m dataset. Classical compressors include PNG, WebP, FLIF, and JPEG-XL.
  • Figure 2: The framework of the proposed P$^{2}$-LLM, including the Pixel Prediction Chat Template for pixel-level priors and in-context integration (Sec. 3.1), Two-step Lossless Pixel Tokenization for pixel-level semantic preservation (Sec. 3.2), Predictive Distribution Sampling for scalable probability representation of encoded symbols (Sec. 3.3), and Fine-tuning to boost the understanding capacity of pixel sequences (Sec. 3.4). You may zoom in for a better view.
  • Figure 3: Comparison of different input sequences for pixel prediction. The existing approach is from [deletang2024language]. We process each patch of RGB images in a channel-joint manner.
  • Figure 4: (a) Customized tokenizer from [deletang2024language]. (b) Lossless image tokenizer used in our method. Note that in (a) the subpixel values 254 and 255 map to the same token ID (i.e., 1624).
  • Figure 5: Visualization of predictive distribution sampling for reasoning three subpixels using LLM. You may zoom in for a better view.
  • ...and 1 more figures
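
Figure 1 reports results in bits per subpixel (bpsp). As a rough illustration of how that metric connects to next-pixel prediction, the sketch below computes the ideal code length an entropy coder could approach when a predictor assigns probability $p$ to each observed subpixel, i.e., $-\log_2 p$ bits per symbol. This is not the paper's implementation: the `predict` callable, the two toy predictors, and the sample `row` are hypothetical stand-ins for an LLM's predictive distribution over the 256 possible 8-bit subpixel values.

```python
import math


def ideal_bpsp(subpixels, predict):
    """Average ideal code length in bits per subpixel.

    `predict(context)` is a hypothetical callable that returns a list of
    256 probabilities for the next subpixel given the previous ones.
    """
    total_bits = 0.0
    for i, value in enumerate(subpixels):
        probs = predict(subpixels[:i])
        # An ideal entropy coder spends about -log2(p) bits on a symbol
        # that the model predicted with probability p.
        total_bits += -math.log2(probs[value])
    return total_bits / len(subpixels)


def uniform_predictor(context):
    # Baseline with no knowledge: every 8-bit value is equally likely.
    return [1.0 / 256] * 256


def repeat_predictor(context):
    # Toy predictor (assumption): half the probability mass goes to the
    # previous subpixel value, the rest is spread evenly over the others.
    if not context:
        return [1.0 / 256] * 256
    probs = [0.5 / 255] * 256
    probs[context[-1]] = 0.5
    return probs


# Hypothetical row of smoothly varying subpixel values.
row = [120, 120, 120, 121, 121, 121, 121, 122]

print(ideal_bpsp(row, uniform_predictor))  # 8 bits per subpixel: no compression
print(ideal_bpsp(row, repeat_predictor))   # fewer bits: better prediction
```

The uniform baseline costs exactly 8 bits per subpixel, while even the crude repeat predictor needs fewer, which is the sense in which better next-pixel prediction translates into a lower bpsp.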

Theorems & Definitions (2)

  • Theorem 1
  • Corollary 1