Rethinking Autoregressive Models for Lossless Image Compression via Hierarchical Parallelism and Progressive Adaptation

Daxin Li, Yuanchao Bai, Kai Wang, Wenbo Zhao, Junjun Jiang, Xianming Liu

TL;DR

This work reframes autoregressive models for lossless image compression by introducing HPAC, a lightweight Hierarchical Parallel Autoregressive ConvNet, augmented with Cache-then-Select Inference (CSI) and Adaptive Focus Coding (AFC) for fast, high-bit-depth coding. It then enables efficient, instance-specific adaptation through Spatially-Aware Rate-Guided Progressive Fine-tuning (SARP-FT), grounded in the Minimum Description Length principle and low-rank adapters. Across diverse datasets, HPAC achieves state-of-the-art compression with a fraction of the parameters of prior methods and offers practical speed, while SARP-FT delivers substantial per-image gains with modest computational cost. The combination demonstrates that a carefully engineered AR framework can rival or exceed existing learned compression approaches in both rate and practicality, enabling universal, per-image adapted lossless compression.

Abstract

Autoregressive (AR) models, the theoretical performance benchmark for learned lossless image compression, are often dismissed as impractical due to prohibitive computational cost. This work re-thinks this paradigm, introducing a framework built on hierarchical parallelism and progressive adaptation that re-establishes pure autoregression as a top-performing and practical solution. Our approach is embodied in the Hierarchical Parallel Autoregressive ConvNet (HPAC), an ultra-lightweight pre-trained model using a hierarchical factorized structure and content-aware convolutional gating to efficiently capture spatial dependencies. We introduce two key optimizations for practicality: Cache-then-Select Inference (CSI), which accelerates coding by eliminating redundant computations, and Adaptive Focus Coding (AFC), which efficiently extends the framework to high bit-depth images. Building on this efficient foundation, our progressive adaptation strategy is realized by Spatially-Aware Rate-Guided Progressive Fine-tuning (SARP-FT). This instance-level strategy fine-tunes the model for each test image by optimizing low-rank adapters on progressively larger, spatially-continuous regions selected via estimated information density. Experiments on diverse datasets (natural, satellite, medical) validate that our method achieves new state-of-the-art compression. Notably, our approach sets a new benchmark in learned lossless compression, showing a carefully designed AR framework can offer significant gains over existing methods with a small parameter count and competitive coding speeds.
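As a rough illustration of why autoregressive models serve as the rate benchmark: with an ideal entropy coder, the compressed size of a symbol sequence equals the sum of -log2 p(x_i | x_<i) under the model, so a better predictor yields a smaller file. The sketch below uses a toy adaptive count-based predictor as a stand-in. It is not HPAC, and the function names `ideal_code_length_bits` and `bits_per_subpixel` are hypothetical, chosen only for this example.

```python
import math


def ideal_code_length_bits(symbols, alphabet_size):
    """Ideal code length (in bits) of `symbols` under a toy adaptive model.

    Assumption: the predictor is a Laplace-smoothed frequency count over
    previously seen symbols, a stand-in for a learned AR network.
    """
    counts = [1] * alphabet_size  # one pseudo-count per symbol
    total_bits = 0.0
    for s in symbols:
        p = counts[s] / sum(counts)   # p(x_i | x_<i) from past symbols only
        total_bits += -math.log2(p)   # cost charged by an ideal entropy coder
        counts[s] += 1                # decoder can apply the same update
    return total_bits


def bits_per_subpixel(total_bits, num_subpixels):
    """Normalize a code length by the number of coded sub-pixels."""
    return total_bits / num_subpixels


if __name__ == "__main__":
    skewed = [0, 0, 0, 0, 0, 0, 0, 1]
    bits = ideal_code_length_bits(skewed, alphabet_size=2)
    print(bits, bits_per_subpixel(bits, len(skewed)))
```

Because the predictor only looks at already-coded symbols, a decoder can reproduce every probability exactly, which is what makes the scheme lossless.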

Paper Structure

This paper contains 36 sections, 23 equations, 8 figures, 7 tables, 1 algorithm.

Figures (8)

  • Figure 1: Overview of the proposed architectures and mechanisms in HPAC: (a) The content-adaptive Convolutional Gating Mechanism (CGM) simplifies attention. (b) Building on this, the Hierarchical Parallel Autoregressive ConvNet (HPAC) consists of Local Context Modulator (LCM), MLP, and Spatial Propagation Module (SPM) blocks. (c) To accelerate coding, Cache-then-Select Inference (CSI) caches activations and performs efficient selective computation only on causally relevant features.
  • Figure 2: Illustration of the proposed hierarchical context modeling strategy. The context is aggregated from a small neighborhood of adjacent patches, and the context for all patches is computed in parallel.
  • Figure 3: The sparse distribution of pixel values in a sample high bit-depth image from the Covid-CT dataset. The distribution is highly skewed: most pixels are concentrated in a narrow range, and only a small fraction are outliers.
  • Figure 4: (a) (b) The Wasserstein distance between all patch pairs for the image Farmland-49.png in the RS19 dataset, showing the content redundancy across patches. (c) (d) Comparison of the bpsp gain from fine-tuning on different patches for 10 steps, for the cropped image avide-ragusa-716.png in the CLIC.p dataset. The bpsp gain is calculated as the difference in bpsp between the fine-tuned model and the pre-trained model.
  • Figure 5: The bpsp savings of SARP-FT over pre-trained HPAC on the Kodak and Doc24 datasets. The first row shows kodim23.png from the Kodak dataset. The second row shows 24.png from the Doc24 dataset.
  • ...and 3 more figures
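
The two quantities named in the Figure 4 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the Wasserstein distance is taken between the flattened pixel intensities of two equal-sized patches (the 1D case, where it reduces to the mean absolute difference of the sorted values), and the bpsp gain is assumed to be pre-trained minus fine-tuned so that a positive value means a saving. The function names are hypothetical.

```python
def wasserstein_1d(a, b):
    """1D Wasserstein-1 distance between two equal-sized samples.

    For equal-sized empirical distributions this equals the mean absolute
    difference between the sorted values.
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-sized patches")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def pairwise_patch_distances(patches):
    """Symmetric matrix of distances between all patch pairs."""
    n = len(patches)
    return [[wasserstein_1d(patches[i], patches[j]) for j in range(n)]
            for i in range(n)]


def bpsp_gain(bits_pretrained, bits_finetuned, num_subpixels):
    """Saving in bits per sub-pixel (assumed sign: positive means smaller)."""
    return (bits_pretrained - bits_finetuned) / num_subpixels


if __name__ == "__main__":
    # Toy flattened patches; real patches would hold pixel intensities.
    patches = [[0, 1, 2, 3], [1, 2, 3, 4], [0, 1, 2, 3]]
    for row in pairwise_patch_distances(patches):
        print(row)
    print(bpsp_gain(bits_pretrained=1000.0, bits_finetuned=900.0,
                    num_subpixels=100))
```

Small off-diagonal distances indicate patches with similar intensity statistics, which is the kind of cross-patch redundancy the caption refers to.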