COLI: A Hierarchical Efficient Compressor for Large Images

Haoran Wang, Hanyu Pei, Yang Lyu, Kai Zhang, Li Li, Feng-Lei Fan

Abstract

The escalating adoption of high-resolution, large-field-of-view imagery amplifies the need for efficient compression methodologies. Conventional techniques frequently fail to preserve critical image details, while data-driven approaches exhibit limited generalizability. Implicit Neural Representations (INRs) present a promising alternative by learning continuous mappings from spatial coordinates to pixel intensities for individual images, thereby storing network weights rather than raw pixels and avoiding the generalization problem. However, INR-based compression of large images faces challenges including slow compression speed and suboptimal compression ratios. To address these limitations, we introduce COLI (Compressor for Large Images), a novel framework leveraging Neural Representations for Videos (NeRV). First, recognizing that INR-based compression constitutes a training process, we accelerate its convergence through a pretraining-finetuning paradigm, mixed-precision training, and reformulation of the sequential loss into a parallelizable objective. Second, capitalizing on INRs' transformation of image storage constraints into weight storage, we implement Hyper-Compression, a novel post-training technique to substantially enhance compression ratios while maintaining minimal output distortion. Evaluations across two medical imaging datasets demonstrate that COLI consistently achieves competitive or superior PSNR and SSIM metrics at significantly reduced bits per pixel (bpp), while accelerating NeRV training by up to 4 times.

Paper Structure

This paper contains 25 sections, 1 theorem, 7 equations, 14 figures, 5 tables.

Key Result

Theorem 1

Let $a_1, \ldots, a_N$ be irrationally independent real numbers. Then for any vector $\{w_n\}_{n=1}^N \subset [0,1]$ and any $\epsilon > 0$, there exists a scalar $\theta^* \in [0, \infty)$ such that $|w_n - \tau(\theta^* a_n)| < \epsilon$ for $n = 1, \ldots, N$, where $\tau(z) = z - \lfloor z \rfloor$.
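
A small numeric sketch may help make the statement concrete. It scans $\theta$ on a uniform grid and accepts the first value whose fractional parts $\tau(\theta a_n)$ land within $\epsilon$ of every target $w_n$. The function names, the grid step, the search bound, and the choice $a = (1, \sqrt{2})$ are illustrative assumptions, not the search procedure used by COLI.

```python
import math

def frac(z: float) -> float:
    """tau(z) = z - floor(z), the fractional part of z."""
    return z - math.floor(z)

def find_theta(w, a, eps, step=1e-3, max_theta=1000.0):
    """Brute-force grid scan (hypothetical helper, not the paper's algorithm).

    Returns the first theta = k * step such that
    |w_n - tau(theta * a_n)| < eps for every n, or None if the scan
    reaches max_theta without success.
    """
    k = 0
    while k * step <= max_theta:
        theta = k * step
        if all(abs(wn - frac(theta * an)) < eps for wn, an in zip(w, a)):
            return theta
        k += 1
    return None

# Assumed directions: 1 and sqrt(2) are linearly independent over the rationals.
a = [1.0, math.sqrt(2.0)]
w = [0.3, 0.7]      # target vector in [0, 1]^2
eps = 0.01

theta = find_theta(w, a, eps)
approx = [frac(theta * an) for an in a]   # reconstruction from the single scalar
```

Here two numbers are recovered, up to $\epsilon$, from the single scalar `theta`. A smaller $\epsilon$ generally pushes the first admissible `theta` further out, so precision is traded against the range of $\theta$ that must be stored.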

Figures (14)

  • Figure 1: The schematic diagram of the COLI framework. NeRV encodes a large image into weights, which are flattened and reshaped into a matrix with the size of $(G,2)$; Hyper-Compression maps each row to one hyperparameter $\theta^{*}$ for storage and later reconstruction.
  • Figure 2: The overall structure of the NeRV model. The input patches are firstly embedded, then passed through several fully connected layers and a sequence of NeRV blocks, and finally decoded into frame outputs.
  • Figure 3: Illustration of the proposed three-stage acceleration strategy for NeRV training. Left: Reducing training epochs via pretrained model initialization. Middle: Reducing time per epoch using AMP, unified metric computation, and batch-wise loss optimization. Right: Parallel training of multiple NeRV models to fully utilize GPU resources.
  • Figure 4: NeRV encodes images as neural network parameters, which makes it possible to apply post-training model compression techniques to image compression.
  • Figure 5: A comparison of image reconstruction performance of the NeRV model under different compression strategies.
  • ...and 9 more figures
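
The Figure 1 caption describes flattening the network weights and reshaping them into a matrix of size $(G,2)$ before each row is mapped to one scalar. The sketch below shows only that data-layout step and its inverse, under assumptions that are not taken from the paper: min-max scaling into $[0,1]$, zero padding when the weight count is odd, and the helper names `to_pairs` and `from_pairs`.

```python
def to_pairs(weights):
    """Scale a flat weight list into [0, 1] and group it into G rows of 2.

    Hypothetical helper: min-max scaling and zero padding are assumptions
    made for illustration only.
    """
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) or 1.0            # avoid division by zero for constant weights
    unit = [(x - lo) / scale for x in weights]
    if len(unit) % 2:                   # pad so the length is a multiple of 2
        unit.append(0.0)
    rows = [unit[i:i + 2] for i in range(0, len(unit), 2)]
    return rows, lo, scale, len(weights)

def from_pairs(rows, lo, scale, n):
    """Invert to_pairs: flatten the rows, drop padding, undo the scaling."""
    flat = [u for row in rows for u in row][:n]
    return [u * scale + lo for u in flat]

weights = [-0.5, 0.25, 1.5, 0.0, 0.75]  # stand-in for flattened network weights
rows, lo, scale, n = to_pairs(weights)  # G = 3 rows, each a point in [0, 1]^2
restored = from_pairs(rows, lo, scale, n)
```

Each row of `rows` is a point in $[0,1]^2$, which is the form a per-row scalar encoding would operate on. The values `lo`, `scale`, and `n` would need to be stored alongside the encoded rows so the layout can be inverted.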

Theorems & Definitions (1)

  • Theorem 1: Katok et al., 1995