From Noise to Latent: Generating Gaussian Latents for INR-Based Image Compression

Chaoyi Lin, Yaojun Wu, Yue Li, Junru Li, Kai Zhang, Li Zhang

TL;DR

The paper addresses the inefficiency of INR-based image compression by eliminating latent-code transmission: instead of storing latents explicitly, it generates them from a shared Gaussian noise tensor. A lightweight Gaussian Parameter Prediction (GPP) module estimates per-pixel Gaussian parameters from the noise and applies the reparameterization $y_{\text{pred}} = \mu_{\text{pred}} + \sigma_{\text{pred}} \cdot z_M$ to produce image-specific latents, which a synthesis network then decodes into the reconstructed image. The networks are overfitted per image, and a multi-scale noise pyramid, signaled only by its random seed, captures spatial priors. The method achieves competitive rate-distortion performance on the Kodak and CLIC datasets while reducing decoding complexity compared to auto-regressive latent decoders. According to the authors, it is the first work to explore Gaussian latent generation from fixed noise for INR-based compression, and it offers a lightweight alternative to current latent-code pipelines. Overall, the results indicate that generating latents from noise preserves the benefits of latent-based coding without transmitting latent codes, with behavior that is robust to the choice of seed and with favorable decoding times.

Abstract

Recent implicit neural representation (INR)-based image compression methods have shown competitive performance by overfitting image-specific latent codes. However, they remain inferior to end-to-end (E2E) compression approaches due to the absence of expressive latent representations. On the other hand, E2E methods rely on transmitting latent codes and require complex entropy models, leading to increased decoding complexity. Inspired by the normalization strategy in E2E codecs, where latents are transformed into Gaussian noise to demonstrate the removal of spatial redundancy, we explore the inverse direction: generating latents directly from Gaussian noise. In this paper, we propose a novel image compression paradigm that reconstructs image-specific latents from a multi-scale Gaussian noise tensor, deterministically generated using a shared random seed. A Gaussian Parameter Prediction (GPP) module estimates the distribution parameters, enabling one-shot latent generation via the reparameterization trick. The predicted latent is then passed through a synthesis network to reconstruct the image. Our method eliminates the need to transmit latent codes while preserving latent-based benefits, achieving competitive rate-distortion performance on the Kodak and CLIC datasets. To the best of our knowledge, this is the first work to explore Gaussian latent generation for learned image compression.
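
The seed-based generation and the reparameterization step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the seed value, the latent size, and the constant `mu_pred` / `sigma_pred` values are all hypothetical placeholders standing in for what a trained GPP module would predict. The sketch only shows that two parties sharing a seed obtain identical noise, and that $y_{\text{pred}} = \mu_{\text{pred}} + \sigma_{\text{pred}} \cdot z$ is the inverse of the normalization $z' = (y - \mu)/\sigma$.

```python
import random

def make_noise(seed, n):
    """Draw n samples of z ~ N(0, 1) deterministically from an integer seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def reparameterize(mu, sigma, z):
    """Elementwise y_pred = mu_pred + sigma_pred * z."""
    return [m + s * n for m, s, n in zip(mu, sigma, z)]

def normalize(y, mu, sigma):
    """Elementwise z' = (y - mu) / sigma, the latent-to-noise direction."""
    return [(v - m) / s for v, m, s in zip(y, mu, sigma)]

SEED = 1234        # hypothetical seed value shared by encoder and decoder
NUM_LATENTS = 16   # hypothetical number of latent elements

# Encoder and decoder each regenerate the noise from the seed alone.
z_encoder = make_noise(SEED, NUM_LATENTS)
z_decoder = make_noise(SEED, NUM_LATENTS)

# Placeholder parameters standing in for the GPP module's predictions.
mu_pred = [0.5] * NUM_LATENTS
sigma_pred = [2.0] * NUM_LATENTS

# One-shot latent generation, then a round trip back to the noise.
y_pred = reparameterize(mu_pred, sigma_pred, z_decoder)
z_recovered = normalize(y_pred, mu_pred, sigma_pred)
```

Because the noise is fully determined by the seed, only the seed (plus the network parameters) would need to be shared in such a scheme; the latent itself is never transmitted.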

Paper Structure

This paper contains 19 sections, 5 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: The E2E compression methods transform Gaussian latents $y \sim \mathcal{N}(\mu, \sigma^2)$ to noise $z' = (y - \mu)/\sigma$ to demonstrate the effectiveness of spatial redundancy removal. We explore the inverse process: starting from randomly sampled Gaussian noise $z \sim \mathcal{N}(0, I)$, we estimate the mean $\mu_{\text{pred}}$ and scale $\sigma_{\text{pred}}$, and reconstruct the latent as $y_{\text{pred}} = z \cdot \sigma_{\text{pred}} + \mu_{\text{pred}}$.
  • Figure 2: Overview of the proposed framework. $UP$ and $C$ denote upsampling and concatenation, respectively. Colored modules are included in the bitstream: the random seed, and the parameters of the GPP and synthesis networks.
  • Figure 3: Rate-Distortion curves on Kodak and CLIC datasets.
  • Figure 4: Visualization of key components in our framework. We extract one channel from the input noise and its corresponding channel from the generated latent for illustration. Best viewed on screen.
  • Figure 5: Rate-Distortion performance for ablation studies.
  • ...and 1 more figure
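
The Figure 2 caption mentions upsampling ($UP$) and concatenation ($C$), and the abstract describes a multi-scale Gaussian noise tensor generated from a shared seed. One way such a tensor could be built is sketched below. This is an illustration under stated assumptions, not the authors' architecture: the factor-of-two scale spacing, nearest-neighbor upsampling, the point at which upsampling is applied, and all function names are hypothetical.

```python
import random

def gaussian_plane(rng, height, width):
    """Sample one height x width plane of i.i.d. N(0, 1) values."""
    return [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(height)]

def upsample_nearest(plane, factor):
    """Nearest-neighbor upsampling: repeat every value factor x factor times."""
    out = []
    for row in plane:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out

def noise_pyramid(seed, height, width, num_scales):
    """Build a list of full-resolution channels, one per scale.

    Scale s is sampled at (height / 2**s) x (width / 2**s), upsampled back
    to height x width, and appended as a channel (the concatenation step).
    """
    rng = random.Random(seed)
    channels = []
    for s in range(num_scales):
        factor = 2 ** s
        plane = gaussian_plane(rng, height // factor, width // factor)
        channels.append(upsample_nearest(plane, factor))
    return channels

# Hypothetical sizes: an 8 x 8 latent grid with three scales.
pyramid = noise_pyramid(seed=7, height=8, width=8, num_scales=3)
```

In this sketch the coarser channels are constant over blocks of pixels, so neighboring positions share noise values; that is one plausible way a noise tensor could carry spatial structure across scales.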