Robustly overfitting latents for flexible neural image compression

Yura Perugachi-Diaz, Arwin Gansekoele, Sandjai Bhulai

TL;DR

SGA+ comprises three methods that build upon SGA. It improves overall compression performance in terms of the R-D trade-off compared to its predecessors, and a detailed analysis shows that the proposed methods are less sensitive to hyperparameter choices.

Abstract

Neural image compression has made a great deal of progress. State-of-the-art models are based on variational autoencoders and outperform classical models. Neural compression models learn to encode an image into a quantized latent representation that can be efficiently sent to the decoder, which decodes the quantized latent into a reconstructed image. While these models have proven successful in practice, they lead to sub-optimal results due to imperfect optimization and limitations in the encoder and decoder capacity. Recent work shows how to use stochastic Gumbel annealing (SGA) to refine the latents of pre-trained neural image compression models. We extend this idea by introducing SGA+, which contains three different methods that build upon SGA. We show how our method improves the overall compression performance in terms of the rate-distortion (R-D) trade-off, compared to its predecessors. Additionally, we show how refinement of the latents with our best-performing method improves the compression performance on both the Tecnick and CLIC datasets. Our method is deployed for a pre-trained hyperprior and for a more flexible model. Further, we give a detailed analysis of our proposed methods and show that they are less sensitive to hyperparameter choices. Finally, we show how each method can be extended to three-class instead of two-class rounding.
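
To make the notion of two-class rounding concrete, the following is a minimal NumPy sketch of how a latent value can be assigned probabilities of rounding down or up, with a temperature that is annealed towards zero. It assumes the atanh-based logits usually associated with the original SGA formulation; it is not one of the SGA+ methods. The function name `two_class_rounding_probs` and the parameters `tau` and `eps` are hypothetical names chosen for illustration.

```python
import numpy as np


def two_class_rounding_probs(v, tau, eps=1e-5):
    """Return (p_floor, p_ceil) for each latent value in v.

    Assumption: logits are -atanh(distance to the candidate integer) / tau,
    so the nearer integer gets the higher probability, and a smaller
    temperature tau makes the choice closer to hard rounding.
    """
    v = np.asarray(v, dtype=np.float64)
    # Fractional distance to floor(v), clipped so atanh stays finite.
    frac = np.clip(v - np.floor(v), eps, 1.0 - eps)
    logit_floor = -np.arctanh(frac) / tau
    logit_ceil = -np.arctanh(1.0 - frac) / tau
    # Numerically stable two-way softmax.
    m = np.maximum(logit_floor, logit_ceil)
    e_floor = np.exp(logit_floor - m)
    e_ceil = np.exp(logit_ceil - m)
    z = e_floor + e_ceil
    return e_floor / z, e_ceil / z


def soft_round(v, tau):
    """Expected rounded value under the two-class probabilities."""
    v = np.asarray(v, dtype=np.float64)
    p_floor, p_ceil = two_class_rounding_probs(v, tau)
    lo = np.floor(v)
    return p_floor * lo + p_ceil * (lo + 1.0)


latents = np.array([0.2, 0.5, 1.9])
for tau in (1.0, 0.1):
    print(tau, soft_round(latents, tau))
```

As the temperature decreases, the soft-rounded values move towards the nearest integers, which is the behaviour an annealing schedule relies on. A three-class variant would add a third candidate integer to the softmax; that extension is not sketched here.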

Paper Structure

This paper contains 50 sections, 18 equations, 17 figures, and 9 tables.

Figures (17)

  • Figure 1: Probability space for (a) Two-class rounding (b) Three-class rounding
  • Figure 2: Performance plots of (a) True R-D Loss (b) Difference in loss (c) PSNR (d) BPP.
  • Figure 3: R-D performance for SSL on (a) Kodak with the baselines, (b) Tecnick with the base model and $\operatorname{atanh}$ and (c) Kodak for semi-multi-rate behavior with $\operatorname{atanh}$. Best viewed electronically.
  • Figure 4: Qualitative comparison of a Kodak image from a pre-trained model trained with $\lambda=0.0016$. Best viewed electronically.
  • Figure B.1: Comparison of $\operatorname{atanh}$ and SSL on the Kodak dataset for $t=\{500, 2000\}$ iterations.
  • ...and 12 more figures