Lossy Image Compression with Compressive Autoencoders

Lucas Theis, Wenzhe Shi, Andrew Cunningham, Ferenc Huszár

TL;DR

This paper addresses the need for flexible, high-performance lossy image compression by introducing compressive autoencoders (CAEs) trained end-to-end. It tackles the non-differentiability of quantization with a differentiable gradient surrogate for rounding and a differentiable upper bound on the entropy term, enabling direct optimization of the rate-distortion objective $-\log_2 Q([f(\mathbf{x})]) + \beta\, d(\mathbf{x}, g([f(\mathbf{x})]))$. The authors demonstrate performance competitive with JPEG 2000 on natural images, with better scores on perceptual measures such as SSIM and mean opinion scores (MOS), and achieve efficient decoding of large images through a sub-pixel architecture and entropy modeling based on Gaussian scale mixtures (GSMs). A key contribution is an incremental training procedure combined with rate control via learnable scale parameters, which allows adaptation to multiple bitrates without retraining from scratch. The work lays a foundation for end-to-end, content-specific compression and suggests incorporating perceptual metrics or GAN-based enhancements as avenues for further improvement.
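To make the objective above concrete, the following minimal sketch evaluates the rate term $-\log_2 Q([f(\mathbf{x})])$ and the distortion term $\beta\, d(\mathbf{x}, g([f(\mathbf{x})]))$ for a toy codec. Everything except the form of the objective is an assumption made for illustration: the linear `scale` encoder and decoder, the discretised zero-mean Gaussian standing in for $Q$, mean squared error as $d$, and all function names. The sketch only computes the forward value. Because rounding is piecewise constant, its gradient is zero almost everywhere, which is why training needs a surrogate.

```python
import math


def gaussian_cdf(v, sigma):
    """CDF of a zero-mean Gaussian with standard deviation sigma."""
    return 0.5 * (1.0 + math.erf(v / (sigma * math.sqrt(2.0))))


def rate_bits(codes, sigma=2.0):
    """Rate term -log2 Q(z) in bits.

    Assumption: Q is a discretised zero-mean Gaussian that is independent
    across coefficients, Q(z) = CDF(z + 0.5) - CDF(z - 0.5).
    """
    bits = 0.0
    for z in codes:
        q = gaussian_cdf(z + 0.5, sigma) - gaussian_cdf(z - 0.5, sigma)
        bits += -math.log2(q)
    return bits


def rd_objective(x, beta, scale=4.0, sigma=2.0):
    """Return (-log2 Q([f(x)]) + beta * d(x, g([f(x)])), codes, distortion)."""
    y = [scale * v for v in x]        # hypothetical encoder f
    z = [round(v) for v in y]         # quantisation [.]; Python rounds ties to even
    x_hat = [v / scale for v in z]    # hypothetical decoder g
    d = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)  # MSE distortion
    return rate_bits(z, sigma) + beta * d, z, d


if __name__ == "__main__":
    x = [0.1, -0.3, 0.6]
    for scale in (4.0, 16.0):
        loss, z, d = rd_objective(x, beta=100.0, scale=scale)
        print(f"scale={scale}: codes={z}, bits={rate_bits(z):.2f}, mse={d:.5f}, loss={loss:.2f}")
```

In this toy setting a larger `scale` yields finer quantisation, so the distortion drops while the number of bits grows. That is the trade-off that $\beta$ weighs.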

Abstract

We propose a new approach to the problem of optimizing autoencoders for lossy image compression. New media formats, changing hardware technology, as well as diverse requirements and content types create a need for compression algorithms which are more flexible than existing codecs. Autoencoders have the potential to address this need, but are difficult to optimize directly due to the inherent non-differentiability of the compression loss. We here show that minimal changes to the loss are sufficient to train deep autoencoders competitive with JPEG 2000 and outperforming recently proposed approaches based on RNNs. Our network is furthermore computationally efficient thanks to a sub-pixel architecture, which makes it suitable for high-resolution images. This is in contrast to previous work on autoencoders for compression using coarser approximations, shallower architectures, computationally expensive methods, or focusing on small images.
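The non-differentiability mentioned above comes from rounding the coefficients. As a rough illustration, the sketch below implements rounding together with the two differentiable alternatives that the caption of Figure 1 names: stochastic rounding and uniform additive noise. The function names and the NumPy formulation are assumptions made for this example, and the sketch does not claim to reproduce which replacement the authors use during training.

```python
import numpy as np


def hard_round(y):
    """Deterministic rounding to the nearest integer (NumPy rounds ties to even)."""
    return np.round(y)


def stochastic_round(y, rng):
    """Round up with probability equal to the fractional part, so E[output] = y."""
    low = np.floor(y)
    return low + (rng.random(y.shape) < (y - low))


def additive_uniform_noise(y, rng):
    """Replace rounding by adding noise drawn uniformly from [-0.5, 0.5)."""
    return y + rng.uniform(-0.5, 0.5, size=y.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = np.array([0.3, 1.7, -1.2])
    print(hard_round(y))
    print(stochastic_round(y, rng))
    print(additive_uniform_noise(y, rng))
```

Both alternatives are unbiased around the unquantised value, but neither reproduces the exact output of rounding on a single draw, which is the concern the Figure 1 caption raises about matching test-time artefacts.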

Paper Structure

This paper contains 18 sections, 16 equations, 13 figures.

Figures (13)

  • Figure 1: Effects of rounding and differentiable alternatives when used as replacements in JPEG compression. A: A crop of an image before compression (GoToVan, 2014). B: Blocking artefacts in JPEG are caused by rounding of DCT coefficients to the nearest integer. Since rounding is used at test time, a good approximation should produce similar artefacts. C: Stochastic rounding to the nearest integer, similar to the binarization of Toderici et al. (2016a). D: Uniform additive noise (Ballé et al., 2016).
  • Figure 2: Illustration of the compressive autoencoder architecture used in this paper. Inspired by the work of Shi et al. (2016), most convolutions are performed in a downsampled space to speed up computation, and upsampling is performed using sub-pixel convolutions (convolutions followed by reshaping/reshuffling of the coefficients). To reduce clutter, only two residual blocks of the encoder and the decoder are shown. Convolutions followed by leaky rectifications are indicated by solid arrows, while transparent arrows indicate absence of additional nonlinearities. As a model for the distributions of quantized coefficients we use Gaussian scale mixtures. The notation $C \times K \times K$ refers to $K \times K$ convolutions with $C$ filters. The number following the slash indicates stride in the case of convolutions, and upsampling factors in the case of sub-pixel convolutions.
  • Figure 3: A: Scale parameters obtained by finetuning a compressive autoencoder (blue). More fine-grained control over bit rates can be achieved by interpolating scales (gray). Each dot corresponds to the scale parameter of one coefficient for a particular rate-distortion trade-off. The coefficients are ordered due to the incremental training procedure. B: Comparison of incremental training versus non-incremental training. The learning rate was decreased after 116,000 iterations (bottom two lines). Non-incremental training is initially less stable and shows worse performance at later iterations. Using a small learning rate from the beginning stabilizes non-incremental training but is considerably slower (top line).
  • Figure 4: Comparison of different compression algorithms with respect to PSNR, SSIM, and MS-SSIM on the Kodak PhotoCD image dataset. We note that the blue line refers to the results of Toderici et al. (2016b) achieved without entropy encoding.
  • Figure 5: Closeups of images produced by different compression algorithms at relatively low bit rates. The second row shows an example where our method performs well, producing sharper lines and fewer artefacts than other methods. The fourth row shows an example where our method struggles, producing noticeable artefacts in the hair and discolouring the skin. At higher bit rates, these problems disappear and CAE reconstructions appear sharper than those of JPEG 2000 (fifth row). Complete images are provided in the appendix.
  • ...and 8 more figures
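
The reshaping/reshuffling step that the Figure 2 caption attributes to sub-pixel convolutions can be sketched with NumPy as follows. The function name `pixel_shuffle`, the `(channels, height, width)` layout, and the channel ordering (input channel `c*r*r + dy*r + dx` feeds output pixel offset `(dy, dx)`) are assumptions chosen for illustration rather than details taken from the paper. The convolution that precedes the reshuffle is omitted.

```python
import numpy as np


def pixel_shuffle(x, r):
    """Rearrange an array of shape (C*r*r, H, W) into shape (C, H*r, W*r).

    Each group of r*r channels is spread over an r-by-r block of output
    pixels, so upsampling costs only a reshape and a transpose.
    """
    c_rr, h, w = x.shape
    if c_rr % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c = c_rr // (r * r)
    x = x.reshape(c, r, r, h, w)       # axes: (c, dy, dx, h, w)
    x = x.transpose(0, 3, 1, 4, 2)     # axes: (c, h, dy, w, dx)
    return x.reshape(c, h * r, w * r)  # merge (h, dy) and (w, dx)


if __name__ == "__main__":
    features = np.arange(16).reshape(4, 2, 2)  # 4 channels of 2x2 features
    print(pixel_shuffle(features, 2))          # 1 channel of 4x4 pixels
```

Because the reshuffle itself has no parameters, all learned computation stays at the lower resolution, which is consistent with the caption's point that most convolutions run in a downsampled space.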