Soft-to-Hard Vector Quantization for End-to-End Learning Compressible Representations

Eirikur Agustsson, Fabian Mentzer, Michael Tschannen, Lukas Cavigelli, Radu Timofte, Luca Benini, Luc Van Gool

TL;DR

The paper presents a unified end-to-end framework for learning compressible representations by softly relaxing quantization and entropy and then annealing to hard quantization. It introduces soft-to-hard vector quantization with differentiable surrogates for quantization and entropy, using a histogram-based, nonparametric entropy estimate and a continuation-based annealing schedule. The method is demonstrated on image compression via a compressive autoencoder and on DNN compression using a ResNet, achieving competitive results with simpler training and fewer distributional assumptions. Key contributions include a differentiable soft quantizer, a soft entropy upper bound, and an end-to-end optimization of both network parameters and quantization centers to minimize the rate-distortion objective. This framework broadens the scope of end-to-end learned compression to jointly handle feature and parameter quantization across diverse tasks.

Abstract

We present a new approach to learn compressible representations in deep architectures with an end-to-end training strategy. Our method is based on a soft (continuous) relaxation of quantization and entropy, which we anneal to their discrete counterparts throughout training. We showcase this method for two challenging applications: Image compression and neural network compression. While these tasks have typically been approached with different methods, our soft-to-hard quantization approach gives results competitive with the state-of-the-art for both.
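
The soft-to-hard relaxation described above can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names are hypothetical, and the specific form used here (a softmax over negative squared distances to the centers, scaled by an annealing parameter `sigma`) is an assumption chosen to show how a soft assignment approaches hard nearest-center quantization as `sigma` grows. The entropy function is a plain histogram estimate over the hard indices.

```python
import numpy as np

def squared_distances(z, centers):
    # z: (n, d) feature vectors; centers: (L, d) quantization centers.
    # Returns an (n, L) matrix of squared Euclidean distances.
    return ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)

def soft_assign(z, centers, sigma):
    # Assumed relaxation: softmax over -sigma * squared distance.
    logits = -sigma * squared_distances(z, centers)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)

def soft_quantize(z, centers, sigma):
    # Differentiable surrogate: convex combination of the centers.
    return soft_assign(z, centers, sigma) @ centers

def hard_quantize(z, centers):
    # Nearest-center assignment (the limit of large sigma).
    idx = squared_distances(z, centers).argmin(axis=1)
    return centers[idx], idx

def sample_entropy(idx, num_centers):
    # Histogram-based entropy estimate in bits per symbol.
    p = np.bincount(idx, minlength=num_centers) / idx.size
    nz = p > 0
    return float(-(p[nz] * np.log2(p[nz])).sum())
```

With a small `sigma` the soft output is a blend of several centers; raising `sigma` over the course of training moves it toward the hard output, which is the annealing idea the abstract refers to.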

Paper Structure

This paper contains 20 sections, 13 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: Top: MS-SSIM as a function of rate for SHA (Ours), BPG, JPEG 2000, JPEG, for each data set. Bottom: A visual example from the Kodak data set along with rate / MS-SSIM / SSIM / PSNR.
  • Figure 2: PSNR on ImageNET100 as a function of the rate for $2 \times 2$-dimensional centers (Vector), for $1 \times 1$-dimensional centers (Scalar), and for $2 \times 2$-dimensional centers without entropy loss ($\beta = 0$). JPEG is included for reference.
  • Figure 3: Entropy loss for three $\beta$ values, soft and hard PSNR, as well as $\text{gap}(t)$ and $\sigma$ as a function of the iteration $t$.
  • Figure 4: Average MS-SSIM, SSIM, and PSNR as a function of the rate for the ImageNET100, Urban100, B100 and Kodak datasets.
  • Figure 5: We show how the sample entropy $H(p)$ decays during training, due to the entropy loss term in \ref{eq:rd_tradeoff}, and corresponding index histograms at three time instants. Top left: Evolution of the sample entropy $H(p)$. Top right: the histogram for the entropy $H=4.07$ at $t=216$. Bottom left and right: the corresponding sample histogram when $H(p)$ reaches $2.90$ bits per weight at $t=475$ and the final histogram for $H(p)=1.58$ bits per weight at $t=520$.
  • ...and 1 more figure