Variational image compression with a scale hyperprior
Johannes Ballé, David Minnen, Saurabh Singh, Sung Jin Hwang, Nick Johnston
TL;DR
Ballé et al. present a variational image compression framework augmented with a scale hyperprior that models spatial dependencies in the latent representation. The hyperprior acts as learned side information: the distribution of the latents is conditioned on an auxiliary latent variable z, which improves entropy coding. Empirical results on Kodak and Tecnick show state-of-the-art MS-SSIM performance and PSNR rate-distortion performance that surpasses previously published ANN-based methods, highlighting the importance of flexible priors in learned compression. The work also compares models trained for different distortion metrics qualitatively, and it establishes a principled way to incorporate side information into end-to-end neural codecs.
Abstract
We describe an end-to-end trainable model for image compression based on variational autoencoders. The model incorporates a hyperprior to effectively capture spatial dependencies in the latent representation. This hyperprior relates to side information, a concept universal to virtually all modern image codecs, but largely unexplored in image compression using artificial neural networks (ANNs). Unlike existing autoencoder compression methods, our model trains a complex prior jointly with the underlying autoencoder. We demonstrate that this model leads to state-of-the-art image compression when measuring visual quality using the popular MS-SSIM index, and yields rate-distortion performance surpassing published ANN-based methods when evaluated using a more traditional metric based on squared error (PSNR). Furthermore, we provide a qualitative comparison of models trained for different distortion metrics.
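As a rough illustration of why side information can help entropy coding, the sketch below estimates the bits needed to code quantized latents under a zero-mean Gaussian integrated over unit-width bins, first with a per-element scale and then with a single global scale. It is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the function names (`gaussian_cdf`, `rate_bits`, `demo`), the synthetic latents, and the two scale values are hypothetical, and the per-element scales are given directly instead of being predicted by a learned hyper-decoder.

```python
import math
import numpy as np

def gaussian_cdf(x):
    # Standard normal CDF via the error function, applied elementwise.
    return 0.5 * (1.0 + np.vectorize(math.erf)(np.asarray(x) / math.sqrt(2.0)))

def rate_bits(y_hat, sigma):
    """Bits to code integer latents y_hat under zero-mean Gaussians with
    scale sigma, each integrated over the unit-width bin around y_hat."""
    sigma = np.maximum(sigma, 1e-6)                  # avoid division by zero
    upper = gaussian_cdf((y_hat + 0.5) / sigma)
    lower = gaussian_cdf((y_hat - 0.5) / sigma)
    likelihood = np.maximum(upper - lower, 1e-12)    # avoid log(0) in the tails
    return float(-np.log2(likelihood).sum())

def demo(seed=0, n=10000):
    # Hypothetical latents: a "flat" region with small scale followed by a
    # "textured" region with large scale (both values are assumptions).
    rng = np.random.default_rng(seed)
    true_sigma = np.where(np.arange(n) < n // 2, 0.3, 8.0)
    y_hat = np.round(rng.normal(0.0, true_sigma))

    # Rate when the entropy model knows the per-element scale,
    # standing in for scales signalled through the hyperprior.
    conditional = rate_bits(y_hat, true_sigma)

    # Rate when one global scale is used for every element.
    unconditional = rate_bits(y_hat, np.full(n, y_hat.std()))
    return conditional, unconditional

if __name__ == "__main__":
    cond, uncond = demo()
    print(f"per-element scales: {cond / 10000:.2f} bits per latent")
    print(f"single global scale: {uncond / 10000:.2f} bits per latent")
```

The comparison ignores the cost of transmitting the side information itself. In a complete codec that cost is added to the total rate, so the hyperprior only pays off when the savings on the latents exceed the bits spent on z.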
