Learned Compression for Images and Point Clouds

Mateen Ulhaq

TL;DR

This thesis presents a low-complexity entropy model that dynamically adapts the encoding distribution to each input by compressing and transmitting the distribution itself as side information. It also proposes a novel lightweight point cloud codec.

Abstract

Over the last decade, deep learning has shown great success at performing computer vision tasks, including classification, super-resolution, and style transfer. Now, we apply it to data compression to help build the next generation of multimedia codecs. This thesis provides three primary contributions to this new field of learned compression. First, we present an efficient, low-complexity entropy model that dynamically adapts the encoding distribution to a specific input by compressing and transmitting the encoding distribution itself as side information. Second, we propose a novel lightweight point cloud codec that is highly specialized for classification, attaining significant reductions in bitrate compared to non-specialized codecs. Lastly, we explore how motion between consecutive video frames in the input domain manifests in the corresponding convolutionally derived latent space.
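
The trade-off behind the first contribution can be illustrated with a toy calculation. The sketch below is not the thesis's entropy model: the uniform static distribution, the 16-level histogram quantization, and the helper names `cross_entropy_bits` and `quantize_histogram` are all assumptions made for illustration. It only shows that sending a coarse description of the input's own distribution can pay for itself when the input is far from what a static distribution expects.

```python
import math
from collections import Counter

def cross_entropy_bits(symbols, probs):
    """Ideal total code length (in bits) of `symbols` under the distribution `probs`."""
    return sum(-math.log2(probs[s]) for s in symbols)

def quantize_histogram(counts, alphabet, levels=16):
    """Coarsely quantize a histogram so it is cheap to transmit (hypothetical scheme)."""
    total = sum(counts.get(s, 0) for s in alphabet)
    # Each bin becomes an integer in [1, levels]; the floor of 1 keeps every symbol encodable.
    q = {s: max(1, round(levels * counts.get(s, 0) / total)) for s in alphabet}
    norm = sum(q.values())
    probs = {s: q[s] / norm for s in alphabet}
    # Assumed fixed-length cost: log2(levels) bits per transmitted bin.
    side_bits = len(alphabet) * math.log2(levels)
    return probs, side_bits

alphabet = [-2, -1, 0, 1, 2]
# Stand-in for a static encoding distribution (uniform here; a real one would be learned).
static_probs = {s: 1 / len(alphabet) for s in alphabet}
# A peaked input that the static distribution models poorly.
symbols = [0] * 700 + [1] * 150 + [-1] * 150

adaptive_probs, side_bits = quantize_histogram(Counter(symbols), alphabet)
static_bits = cross_entropy_bits(symbols, static_probs)
adaptive_bits = cross_entropy_bits(symbols, adaptive_probs) + side_bits

print(f"static:   {static_bits:.1f} bits")
print(f"adaptive: {adaptive_bits:.1f} bits (including {side_bits:.0f} bits of side information)")
```

For short inputs the side information dominates and the static distribution wins, which is why the cost of transmitting the distribution has to be kept small.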

Paper Structure (42 sections, 42 equations, 22 figures, 6 tables)

Introduction
Data compression: an example
Learning-based compression: the current landscape
Compression architecture overview
Entropy modeling
Thesis outline and contributions
Compression of probability distributions
Introduction
Related works
Proposed method
Compression of probability distributions
Architecture overview
Histogram estimation
Loss function
Optimization
...and 27 more sections

Figures (22)

Figure 1: RD curves for image compression codecs on the Kodak dataset [kodak_dataset].
Figure 2: High-level comparison of codec architectures.
Figure 3: Visualization of an encoding distribution used for compressing a single element $\hat{y}_i$.
Figure 4: Visualization of encoding distributions used for compressing a latent tensor ${\boldsymbol{\hat{y}}}$ with dimensions $M_y \times H_y \times W_y$. In (a), the encoding distributions within a given channel are all the same, since the elements within a channel are assumed to be i.i.d. Furthermore, in the case of the fully factorized entropy bottleneck used by Ballé et al. [balle2018variational], each encoding distribution is a static non-parametric distribution. In (b), the encoding distribution for each element is determined individually and conditioned on side information. Furthermore, in the case of the Gaussian conditional hyperprior used by Ballé et al. [balle2018variational], the encoding distributions are Gaussian distributions parameterized by a mean and variance.
Figure 5: Visualization of the suboptimality of using a single static encoding distribution. This distribution tries to "average" (in an amortized information-theoretic sense) all the best possible data-specific distributions. However, the resulting distribution is suboptimal for any individual input compared with that input's data-specific distribution.
...and 17 more figures
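
As a rough illustration of the Gaussian conditional case in the Figure 4(b) caption, the sketch below estimates the bit cost of one quantized element $\hat{y}_i$ under a Gaussian encoding distribution. It assumes the common convention of integrating the Gaussian density over a unit-width bin centered on the integer value; the function names and the probability floor of `1e-9` are assumptions for illustration and are not taken from the thesis. The `scale` argument is the standard deviation, so the variance is `scale ** 2`.

```python
import math

def std_normal_cdf(x):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gaussian_bits(y_hat, mean, scale):
    """Bits needed to encode the integer `y_hat` under N(mean, scale**2).

    The probability mass is the Gaussian density integrated over the
    unit-width bin [y_hat - 0.5, y_hat + 0.5] (assumed convention).
    """
    upper = std_normal_cdf((y_hat + 0.5 - mean) / scale)
    lower = std_normal_cdf((y_hat - 0.5 - mean) / scale)
    p = max(upper - lower, 1e-9)  # assumed floor to avoid log2(0)
    return -math.log2(p)

# The same element is cheap when the predicted mean matches it, expensive otherwise.
well_matched = gaussian_bits(3, mean=3.0, scale=0.5)
mismatched = gaussian_bits(3, mean=0.0, scale=0.5)

print(f"well matched: {well_matched:.2f} bits")
print(f"mismatched:   {mismatched:.2f} bits")
```

The gap between the two costs is the reason for conditioning each element's distribution on side information: a well-predicted mean and scale concentrate probability mass on the value that actually occurs.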
