End-to-end learned Lossy Dynamic Point Cloud Attribute Compression

Dat Thanh Nguyen; Daniel Zieger; Marc Stamminger; Andre Kaup

TL;DR

This work tackles dynamic point cloud attribute compression by proposing an end-to-end learned framework based on a variational autoencoder that encodes attributes into a latent variable $f$ and uses a spatiotemporal auto-regressive context for entropy coding. The method jointly optimizes a rate-distortion objective and models the latent prior with auto-regressive and temporal dependencies, enabling efficient bitstream encoding. With experiments on MPEG 8i and MVUB datasets, it reports substantial BD-rate savings of $38.1\%$ and ~1.44 dB BD-quality gains at the same bitrate, while maintaining low encoding/decoding complexity compared with the RAHT-based MPEG G-PCC core module. The results demonstrate strong potential for end-to-end learned dynamic attribute coding and indicate avenues for extending to multiple modalities and further reducing complexity.
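As a rough illustration of the jointly optimized rate-distortion objective mentioned above, learned codecs of this kind are usually trained on a loss of the following generic form. The symbols $\lambda$ (trade-off weight), $x$ and $\hat{x}$ (original and reconstructed attributes), $\hat{f}$ (quantized latent), $\hat{f}_{<i}$ (already-coded latent elements of the current frame), $\hat{f}^{(t-1)}$ (latent of the previous frame) and $d(\cdot,\cdot)$ (distortion measure) are assumptions introduced here for the sketch; the paper's exact formulation may differ.

```latex
\mathcal{L} \;=\; R + \lambda D
\;=\; \mathbb{E}\!\left[-\log_2 p_{\hat{f}}\!\left(\hat{f}\right)\right]
\;+\; \lambda\, d\!\left(x, \hat{x}\right),
\qquad
p_{\hat{f}}\!\left(\hat{f}\right) \;=\; \prod_{i} p\!\left(\hat{f}_i \,\middle|\, \hat{f}_{<i},\, \hat{f}^{(t-1)}\right)
```

The factorization on the right expresses the spatiotemporal auto-regressive prior: each latent element is conditioned on elements already coded in the current frame and on the previous frame's latent.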

Abstract

Recent advances in point cloud compression have focused primarily on geometry, while comparatively little effort has been devoted to attribute compression. This study introduces an end-to-end learned lossy attribute coding approach for dynamic point clouds that uses efficient high-dimensional convolutions to capture extensive inter-point dependencies, enabling an efficient projection of attribute features into latent variables. We then employ a context model that leverages the previous frame's latent space in conjunction with an auto-regressive context model to encode the latent tensor into a bitstream. Evaluated on widely used point cloud datasets from MPEG and Microsoft, our method outperforms the Region-Adaptive Hierarchical Transform (RAHT), the core attribute compression module of MPEG Geometry-based Point Cloud Compression (G-PCC), with an average Bjøntegaard Delta-rate saving of 38.1%, while keeping encoding and decoding complexity low.

Paper Structure (10 sections, 7 equations, 3 figures, 1 table)

Introduction
Related work
Proposed method
Auto-Encoder and Rate-Distortion Optimization
Network Architecture
Encoding and Decoding
Experimental Results
Experimental Setup
Experimental Results
Conclusions

Figures (3)

Figure 1: System overview of the proposed method. The Encoder encodes each frame into a latent variable $f$ before a quantization step. The quantized latent variables are encoded by an Adaptive Arithmetic Encoder (AAE) using the probability distribution model from the spatiotemporal Context Model. At the decoder side, the quantized latent variables are decoded from the bitstream using the same context model and then fed into the Decoder to reconstruct the lossy feature F.
Figure 2: Detailed network architecture of the proposed method. $Conv$ denotes sparse convolution [choy20194d], with the number of filters, kernel size, and stride written as $fLkKsS$ (where $L$ is the number of filters, $K$ the kernel size, and $S$ the stride). The Probability Context Model outputs the mean $\mu$ and standard deviation $\sigma$ of a Gaussian distribution. The architecture of the Decoder mirrors the Encoder, using sparse transposed convolutions.
Figure 3: Visual comparison between RAHT ((a) and (c)) and the proposed method ((b) and (d)) at similar bitrates for the point clouds $Loot$ and $Ricardo$. The darker the distortion map, the better the quality. Note that, for better visualization, the maximum value of the color map is lowered to emphasize the distortions.
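
The captions of Figures 1 and 2 describe a context model that supplies a Gaussian mean $\mu$ and standard deviation $\sigma$ for each quantized latent element, which the arithmetic coder then uses as its probability model. The sketch below shows, under stated assumptions, how such parameters are commonly turned into a probability mass and an estimated bit cost. It is not the authors' implementation: the function names, the rounding-based quantizer, the unit-width integration bins and the probability floor are all assumptions borrowed from common practice in learned compression.

```python
import math


def gaussian_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def symbol_probability(q, mu, sigma, floor=1e-9):
    """Probability mass of integer symbol q: the Gaussian integrated over
    the unit-width bin [q - 0.5, q + 0.5] (assumed bin layout)."""
    p = gaussian_cdf(q + 0.5, mu, sigma) - gaussian_cdf(q - 0.5, mu, sigma)
    # Floor the mass so that -log2(p) stays finite for very unlikely symbols.
    return max(p, floor)


def estimated_bits(latents, mus, sigmas):
    """Quantize latents by rounding and sum the ideal code length in bits.

    An arithmetic coder driven by the same probabilities would approach
    this total. Note that Python's round() rounds exact halves to the
    nearest even integer.
    """
    quantized = [round(v) for v in latents]
    bits = sum(
        -math.log2(symbol_probability(q, m, s))
        for q, m, s in zip(quantized, mus, sigmas)
    )
    return quantized, bits


if __name__ == "__main__":
    # Hypothetical values: the second prediction is far from the true latent,
    # so its symbol costs noticeably more bits than the first.
    symbols, total = estimated_bits([3.1, 3.1], [3.0, 0.0], [1.0, 1.0])
    print(symbols, round(total, 2))
```

The example illustrates why a better context model lowers the bitrate: the closer the predicted $\mu$ is to the actual quantized value (and the smaller a well-calibrated $\sigma$), the more probability mass falls on the coded symbol and the fewer bits it costs.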
