
Enhancing context models for point cloud geometry compression with context feature residuals and multi-loss

Chang Sun, Hui Yuan, Shuai Li, Xin Lu, Raouf Hamzaoui

TL;DR

This paper tackles inefficiencies in learning-based point cloud geometry compression by addressing two issues: weak differences between the contexts of neighboring nodes, and an inaccurate learning target caused by pairing one-hot labels with a cross-entropy loss. It introduces context feature residuals to amplify context differences and adds a multi-layer perceptron (MLP) branch trained with a mean squared error (MSE) loss to provide more accurate gradients, forming a general enhancement applicable to both octree- and voxel-based models. Applied to OctAttention (as EMR-OctAttention) and VoxelDNN (as ER-VoxDNN), the approach yields notable bitrate reductions on MPEG 8i, MVUB, and SemanticKITTI at an acceptable increase in complexity. Overall, the multi-loss, residual-context framework improves learning efficiency and compression performance for 3D point cloud geometry coding on both object and LiDAR datasets.

Abstract

In point cloud geometry compression, context models usually use the one-hot encoding of node occupancy as the label, and the cross-entropy between this one-hot encoding and the probability distribution predicted by the context model as the loss function. However, this approach has two main weaknesses. First, the differences between the contexts of different nodes are small, making it difficult for the context model to accurately predict the probability distribution of node occupancy. Second, because the one-hot encoding is not the actual probability distribution of node occupancy, the cross-entropy loss computed against it is inaccurate. To address these problems, we propose a general structure that can enhance existing context models. We introduce context feature residuals into the context model to amplify the differences between contexts. We also add a multi-layer perceptron branch that uses the mean squared error between its output and node occupancy as a loss function, providing accurate gradients during backpropagation. We validate our method by showing that it improves the performance of an octree-based model (OctAttention) and a voxel-based model (VoxelDNN) on the object point cloud datasets MPEG 8i and MVUB, as well as on the LiDAR point cloud dataset SemanticKITTI.
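For a single octree node i, the two losses described in the abstract can be written out as below. This is a sketch under stated assumptions, not the paper's own equations: o_{i,k} is the occupancy (0 or 1) of child k, c_i is the node's context feature, p_i is the main network's 255-way distribution, and q_i is the branch network's vector of 8 per-child occupancy probabilities. The bit order used to form the occupancy byte b_i, the definition of the residual as the difference from the previous node in coding order, and the weight λ are hypothetical choices made for illustration.

```latex
\begin{aligned}
b_i &= \sum_{k=0}^{7} o_{i,k}\,2^{k} \in \{1,\dots,255\}, \qquad \mathbf{y}_i = \mathbf{e}_{b_i} \in \{0,1\}^{255} \\
\mathbf{r}_i &= \mathbf{c}_i - \mathbf{c}_{i-1}, \qquad \mathbf{x}_i = [\,\mathbf{c}_i;\ \mathbf{r}_i\,] \quad \text{(assumed residual form)} \\
\mathcal{L}_{\mathrm{CE}} &= -\sum_{j=1}^{255} y_{i,j}\log p_{i,j} = -\log p_{i,b_i} \\
\mathcal{L}_{\mathrm{MSE}} &= \frac{1}{8}\sum_{k=0}^{7}\left(q_{i,k} - o_{i,k}\right)^{2} \\
\mathcal{L} &= \mathcal{L}_{\mathrm{CE}} + \lambda\,\mathcal{L}_{\mathrm{MSE}} \quad \text{(}\lambda\text{ hypothetical)}
\end{aligned}
```

The identity in the third line shows that, with a one-hot label, the cross-entropy depends only on the probability assigned to the single observed configuration, whereas the MSE term supplies a separate target for each of the 8 child nodes.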

Paper Structure (15 sections, 12 equations, 8 figures, 8 tables)

Figures (8)

  • Figure 1: Differences in context between adjacent nodes. Context models typically use decoded nodes as context, resulting in small differences in context between neighboring nodes.
  • Figure 2: Example of a cross-entropy loss based on one-hot encoding. Excluding the case where all eight child nodes are empty, there are 255 (2^8 - 1) possible occupancy configurations of the eight child nodes, which can be represented by a 255-dimensional one-hot encoding. This 255-dimensional one-hot encoding is commonly used as the training label for octree-based context models.
  • Figure 3: Structure of OctAttention.
  • Figure 4: Overall architecture of the proposed structure. The structure consists of a feature extractor, a main network, a branch network, a concatenate module, and a subtract module. The feature extractor and main network together form the original context model. The subtract module computes the context feature residuals, and the concatenate module concatenates the network inputs.
  • Figure 5: Overall architecture of EMR-OctAttention. The weighted context output by the attention layer is used as a latent representation to compute the context feature residuals. The context feature residuals are concatenated with the weighted context and fed into two MLPs. One MLP, the main network, outputs a 255-dimensional probability distribution; its loss is the cross-entropy between this distribution and the one-hot encoding of the node's actual occupancy. The other MLP outputs an 8-dimensional vector giving the occupancy probability of each child node; its loss is the mean squared error between this vector and the actual occupancy of the 8 child nodes.
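The pieces described in Figures 2, 4 and 5 can be tied together in a small NumPy sketch of a training-time head in the style of EMR-OctAttention. It is not the authors' implementation: random vectors stand in for the attention layer's weighted context, and the bit order used to build the 255-class label, the residual defined as the difference from the previous node's feature, the ReLU/sigmoid activations, the hidden size, and the loss weight `lam` are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)


def occupancy_to_label(child_bits):
    """Map 8 child-occupancy bits to a class index in [0, 254].

    Assumption: child k contributes 2**k to the occupancy byte. The
    all-empty byte (0) is never coded, so byte b maps to class b - 1.
    """
    bits = np.asarray(child_bits, dtype=np.int64)
    byte = int(np.dot(bits, 1 << np.arange(8)))
    if byte == 0:
        raise ValueError("all-empty nodes have no label")
    return byte - 1


def one_hot(index, size=255):
    v = np.zeros(size)
    v[index] = 1.0
    return v


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_mlp(in_dim, hidden, out_dim):
    # Hypothetical sizes and initialization; not taken from the paper.
    return (rng.normal(0.0, 0.1, (in_dim, hidden)), np.zeros(hidden),
            rng.normal(0.0, 0.1, (hidden, out_dim)), np.zeros(out_dim))


def mlp(x, w1, b1, w2, b2):
    # Two-layer perceptron with a ReLU hidden layer; returns raw logits.
    return np.maximum(x @ w1 + b1, 0.0) @ w2 + b2


def emr_losses(ctx, child_bits, main_params, branch_params, lam=1.0):
    """ctx: (N, d) context features of N nodes in coding order.
    child_bits: (N, 8) actual occupancy of each node's children.
    """
    # Subtract module (assumed form): residual w.r.t. the previous node,
    # so the first node gets a zero residual.
    prev = np.vstack([ctx[:1], ctx[:-1]])
    residual = ctx - prev
    # Concatenate module: both heads see [context, residual].
    x = np.concatenate([ctx, residual], axis=1)

    # Main network: 255-way distribution vs. one-hot label (cross-entropy).
    p = softmax(mlp(x, *main_params))
    y = np.stack([one_hot(occupancy_to_label(b)) for b in child_bits])
    ce = -np.mean(np.sum(y * np.log(p + 1e-12), axis=1))

    # Branch network: 8 per-child probabilities vs. actual bits (MSE).
    q = sigmoid(mlp(x, *branch_params))
    mse = np.mean((q - child_bits) ** 2)

    return ce, mse, ce + lam * mse  # lam is a hypothetical weight


# Usage with random stand-ins for the attention layer's weighted context.
N, d = 4, 16
ctx = rng.normal(size=(N, d))
child_bits = rng.integers(0, 2, size=(N, 8))
child_bits[:, 0] = 1  # guarantee no all-empty node
main_params = init_mlp(2 * d, 32, 255)
branch_params = init_mlp(2 * d, 32, 8)
ce, mse, total = emr_losses(ctx, child_bits, main_params, branch_params, lam=0.5)
print(f"CE={ce:.3f}  MSE={mse:.3f}  total={total:.3f}")
```

Because both heads read the same concatenated input, gradients from the MSE loss reach the shared features alongside those from the cross-entropy loss during training.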