Enhancing octree-based context models for point cloud geometry compression with attention-based child node number prediction

Chang Sun, Hui Yuan, Xiaolong Mao, Xin Lu, Raouf Hamzaoui

TL;DR

This work targets efficient lossless geometry compression for 3D point clouds by addressing a mismatch in common octree-based context models: cross-entropy losses treat node occupancy as a 255-class classification task, but the problem also involves predicting the number of occupied child nodes, a regression task. To bridge this gap, the authors introduce an Attention-based Child Node Number Prediction (ACNP) module that predicts the number of occupied child nodes and encodes it as an 8-dimensional vector, which is fused into the context model to refine the occupancy probability distribution $P_i = G(\boldsymbol{c_i}, \boldsymbol{V_i}; \boldsymbol{\omega})$. ACNP is designed as a general enhancement and is applied to OctAttention and OctSqueeze, yielding notable bitrate reductions on MVUB, MPEG 8i, and SemanticKITTI datasets, thereby improving coding efficiency in octree-based lossless compression. However, ACNP increases model size and decoding/encoding times, highlighting a trade-off between performance gains and computational cost and pointing to future work on complexity reduction and broader applicability.

Abstract

In point cloud geometry compression, most octree-based context models use the cross-entropy between the one-hot encoding of node occupancy and the probability distribution predicted by the context model as the loss. This approach converts the problem of predicting the number (a regression problem) and the position (a classification problem) of occupied child nodes into a 255-dimensional classification problem. As a result, it fails to accurately measure the difference between the one-hot encoding and the predicted probability distribution. We first analyze why the cross-entropy loss function fails to accurately measure the difference between the one-hot encoding and the predicted probability distribution. Then, we propose an attention-based child node number prediction (ACNP) module to enhance the context models. The proposed module can predict the number of occupied child nodes and map it into an 8-dimensional vector to assist the context model in predicting the probability distribution of the occupancy of the current node for efficient entropy coding. Experimental results demonstrate that the proposed module enhances the coding efficiency of octree-based context models.
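To make the "number plus position" view of occupancy concrete, the following minimal sketch splits an 8-bit occupancy code into the class index used by a 255-way classifier, the number of occupied child nodes, and their positions. The function name `occupancy_to_label`, the 0-based class index `occupancy - 1`, and the bit ordering (least significant bit is child 0) are assumptions made for illustration; they are not taken from the paper.

```python
def occupancy_to_label(occupancy: int) -> tuple[int, int, list[int]]:
    """Split an 8-bit occupancy code into (class index, child count, child positions).

    Hypothetical helper: the indexing and bit-order conventions are assumptions.
    """
    if not 1 <= occupancy <= 255:
        raise ValueError("a non-empty octree node has an occupancy code in 1..255")
    # 0-based index into the 255-dimensional one-hot label (assumed convention).
    class_index = occupancy - 1
    # Positions of the set bits, with bit 0 taken as child 0 (assumed ordering).
    positions = [bit for bit in range(8) if (occupancy >> bit) & 1]
    # The child count is the quantity a regression-style predictor would target.
    return class_index, len(positions), positions


# Two codes that are merely "different classes" for a classifier can differ
# greatly in how many children they contain.
single_child = occupancy_to_label(0b00000010)
all_children = occupancy_to_label(0b11111111)
```

Here `single_child` describes a node with one occupied child and `all_children` a node with eight, yet a pure 255-way classification loss treats the two codes as unrelated labels.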

Paper Structure

This paper contains 10 sections, 8 equations, 2 figures, and 4 tables.

Figures (2)

  • Figure 1: Problem of cross-entropy loss. The occupancy of the child nodes can be represented in binary as 00000010, which is equivalent to the decimal value 2. Therefore, the training label is a one-hot encoding with a value of 1 in the second dimension and 0 in all other dimensions. $\boldsymbol{A}$ and $\boldsymbol{B}$ are two probability distributions predicted by some context models. Although $\boldsymbol{A}$ and $\boldsymbol{B}$ differ, the cross-entropy loss between $\boldsymbol{A}$ and the label is identical to that between $\boldsymbol{B}$ and the label.
  • Figure 2: Overall architecture of the ACNP module. The ACNP module consists of attention layers and MLP layers. The ACNP module takes the context as input and outputs a vector containing information about the number of occupied child nodes. We divide the context model into two stages: feature extraction and feature aggregation. The dimension of features gradually decreases in the feature aggregation stage. The vector output by the ACNP module is concatenated with the output of the feature extraction stage and fed into the feature aggregation stage. The output of the context model is a 255-dimensional probability distribution used for entropy coding.
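The effect described in the Figure 1 caption can be reproduced with a small sketch. With a one-hot label, cross-entropy depends only on the probability assigned to the true class, so two predictions that spread the remaining probability mass very differently receive the same loss. The distributions `dist_a` and `dist_b` below are invented for illustration and are not the values shown in the paper's figure; the helper names and the class index `occupancy - 1` are likewise assumptions.

```python
import math

NUM_CLASSES = 255  # occupancy codes 1..255 for a non-empty node


def one_hot(occupancy):
    """One-hot label; code k maps to 0-based index k - 1 (assumed convention)."""
    label = [0.0] * NUM_CLASSES
    label[occupancy - 1] = 1.0
    return label


def make_distribution(mass):
    """Build a 255-dimensional distribution from {occupancy code: probability}."""
    probs = [0.0] * NUM_CLASSES
    for occupancy, p in mass.items():
        probs[occupancy - 1] = p
    return probs


def cross_entropy_bits(label, probs):
    """Cross-entropy in bits; only entries with a non-zero label contribute."""
    return -sum(y * math.log2(p) for y, p in zip(label, probs) if y > 0.0)


def expected_child_count(probs):
    """Expected number of occupied children under a predicted distribution."""
    return sum(p * bin(code).count("1") for code, p in enumerate(probs, start=1))


label = one_hot(0b00000010)  # true occupancy: one occupied child

# Illustrative predictions: both give the true code probability 0.5.
dist_a = make_distribution({0b00000010: 0.5, 0b00000100: 0.5})  # rest on a 1-child code
dist_b = make_distribution({0b00000010: 0.5, 0b11111111: 0.5})  # rest on the 8-child code

loss_a = cross_entropy_bits(label, dist_a)
loss_b = cross_entropy_bits(label, dist_b)
count_a = expected_child_count(dist_a)
count_b = expected_child_count(dist_b)
```

In this sketch both losses equal 1 bit, while the expected child counts are 1.0 for `dist_a` and 4.5 for `dist_b`. The loss therefore cannot distinguish a prediction that is right about the number of occupied children from one that is far off, which is the gap a child node number predictor such as ACNP is meant to address.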