SPAC: Sampling-based Progressive Attribute Compression for Dense Point Clouds

Xiaolong Mao, Hui Yuan, Tian Guo, Shiqi Jiang, Raouf Hamzaoui, Sam Kwong

TL;DR

SPAC is the first learning-based attribute codec to outperform the G-PCC standard on the MPEG Category Solid and Category Dense datasets under the common test conditions (CTCs) specified by MPEG.

Abstract

We propose an end-to-end attribute compression method for dense point clouds. The proposed method combines a frequency sampling module, an adaptive scale feature extraction module with geometry assistance, and a global hyperprior entropy model. The frequency sampling module uses a Hamming window and the Fast Fourier Transform (FFT) to extract high-frequency components of the point cloud. The difference between the original point cloud and the sampled point cloud is divided into multiple sub-point clouds. These sub-point clouds are then partitioned using an octree, which provides a structured input for feature extraction. The feature extraction module integrates adaptive convolutional layers and uses offset-attention to capture both local and global features. A geometry-assisted attribute feature refinement module then refines the extracted attribute features. Finally, a global hyperprior model is introduced for entropy coding. This model propagates hyperprior parameters from the deepest (base) layer to the other layers, further improving coding efficiency. At the decoder, a mirrored network progressively restores the features and reconstructs the color attribute through transposed convolutional layers. The proposed method encodes base-layer information at a low bitrate and progressively adds enhancement-layer information to improve reconstruction accuracy. Compared with the latest G-PCC test model (TMC13v23) under the MPEG common test conditions (CTCs), the proposed method achieves an average Bjøntegaard delta bitrate (BD-rate) reduction of 24.58% for the Y component (21.23% for YUV combined) on the MPEG Category Solid dataset and 22.48% for the Y component (17.19% for YUV combined) on the MPEG Category Dense dataset. This is the first instance of a learning-based codec outperforming the G-PCC standard on these datasets under the MPEG CTCs.
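The BD-rate figures above give the average bitrate difference between two rate-distortion curves at equal quality. As a rough illustration, the classic Bjøntegaard calculation fits a cubic polynomial of log-rate against PSNR for each codec and compares the integrals of the two fits over the PSNR range both curves cover. The block below is a sketch of that textbook variant, not the script behind the reported numbers. The reporting templates used for MPEG CTC results may use a different interpolation, such as piecewise cubic. The function name `bd_rate` and the sample rate points are illustrative.

```python
import numpy as np

def bd_rate(rate_ref, psnr_ref, rate_test, psnr_test):
    """Classic Bjøntegaard delta bitrate (%) of `test` relative to `ref`.

    Negative values mean `test` needs fewer bits for the same PSNR.
    Each curve needs at least four (rate, PSNR) points for the cubic fit.
    """
    log_ref = np.log10(np.asarray(rate_ref, dtype=float))
    log_test = np.log10(np.asarray(rate_test, dtype=float))
    # Fit log10(rate) as a cubic polynomial of PSNR for each codec.
    fit_ref = np.polyfit(psnr_ref, log_ref, 3)
    fit_test = np.polyfit(psnr_test, log_test, 3)
    # Integrate only over the PSNR interval covered by both curves.
    lo = max(np.min(psnr_ref), np.min(psnr_test))
    hi = min(np.max(psnr_ref), np.max(psnr_test))
    int_ref, int_test = np.polyint(fit_ref), np.polyint(fit_test)
    area_ref = np.polyval(int_ref, hi) - np.polyval(int_ref, lo)
    area_test = np.polyval(int_test, hi) - np.polyval(int_test, lo)
    # Average log-rate gap, converted back to a percentage.
    avg_log_diff = (area_test - area_ref) / (hi - lo)
    return (10.0 ** avg_log_diff - 1.0) * 100.0

# Illustrative curves: the test codec uses 80% of the reference bits at every PSNR.
psnr = [30.0, 33.0, 36.0, 39.0]
ref = [100.0, 200.0, 400.0, 800.0]
test = [0.8 * r for r in ref]
print(round(bd_rate(ref, psnr, test, psnr), 2))  # -20.0
```

Working in log-rate makes a constant multiplicative saving show up as a constant vertical offset between the fits. That is why the example recovers exactly -20%.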

Paper Structure (21 sections, 24 equations, 13 figures, 10 tables)

Figures (13)

  • Figure 1: SPAC architecture. The input point cloud is processed by a Frequency Sampling (FS) module, producing the sampled point clouds $\bm{\mathcal{P}}_1, \bm{\mathcal{P}}_2, \bm{\mathcal{P}}_3, \bm{\mathcal{P}}_4$. Residual point clouds $\bm{\mathcal{P}}_{1,\text{res}}, \bm{\mathcal{P}}_{2,\text{res}}, \bm{\mathcal{P}}_{3,\text{res}}$ are then obtained by set difference. The residual point clouds are further partitioned using an octree and fed into the Feature Extraction Networks (FNet-1, FNet-2, FNet-3, FNet-4). Next, the extracted features are entropy coded. The entropy encoder of the deepest layer incorporates a hyperprior entropy model that improves the coding efficiency of all layers. During decoding, the features are processed by the corresponding decoding networks (ReFNet-1, ReFNet-2, ReFNet-3, ReFNet-4), and the Reconstruction Network (ReconNet) reconstructs the decoded residual point clouds. The reconstructed residual point clouds are progressively concatenated with the higher-level point clouds, yielding the final reconstructed point clouds $\hat{\bm{\mathcal{P}}}_1, \hat{\bm{\mathcal{P}}}_2, \hat{\bm{\mathcal{P}}}_3, \hat{\bm{\mathcal{P}}}_4$.
  • Figure 2: Structure of the FS module. The $\Omega$ points in each group are processed using a Hamming window and the FFT. Coefficients whose magnitude is smaller than or equal to $q\%$ of the largest magnitude are retained, and the other coefficients are set to zero. The IFFT then transforms the processed coefficients back to the spatial domain. Finally, the $\Omega$ input points of the group are mapped to the non-zero positions of the IFFT result (shown as black and gray dots in the figure) to obtain the high-frequency component of the group.
  • Figure 3: Three-stage sampling using the FS module. Each subplot shows the corresponding downsampled point cloud, and the points depicted in red are high-frequency components. Each stage progressively discards low-frequency information while retaining high-frequency components, so that high-frequency details are better preserved during compression.
  • Figure 4: Adaptive scale feature extraction modules in the encoding network: (a) FNet-1, (b) FNet-2, (c) FNet-3, and (d) FNet-4. Smooth regions are processed by a shallow feature extraction network (FNet-1). Regions with more high-frequency information are processed by a deeper network (FNet-4), which supports efficient feature learning and representation.
  • Figure 5: Octree partitioning of $\bm{\mathcal{P}}_{l,\text{res}}$ or $\bm{\mathcal{P}}_L$. The point cloud is first divided into patches of 4096 points (the last patch may contain fewer). Each patch is then partitioned using an octree down to the second-to-last level, with 8 points per sub-node. The sub-nodes are then represented as sparse tensors and fed into the sparse-convolution-based feature extraction network.
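Figure 1 implies a simple bookkeeping invariant. Each sampled point cloud is a subset of the previous one, and each residual is a set difference. The residuals plus the deepest layer therefore cover every input point exactly once, and the decoder can rebuild each layer by taking unions from the base upward. The sketch below tracks only point indices, not attributes. It rests on several assumptions. First, $\bm{\mathcal{P}}_1$ is taken to be the full input. Second, each $\bm{\mathcal{P}}_{l+1}$ is assumed to be sampled from $\bm{\mathcal{P}}_l$. Third, the `sampler` argument is a hypothetical stand-in for the FS module. The names `build_layers`, `reconstruct` and `every_other` are illustrative.

```python
from typing import Callable, List, Tuple
import numpy as np

# Hypothetical sampler: given the point indices of one layer, return the
# positions (into that index array) of the points kept for the next layer.
Sampler = Callable[[np.ndarray], np.ndarray]

def build_layers(num_points: int, sampler: Sampler, num_layers: int = 4
                 ) -> Tuple[List[np.ndarray], List[np.ndarray]]:
    """Return layers P_1..P_L and residuals P_{l,res} = P_l minus P_{l+1}."""
    layers = [np.arange(num_points)]  # P_1: every input point (assumption)
    for _ in range(num_layers - 1):
        current = layers[-1]
        layers.append(current[sampler(current)])  # P_{l+1} is a subset of P_l
    residuals = [np.setdiff1d(layers[l], layers[l + 1])
                 for l in range(num_layers - 1)]
    return layers, residuals

def reconstruct(base: np.ndarray, residuals: List[np.ndarray]) -> List[np.ndarray]:
    """Decoder-side bookkeeping: P_l = union(P_{l+1}, P_{l,res}), base layer first."""
    recon = [np.sort(base)]
    for res in reversed(residuals):
        recon.insert(0, np.union1d(recon[0], res))
    return recon

def every_other(idx: np.ndarray) -> np.ndarray:
    """Stand-in sampler for illustration only: keep every other point."""
    return np.arange(0, len(idx), 2)

layers, residuals = build_layers(16, every_other)
print([len(p) for p in layers], [len(r) for r in residuals])  # [16, 8, 4, 2] [8, 4, 2]
```

Only the base layer and the residuals are coded, so no point is transmitted twice. Each enhancement layer refines the reconstruction without duplicating the points already sent.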
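The per-group processing described for Figure 2 can be sketched with NumPy. Several details below are assumptions that the caption does not state. The signal is taken to be one attribute channel (for example, luma) of the $\Omega$ points in a group, in a fixed order. Groups are assumed to be consecutive runs of $\Omega$ points, with a possibly shorter final group. An IFFT output is rarely exactly zero in floating point, so a position counts as non-zero here when its magnitude exceeds a hypothetical tolerance `eps`. The names `fs_group` and `frequency_sample` are illustrative.

```python
import numpy as np

def fs_group(values: np.ndarray, q: float, eps: float = 1e-6) -> np.ndarray:
    """Positions (within one group) kept as the group's high-frequency component."""
    omega = values.shape[0]
    windowed = values * np.hamming(omega)        # Hamming window over the group
    coeffs = np.fft.fft(windowed)                # FFT of the windowed signal
    mag = np.abs(coeffs)
    # Retain coefficients whose magnitude is <= q% of the largest; zero the rest.
    keep = mag <= (q / 100.0) * mag.max()
    filtered = np.where(keep, coeffs, 0.0)
    spatial = np.fft.ifft(filtered)              # back to the spatial domain
    # "Non-zero" positions, judged against the assumed tolerance eps.
    return np.flatnonzero(np.abs(spatial) > eps)

def frequency_sample(values: np.ndarray, omega: int, q: float,
                     eps: float = 1e-6) -> np.ndarray:
    """Apply fs_group to consecutive groups of `omega` points; return global indices."""
    picked = [fs_group(values[s:s + omega], q, eps) + s
              for s in range(0, len(values), omega)]
    return np.concatenate(picked) if picked else np.empty(0, dtype=np.intp)

# Sanity check: with q = 100 every coefficient is retained, so the IFFT returns
# the windowed signal and only the zero-valued points are dropped.
x = np.array([0, 1, 0, 1, 0, 1, 0, 1], dtype=float)
print(frequency_sample(x, omega=4, q=100))  # [1 3 5 7]
```

The choice of `eps` strongly affects how many points survive, because zeroing coefficients rarely produces exact zeros in the IFFT output. A real implementation would need whatever selection rule the authors actually use.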
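Figure 5's patch-then-octree partitioning can be approximated as follows. The sketch makes several assumptions:

- Coordinates are non-negative integer voxel coordinates below `2**bits`.
- Patches are formed from points sorted in Morton (Z-order). The caption does not specify the ordering.
- A second-to-last-level octree node is a 2x2x2 voxel cell, so it holds at most 8 points, and exactly 8 only when fully occupied.

Morton order keeps the points of each 2x2x2 cell contiguous. Sub-nodes can therefore be read off by cutting a sorted patch wherever the parent key (the Morton code shifted right by 3 bits) changes. Conversion to sparse tensors is omitted. The names `morton_code` and `octree_partition` are illustrative.

```python
import numpy as np

def morton_code(xyz: np.ndarray, bits: int = 10) -> np.ndarray:
    """Interleave the bits of integer voxel coordinates (x, y, z) into Z-order keys."""
    xyz = xyz.astype(np.uint64)
    code = np.zeros(len(xyz), dtype=np.uint64)
    for b in range(bits):
        for axis in range(3):
            bit = (xyz[:, axis] >> np.uint64(b)) & np.uint64(1)
            code |= bit << np.uint64(3 * b + axis)
    return code

def octree_partition(xyz: np.ndarray, patch_size: int = 4096, bits: int = 10):
    """Split points into Morton-ordered patches, then into 2x2x2-voxel sub-nodes.

    Returns a list of patches; each patch is a list of index arrays, one per
    second-to-last-level octree node (at most 8 points each).
    """
    codes = morton_code(xyz, bits)
    order = np.argsort(codes, kind="stable")
    patches = []
    for start in range(0, len(order), patch_size):
        idx = order[start:start + patch_size]      # last patch may be smaller
        parent = codes[idx] >> np.uint64(3)        # key of the enclosing 2x2x2 cell
        # Morton order keeps each cell contiguous, so cut where the key changes.
        cuts = np.flatnonzero(np.diff(parent)) + 1
        patches.append(np.split(idx, cuts))
    return patches

xyz = np.indices((4, 4, 4)).reshape(3, -1).T      # fully occupied 4x4x4 block
patches = octree_partition(xyz)
print(len(patches), [len(node) for node in patches[0]])  # 1 [8, 8, 8, 8, 8, 8, 8, 8]
```

Sorting once by Morton code handles both patch formation and sub-node grouping without building an explicit tree. One side effect is that a 2x2x2 cell may be split across a patch boundary.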