Implicit Neural Compression of Point Clouds

Hongning Ruan, Yulin Shao, Qianqian Yang, Liang Zhao, Zhaoyang Zhang, Dusit Niyato

TL;DR

This work introduces NeRC$^3$, an implicit neural representation-based framework for compressing point clouds by learning two neural fields: one for voxel occupancy and one for voxel attributes. It extends to dynamic scenes with i-NeRC$^3$, r-NeRC$^3$, c-NeRC$^3$, and 4D-NeRC$^3$, including a 4D spatio-temporal INR that jointly encodes multiple frames. The approach achieves state-of-the-art or competitive rate-distortion performance against traditional MPEG PCC standards and prior INR-based PCC methods, particularly excelling in dynamic geometry compression and joint geometry-attribute tasks. Despite higher encoding complexity due to per-instance network training, the methods demonstrate strong qualitative reconstructions and scalable mechanisms to manage temporal redundancy, with practical variants that balance performance and speed. This work opens a new direction for INR-based PCC by integrating geometry and attributes in neural space and by addressing temporal redundancy directly in 4D representations.

Abstract

Point clouds have gained prominence across numerous applications due to their ability to accurately represent 3D objects and scenes. However, efficiently compressing unstructured, high-precision point cloud data remains a significant challenge. In this paper, we propose NeRC$^3$, a novel point cloud compression framework that leverages implicit neural representations (INRs) to encode both geometry and attributes of dense point clouds. Our approach employs two coordinate-based neural networks: one maps spatial coordinates to voxel occupancy, while the other maps occupied voxels to their attributes, thereby implicitly representing the geometry and attributes of a voxelized point cloud. The encoder quantizes and compresses network parameters alongside auxiliary information required for reconstruction, while the decoder reconstructs the original point cloud by inputting voxel coordinates into the neural networks. Furthermore, we extend our method to dynamic point cloud compression through techniques that reduce temporal redundancy, including a 4D spatio-temporal representation termed 4D-NeRC$^3$. Experimental results validate the effectiveness of our approach. For static point clouds, NeRC$^3$ outperforms the octree-based G-PCC standard and existing INR-based methods. For dynamic point clouds, 4D-NeRC$^3$ achieves superior geometry compression performance compared to the latest G-PCC and V-PCC standards, while matching state-of-the-art learning-based methods. It also demonstrates competitive performance in joint geometry and attribute compression.
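The decoding step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_occupancy` and `toy_attribute` are hypothetical closed-form stand-ins for the trained networks, and the cube size, the grid size, and the strict `> tau` comparison are all assumptions made for illustration. The sketch only shows the control flow: query every voxel inside the transmitted non-empty cubes, keep those whose occupancy probability exceeds the threshold, then query attributes for the kept voxels.

```python
import math
from itertools import product

def toy_occupancy(x, y, z):
    # Hypothetical stand-in for the geometry network: the value is close to 1
    # near a sphere of radius 3 centred at (4, 4, 4) and decays away from it.
    r = math.sqrt((x - 4) ** 2 + (y - 4) ** 2 + (z - 4) ** 2)
    return math.exp(-((r - 3.0) ** 2))

def toy_attribute(x, y, z):
    # Hypothetical stand-in for the attribute network: an RGB colour derived
    # from the voxel position (valid for coordinates in 0..7).
    return (32 * x, 32 * y, 32 * z)

def decode_geometry(occupancy_fn, cubes, cube_size, tau):
    # Visit only voxels inside the non-empty cubes and keep those whose
    # occupancy probability exceeds the threshold tau.
    points = []
    for cx, cy, cz in cubes:
        for dx, dy, dz in product(range(cube_size), repeat=3):
            voxel = (cx + dx, cy + dy, cz + dz)
            if occupancy_fn(*voxel) > tau:
                points.append(voxel)
    return points

def decode_attributes(attribute_fn, points):
    # Attributes are queried only at voxels the geometry pass marked occupied.
    return {p: attribute_fn(*p) for p in points}

# Assumed auxiliary information: eight cubes of side 4 tiling an 8x8x8 grid.
cubes = list(product((0, 4), repeat=3))
points = decode_geometry(toy_occupancy, cubes, cube_size=4, tau=0.5)
colors = decode_attributes(toy_attribute, points)
```

Restricting queries to the non-empty cubes is what keeps decoding tractable in this sketch: the networks are never evaluated in the large empty regions of the volume.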

Paper Structure

This paper contains 38 sections, 2 theorems, 21 equations, 11 figures, 6 tables.

Key Result

Proposition 1

There exists a threshold boundary $\tau_{\max}\in(0,1)$ such that the following holds:

Figures (11)

  • Figure 1: (a) A general diagram of point cloud codecs, where attribute compression and decompression is conditioned on the reconstructed geometry $\widehat{\mathcal{X}}$. (b) Our NeRC$^3$ framework. The encoder optimizes two neural networks $F$ and $G$ to implicitly represent the geometry and attributes, respectively, followed by quantization and encoding of network parameters $\mathbf{\Theta}$ and $\mathbf{\Phi}$. These parameters, along with some auxiliary information (i.e., non-empty cubes $\mathcal{W}$ and the threshold $\tau$), are transmitted to the decoder. The decoder reconstructs a lossy version of the original point cloud using the quantized network parameters and auxiliary information. "E" and "D" denote lossless encoding and decoding, respectively. The orange arrow forms a loop for geometry compression, aiming to find the optimal threshold, which will be discussed in Section \ref{sec:threshold}.
  • Figure 2: Pre-processing of point cloud geometry. (a) The entire volumetric space, where most regions are empty. (b) Non-empty cubes containing occupied voxels from the original point cloud. (c) Visualization of each non-empty cube.
  • Figure 3: (a) Network structure comprising multiple residual blocks. (b) Detailed structure of each residual block. The plus sign "$+$" denotes residual connection. (c) Separate positional encoding for the spatial and temporal coordinates. "C" denotes concatenation. Please refer to Section \ref{sec:4dpc} for details.
  • Figure 4: (a) In theory, $D(\tau)$ is piece-wise constant and right-continuous on $[0,\tau_{\max})$, and is unimodal with its peak at $\tau^*$. (b) In practice, $D(\tau)$ appears like a smooth continuous function. (c) Ideally, voxels with higher OPs should be closer to $\mathcal{X}$. (d) If OPs fluctuate in empty regions, voxels with higher OPs can be farther from $\mathcal{X}$.
  • Figure 5: Diagrams of three extended methods to reduce temporal redundancy. Both (a) r-NeRC$^3$ and (b) c-NeRC$^3$ first transform the geometry/attributes of point clouds into neural space via INR training, with details described in Section \ref{sec:III}. In (a) r-NeRC$^3$, the network parameters in neural space are encoded as residual w.r.t. the quantized parameters of the previous frame. "Q" denotes quantization. In (b) c-NeRC$^3$, the parameter sets of networks are assumed to lie on a simple curve in neural space. Only the essential control "points" defining this curve are encoded. (c) 4D-NeRC$^3$ directly eliminates redundancy within the point cloud space by constructing a 4D INR capable of processing multiple frames simultaneously.
  • ...and 6 more figures
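
The residual coding idea in the Figure 5(a) caption, where each frame's network parameters are encoded relative to the quantized parameters of the previous frame, can be sketched as follows. This is an illustrative toy under stated assumptions, not the paper's codec: uniform scalar quantization with a fixed `step`, an all-zero reference for the first frame, and short parameter lists in place of real network weights are choices made here for brevity.

```python
def quantize(values, step):
    # Uniform scalar quantization to integer symbols (assumed scheme).
    return [round(v / step) for v in values]

def dequantize(symbols, step):
    return [s * step for s in symbols]

def encode_sequence(frames, step):
    # Encode each frame's parameters as a quantized residual relative to the
    # previous frame's *reconstructed* parameters, so errors do not accumulate.
    symbols = []
    prev_hat = [0.0] * len(frames[0])  # assumed reference for the first frame
    for params in frames:
        residual = [p - q for p, q in zip(params, prev_hat)]
        sym = quantize(residual, step)
        symbols.append(sym)
        prev_hat = [q + r for q, r in zip(prev_hat, dequantize(sym, step))]
    return symbols

def decode_sequence(symbols, step):
    # Mirror the encoder: add each dequantized residual to the running estimate.
    recon = []
    prev_hat = [0.0] * len(symbols[0])
    for sym in symbols:
        prev_hat = [q + r for q, r in zip(prev_hat, dequantize(sym, step))]
        recon.append(prev_hat)
    return recon

# Hypothetical parameter vectors for three consecutive, similar frames.
frames = [[0.50, -1.20], [0.52, -1.18], [0.55, -1.21]]
step = 0.05
symbols = encode_sequence(frames, step)
recon = decode_sequence(symbols, step)
```

Because the encoder predicts from the same quantized values the decoder will hold, the per-parameter reconstruction error stays within half a quantization step for every frame, and the residual symbols of later frames are small when consecutive frames have similar parameters.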

Theorems & Definitions (2)

  • Proposition 1
  • Proposition 2