SCP: Spherical-Coordinate-based Learned Point Cloud Compression

Ao Luo, Linxin Song, Keisuke Nonaka, Kyohei Unno, Heming Sun, Masayuki Goto, Jiro Katto

TL;DR

This work tackles the high cost of storing and transmitting LiDAR point clouds by introducing SCP, a model-agnostic preprocessing step that converts Cartesian coordinates to Spherical coordinates to exploit the circular chains and azimuthal invariance in spinning LiDAR data. SCP is complemented by a multi-level Octree that reduces reconstruction errors in distant regions, where Spherical-coordinate voxels are larger. The approach is validated with two learned compression backbones on the SemanticKITTI and Ford datasets, delivering up to 29.14% BD-Rate gains in point-to-point PSNR and demonstrating cross-method universality. The results suggest SCP can significantly enhance practical point cloud compression in autonomous driving scenarios while remaining adaptable to existing backbones.

Abstract

In recent years, the task of learned point cloud compression has gained prominence. An important type of point cloud, the spinning LiDAR point cloud, is generated by spinning LiDAR on vehicles. This process results in numerous circular shapes and azimuthal angle invariance features within the point clouds. However, these two features have been largely overlooked by previous methodologies. In this paper, we introduce a model-agnostic method called Spherical-Coordinate-based learned Point cloud compression (SCP), designed to leverage the aforementioned features fully. Additionally, we propose a multi-level Octree for SCP to mitigate the reconstruction error for distant areas within the Spherical-coordinate-based Octree. SCP exhibits excellent universality, making it applicable to various learned point cloud compression techniques. Experimental results demonstrate that SCP surpasses previous state-of-the-art methods by up to 29.14% in point-to-point PSNR BD-Rate.
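The coordinate change at the heart of SCP can be illustrated with a minimal sketch. The angle convention here is an assumption, not taken from the paper: $\rho$ is the range, $\theta$ is the azimuth computed with `atan2(y, x)`, and $\phi$ is the polar angle measured from the z-axis. The helper `laser_ring` is hypothetical and only generates an idealized scan ring (a horizontal circle around the sensor). After conversion, every point on the ring shares the same $\rho$ and $\phi$, and only $\theta$ varies.

```python
import math

def cartesian_to_spherical(x, y, z):
    """Return (rho, theta, phi): range, azimuth in (-pi, pi], polar angle in [0, pi].

    Assumed convention (not confirmed by the source): theta = atan2(y, x),
    phi is measured from the positive z-axis.
    """
    rho = math.sqrt(x * x + y * y + z * z)
    if rho == 0.0:
        # Angles are undefined at the origin; return zeros by convention.
        return 0.0, 0.0, 0.0
    theta = math.atan2(y, x)
    # Clamp to guard against rounding slightly outside [-1, 1].
    phi = math.acos(max(-1.0, min(1.0, z / rho)))
    return rho, theta, phi

def spherical_to_cartesian(rho, theta, phi):
    """Inverse of cartesian_to_spherical under the same convention."""
    sin_phi = math.sin(phi)
    return (rho * sin_phi * math.cos(theta),
            rho * sin_phi * math.sin(theta),
            rho * math.cos(phi))

def laser_ring(radius, height, n_points):
    """Hypothetical idealized scan ring: a horizontal circle at a fixed height."""
    return [(radius * math.cos(2 * math.pi * k / n_points),
             radius * math.sin(2 * math.pi * k / n_points),
             height)
            for k in range(n_points)]

ring = laser_ring(radius=10.0, height=-1.5, n_points=8)
spherical = [cartesian_to_spherical(*p) for p in ring]
rhos = [s[0] for s in spherical]
phis = [s[2] for s in spherical]

# The circle collapses to constant rho and phi; only theta changes.
print("rho spread:", max(rhos) - min(rhos))
print("phi spread:", max(phis) - min(phis))
```

Real scans are not perfect circles, so the spread is small rather than zero in practice. The sketch only shows why a ring becomes a nearly straight line once the points are expressed in Spherical coordinates.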

Paper Structure

This paper contains 25 sections, 9 equations, 7 figures, 1 table.

Figures (7)

  • Figure 1: Comparison of the Octree structures in Cartesian, Cylindrical, and Spherical coordinates. Points with the same color are in the same voxel. In the top-down view, the Octree structure in Cylindrical coordinates (see Fig. \ref{fig:cylin_split}-left) fits LiDAR point clouds better than the one in Cartesian coordinates because it exploits their circular shapes. In this top-down view, the Octree structures of Cylindrical and Spherical coordinates are similar. However, when looking horizontally from the origin, along the black lines in Fig. \ref{fig:cylin_split}-right and Fig. \ref{fig:spher_split}, the Octree in Cylindrical coordinates overlooks the azimuthal angle invariance and often splits chains into different voxels. In contrast, the Octree in Spherical coordinates tends to group points from the same chain into the same voxel, increasing the relevant information available for every point.
  • Figure 2: In Spherical coordinates, the size of a voxel varies with its distance from the origin (bottom-left corner). Voxels are distinguished by color. Distant voxels are larger than central ones, which results in lower reconstruction errors for the central voxels and higher errors for the distant ones.
  • Figure 3: The projections of point positions in Spherical coordinates onto the $\rho o\theta$-plane. The left parts, which represent circular shapes in the point cloud, are nearly straight lines, which are simpler to compress.
  • Figure 4: Quantitative rate-distortion results of our SCP-EHEM and SCP-OctAttn models on the SemanticKITTI (top) and Ford (bottom) datasets. The baselines are EHEM \cite{ehem}, SparsePCGC \cite{nju}, OctAttention \cite{octattention}, and G-PCC TMC13 \cite{gpcc} with either Octree or predictive geometry. Predictive geometry is only available for the Ford dataset, as the SemanticKITTI dataset lacks the sensor information needed to compute it.
  • Figure 5: Comparison of the reconstruction errors on point clouds No. 000000 in Sequence 11 (a, b, c) and No. 000000 in Sequence 16 (d, e, f) from SemanticKITTI \cite{kitti}, for the baseline EHEM \cite{ehem} method in (a, d), our SCP-EHEM method in (b, e), and SCP-EHEM without a multi-level Octree in (c, f). The metrics are D1 PSNR @ bpp. The error colormap is displayed below the figures, where purple indicates the lowest error and red the highest. The central parts of the point clouds in (b, e) have significantly lower reconstruction errors than those of the baseline method in (a, d). Conversely, the distant parts in (c, f) have much higher reconstruction errors than the central parts, but this is mitigated by the multi-level Octree, as shown in (b, e).
  • ...and 2 more figures
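
The distance dependence described in the Figure 2 caption can be made concrete with a small sketch. It assumes uniform quantization of the azimuth with a hypothetical bin width `theta_step` and points lying in the horizontal plane. It is not the paper's multi-level Octree, only an illustration of the effect that structure is meant to mitigate. Two points at the same range whose azimuths differ by $d$ are separated by a chord of length $2\rho\,|\sin(d/2)|$, so the same angular bin produces an error that grows linearly with $\rho$ and is bounded by $\rho\,\Delta\theta/2$.

```python
import math

def quantize(value, step):
    """Snap a value to the centre of its uniform bin of width `step`."""
    return (math.floor(value / step) + 0.5) * step

def azimuth_quantization_error(rho, theta, theta_step):
    """Cartesian distance between a point and its azimuth-quantized reconstruction.

    Both points lie at range rho in the horizontal plane, so the distance
    is the chord between two angles on a circle of radius rho.
    """
    theta_q = quantize(theta, theta_step)
    return 2.0 * rho * abs(math.sin((theta_q - theta) / 2.0))

# Hypothetical settings chosen only for illustration.
theta_step = 2 * math.pi / 1024   # 1024 azimuth bins
theta = 0.3001                    # azimuth of the test point, in radians

near = azimuth_quantization_error(5.0, theta, theta_step)
far = azimuth_quantization_error(50.0, theta, theta_step)

print("error at rho = 5: ", near)
print("error at rho = 50:", far)
print("ratio:", far / near)
```

With the angular step fixed, a point ten times farther from the sensor has ten times the reconstruction error, which matches the larger distant voxels in Figure 2.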