RENO: Real-Time Neural Compression for 3D LiDAR Point Clouds

Kang You, Tong Chen, Dandan Ding, M. Salman Asif, Zhan Ma

TL;DR

RENO addresses the real-time bottleneck in LiDAR point cloud compression (LPCC) by eliminating time-consuming octree generation and multi-stage upsampling, replacing them with multiscale sparse tensors and sparse occupancy codes. A cross-scale context model, the Target Occupancy Predictor (TOP), predicts 8-bit occupancy codes in a one-shot, scale-wise fashion, while two fast converters, the Fast Occupancy Generator (FOG) and the Fast Coordinate Generator (FCG), enable parallel, low-latency code generation and reconstruction. A bitwise two-stage coding scheme further reduces arithmetic-coding latency, enabling roughly 10–20 fps on a desktop GPU for 12–14 bit LiDAR frames, with BD-BR gains over G-PCCv23 and Draco and competitive downstream task performance. The resulting 1 MB model still outperforms existing real-time compressors in rate-distortion and preserves geometry well enough for effective 3D object detection, making it attractive for on-device or vehicle-to-vehicle LiDAR data sharing. Overall, RENO demonstrates that carefully designed cross-scale occupancy coding and efficient sparse-tensor processing can achieve real-time neural LPCC without sacrificing reconstruction quality or downstream task performance.

Abstract

Despite the substantial advancements demonstrated by learning-based neural models in the LiDAR Point Cloud Compression (LPCC) task, realizing real-time compression, an indispensable criterion for numerous industrial applications, remains a formidable challenge. This paper proposes RENO, the first real-time neural codec for 3D LiDAR point clouds, achieving superior performance with a lightweight model. RENO skips octree construction and builds directly upon the multiscale sparse tensor representation. Instead of multi-stage inference, RENO devises sparse occupancy codes, which exploit cross-scale correlation and derive voxels' occupancy in a one-shot manner, greatly reducing processing time. Experimental results demonstrate that RENO achieves real-time coding speed, 10 fps at 14-bit depth on a desktop platform (e.g., one RTX 3090 GPU) for both encoding and decoding, while providing 12.25% and 48.34% bit-rate savings compared to G-PCCv23 and Draco, respectively, at similar quality. The RENO model size is merely 1 MB, making it attractive for practical applications. The source code is available at https://github.com/NJUVISION/RENO.
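To make the idea of sparse occupancy codes concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes integer voxel coordinates, takes each coarser-scale voxel to be the finer-scale coordinate shifted right by one bit, and packs the occupancy of the 2x2x2 children into one 8-bit code per occupied parent. The function names `occupancy_from_coords` and `coords_from_occupancy` and the bit ordering (x, y, z from most to least significant) are hypothetical choices made for illustration; they loosely mirror the roles the paper assigns to FOG and FCG.

```python
import numpy as np

def occupancy_from_coords(coords):
    """Finer-scale voxel coordinates (N, 3) -> (parent coords, 8-bit codes)."""
    coords = np.asarray(coords, dtype=np.int64)
    parents = coords >> 1                      # coarser-scale voxel of each point
    child = coords & 1                         # position inside the 2x2x2 block
    child_idx = (child[:, 0] << 2) | (child[:, 1] << 1) | child[:, 2]
    uniq, inverse = np.unique(parents, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)              # shape differs across NumPy versions
    codes = np.zeros(len(uniq), dtype=np.uint8)
    # Set one bit per occupied child; unbuffered so repeated parents accumulate.
    np.bitwise_or.at(codes, inverse, (1 << child_idx).astype(np.uint8))
    return uniq, codes

def coords_from_occupancy(parents, codes):
    """Inverse mapping: (parent coords, 8-bit codes) -> finer-scale coordinates."""
    parents = np.asarray(parents, dtype=np.int64)
    codes = np.asarray(codes, dtype=np.uint8)
    bits = (codes[:, None] >> np.arange(8, dtype=np.uint8)) & 1   # (M, 8)
    p_idx, c_idx = np.nonzero(bits)            # one entry per occupied child
    offsets = np.stack([(c_idx >> 2) & 1, (c_idx >> 1) & 1, c_idx & 1], axis=1)
    return (parents[p_idx] << 1) + offsets

# Toy example: four occupied voxels that share two parents.
fine = np.array([[0, 0, 0], [1, 1, 1], [2, 3, 0], [3, 3, 1]])
parents, codes = occupancy_from_coords(fine)
print(codes)  # [129 132]
restored = coords_from_occupancy(parents, codes)
```

Both directions are expressed as array operations with no per-voxel Python loop and no explicit tree, which is the property that makes this kind of conversion easy to parallelize on a GPU.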

Paper Structure

This paper contains 27 sections, 14 equations, 12 figures, 6 tables.

Figures (12)

  • Figure 1: Left: Rate-distortion performance comparison on the KITTI dataset. Right: Encoding time comparison for a 14-bit ($\pm 8$ mm precision) LiDAR scan, where RENO operates $10\times$ faster than the latest G-PCCv23 standard, achieving a runtime of 10 frames per second. Notably, the encoding time covers all stages, including preprocessing, network inference, and arithmetic coding. Our decoding time is comparable to the encoding time.
  • Figure 2: Comparison of learned LPCC pipelines. The refinement of point clouds from depth (or scale) $d$ (with $N_d$ voxels) to depth $d+1$ (with $N_{d+1}$ voxels) is employed as a toy example for better illustration. (a) Current sparse tensor-based methods predominantly employ a multi-stage inferring pipeline. However, the involved upsampling operation introduces $8 \times N_d$ voxels for neural network-based inference, leading to significant computational costs. (b) The octree-based method leverages the tree representation to obtain extensive contextual information but requires a time-intensive process for multi-level octree generation. (c) RENO introduces sparse occupancy codes to avoid multi-level tree generation and facilitate high-speed one-shot inferring, delivering real-time compression.
  • Figure 3: RENO. We identify that the real-time bottleneck of current neural codecs resides in two stages: preprocessing and neural inference. To surmount these obstacles, this approach endeavors to optimize efficiency by (i) minimizing preprocessing delays through the efficient acquisition of occupancy codes directly within sparse space via the developed Fast Occupancy Generator (FOG) and Fast Coordinate Generator (FCG); (ii) optimizing neural inference by efficiently embedding features from sparse occupancy codes to the next-level target positions, which prompted the development of the Target Occupancy Predictor (TOP).
  • Figure 4: Illustrative implementation for Fast Occupancy Generator (FOG) and Fast Coordinate Generator (FCG).
  • Figure 5: Rate-distortion performance comparison on KITTI (the first row) and Ford (the second row).
  • ...and 7 more figures
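The workload contrast described in the Figure 2 caption can be illustrated with a short counting sketch. It assumes, for illustration only, that a multi-stage pipeline classifies every one of the $8 \times N_d$ upsampled candidate voxels, while a one-shot pipeline predicts a single 8-bit symbol per occupied voxel at depth $d$. The function name `refinement_workload` and the returned keys are hypothetical.

```python
import numpy as np

def refinement_workload(coords_fine):
    """Count voxels involved in refining depth d to depth d+1."""
    coords_fine = np.unique(np.asarray(coords_fine, dtype=np.int64), axis=0)
    parents = np.unique(coords_fine >> 1, axis=0)   # occupied voxels at depth d
    n_d = len(parents)
    return {
        "N_d": n_d,
        "N_d_plus_1": len(coords_fine),
        "multi_stage_candidates": 8 * n_d,  # every child of every parent is tested
        "one_shot_symbols": n_d,            # one 8-bit occupancy code per parent
    }

# Toy example: four occupied voxels at depth d+1 under two parents at depth d.
stats = refinement_workload([[0, 0, 0], [1, 1, 1], [2, 3, 0], [3, 3, 1]])
print(stats)
```

In this toy case the multi-stage pipeline would evaluate 16 candidate voxels to recover 4 occupied ones, whereas the one-shot formulation needs only 2 symbol predictions.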