HINT: Hierarchical Inter-frame Correlation for One-shot Point Cloud Sequence Compression
Yuchen Gao, Qi Zhang
TL;DR
Problem: efficient, low-latency lossless compression of sequential point clouds. Approach: HINT uses a hierarchical sparse representation and a temporal-spatial entropy model. The model fuses a two-stage temporal feature extraction (a parent-level existence map and a child-level neighborhood lookup) with parity-based, sibling-conditioned entropy coding, and predicts occupancy distributions for groups of voxels. Contributions: a multi-level sparse pyramid $(\mathcal{C}_d,\mathcal{O}_d)$, coarse-to-fine temporal cues, group-wise context, and strict causality that preserves parallelizable decoding. Results: on 8iVFBv2, HINT achieves encoding/decoding times of 105 ms/140 ms and up to 43.6% bitrate reduction versus G-PCC, with consistent gains over the spatial-only baseline (RENO). Impact: enables practical, GPU-friendly dynamic point-cloud compression for streaming and storage without heavy motion estimation.
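A minimal pure-Python sketch can make the pyramid and the parity-based grouping in the TL;DR more concrete. It assumes an octree-style reading in which $\mathcal{C}_d$ holds the occupied voxel coordinates at level $d$ and $\mathcal{O}_d$ their 8-bit child-occupancy codes; the TL;DR does not spell this out. The checkerboard split of the eight child positions into an even and an odd group (`EVEN`, `ODD`, `split_by_parity`) is a hypothetical stand-in for HINT's parity grouping. All function names are illustrative, and no learned features or entropy coder are included.

```python
from typing import Dict, List, Set, Tuple

Coord = Tuple[int, int, int]

# Hypothetical checkerboard split of the 8 child positions inside a 2x2x2 block:
# a position's index bits are (cx, cy, cz), so its popcount parity is (cx+cy+cz) % 2.
EVEN = [k for k in range(8) if bin(k).count("1") % 2 == 0]  # [0, 3, 5, 6]
ODD = [k for k in range(8) if bin(k).count("1") % 2 == 1]   # [1, 2, 4, 7]


def child_index(c: Coord) -> int:
    """Position (0-7) of a voxel inside its parent, taken from the low bit of each axis."""
    x, y, z = c
    return ((x & 1) << 2) | ((y & 1) << 1) | (z & 1)


def build_pyramid(points: Set[Coord], depth: int) -> List[Tuple[List[Coord], List[int]]]:
    """Return [(C_d, O_d)] ordered coarse to fine.

    C_d: sorted occupied voxel coordinates at level d.
    O_d: 8-bit occupancy code per voxel in C_d (which children at level d+1 are occupied).
    Assumes all input coordinates lie in [0, 2**depth).
    """
    levels = []
    current = set(points)  # occupied voxels at the finest resolution
    for _ in range(depth):
        codes: Dict[Coord, int] = {}
        for c in current:
            parent = (c[0] >> 1, c[1] >> 1, c[2] >> 1)
            codes[parent] = codes.get(parent, 0) | (1 << child_index(c))
        coords = sorted(codes)
        levels.append((coords, [codes[p] for p in coords]))
        current = set(codes)  # parents become the next, coarser level
    levels.reverse()  # coarse-to-fine, the order a decoder consumes them
    return levels


def split_by_parity(code: int) -> Tuple[List[int], List[int]]:
    """Occupancy bits of the even group (coded first) and the odd group (coded
    second, so its context may include the already-decoded even siblings)."""
    return [(code >> k) & 1 for k in EVEN], [(code >> k) & 1 for k in ODD]


pyramid = build_pyramid({(0, 0, 0), (1, 0, 0), (3, 3, 3)}, depth=2)
print(pyramid)              # [([(0, 0, 0)], [129]), ([(0, 0, 0), (1, 1, 1)], [17, 128])]
print(split_by_parity(17))  # ([1, 0, 0, 0], [0, 0, 1, 0])
```

With a parity split, every child position has all three of its face-adjacent siblings in the other group. The second group can therefore be coded with sibling context, while each group as a whole is still predicted in parallel.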
Abstract
Deep learning has demonstrated strong capability in compressing point clouds, and entropy modeling for lossless compression has been widely investigated. However, most methods rely solely on parent/sibling contexts and level-wise autoregression, which leads to decoding latencies on the order of $10^1$ to $10^2$ seconds. We propose HINT, a method that integrates temporal and spatial correlation for sequential point cloud compression. Specifically, HINT first performs a two-stage temporal feature extraction: (i) a parent-level existence map and (ii) a child-level neighborhood lookup in the previous frame. These cues are fused with the spatial features via element-wise addition and encoded with a group-wise strategy. Experimental results show that HINT achieves encoding and decoding times of 105 ms and 140 ms, respectively, corresponding to 49.6× and 21.6× speedups over G-PCC, while achieving up to 43.6% bitrate reduction and consistently outperforming the spatial-only baseline (RENO).
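The two-stage temporal extraction and additive fusion described above can be sketched as follows. This minimal pure-Python illustration relies on several assumptions that the abstract does not state:

- The cues are raw values (an existence flag and neighbor counts) rather than learned embeddings.
- The child-level lookup uses a 3x3x3 window in the previous frame.
- The spatial feature vector is supplied by hand.

The names `temporal_features`, `fuse`, `children`, and `NEIGHBORHOOD` are hypothetical.

```python
from typing import List, Set, Tuple

Coord = Tuple[int, int, int]

# Assumed 3x3x3 search window for the child-level lookup (27 offsets).
NEIGHBORHOOD = [(dx, dy, dz) for dx in (-1, 0, 1) for dy in (-1, 0, 1) for dz in (-1, 0, 1)]


def children(parent: Coord) -> List[Coord]:
    """Eight candidate child coordinates of a parent voxel, ordered by child index."""
    px, py, pz = parent
    return [(2 * px + ((k >> 2) & 1), 2 * py + ((k >> 1) & 1), 2 * pz + (k & 1)) for k in range(8)]


def temporal_features(parent: Coord, prev_parents: Set[Coord], prev_children: Set[Coord]) -> List[float]:
    """9-dim temporal cue for one parent voxel, using only the already-decoded frame t-1."""
    # Stage (i): parent-level existence map, 1.0 if this voxel was occupied at the same level in t-1.
    exists = 1.0 if parent in prev_parents else 0.0
    # Stage (ii): child-level lookup, counting occupied t-1 voxels around each candidate child.
    counts = [
        float(sum((cx + dx, cy + dy, cz + dz) in prev_children for dx, dy, dz in NEIGHBORHOOD))
        for cx, cy, cz in children(parent)
    ]
    return [exists] + counts


def fuse(spatial: List[float], temporal: List[float]) -> List[float]:
    """Fuse spatial and temporal features by element-wise addition."""
    if len(spatial) != len(temporal):
        raise ValueError("feature vectors must have equal length")
    return [s + t for s, t in zip(spatial, temporal)]


prev_parents = {(0, 0, 0)}
prev_children = {(0, 0, 0), (1, 0, 0)}
t_feat = temporal_features((0, 0, 0), prev_parents, prev_children)
print(t_feat)                   # [1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
print(fuse([0.5] * 9, t_feat))  # [1.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5]
```

Both stages read only frame t-1, which the decoder already has. As a result, the temporal cues add no new dependency inside the current frame, consistent with the causality and parallel-decoding claims in the TL;DR.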
