FLaTEC: Frequency-Disentangled Latent Triplanes for Efficient Compression of LiDAR Point Clouds
Xiaoge Zhang, Zijie Wu, Mingtao Feng, Zichen Geng, Mehwish Nasim, Saeed Anwar, Ajmal Mian
TL;DR
FLaTEC addresses LiDAR point cloud compression by disentangling frequency components and adopting a triplane latent proxy to reduce 3D complexity. It introduces a spectrum-aware pipeline with stage-wise frequency decomposition (FD) and frequency modulation (FM), complemented by a local spectrum attention mechanism that preserves detail while aggressively reducing bitrate. The method achieves state-of-the-art rate-distortion results, with BD-rate gains of up to $94.21\%$ on Ford and $78.51\%$ on SemanticKITTI, while maintaining real-time performance and robust generalization. By combining a triplane representation with a locality-aware refinement module, FLaTEC offers a scalable, high-fidelity solution for large-scale LiDAR compression with flexible upsampling to arbitrary resolutions.
Abstract
Point cloud compression methods jointly optimize bitrate and reconstruction distortion. However, balancing compression ratio and reconstruction quality is difficult because low-frequency and high-frequency components contribute differently at the same resolution. To address this, we propose FLaTEC, a frequency-aware compression model that compresses a full scan at high compression ratios. Our approach introduces a frequency-aware mechanism that decouples low-frequency structures from high-frequency textures, while hybridizing latent triplanes as a compact proxy for the point cloud. Specifically, we convert voxelized embeddings into triplane representations to reduce sparsity, computational cost, and storage requirements. We then devise a frequency-disentangling technique that extracts compact low-frequency content while collecting high-frequency details across scales. The decoupled low-frequency and high-frequency components are stored in binary format. During decoding, full-spectrum signals are progressively recovered via a modulation block. Additionally, to compensate for the loss of 3D correlation, we introduce an efficient frequency-based attention mechanism that fosters local connectivity and outputs points at arbitrary resolution. Our method achieves state-of-the-art rate-distortion performance and outperforms the standard codecs by 78\% and 94\% in BD-rate on the SemanticKITTI and Ford datasets, respectively.
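To make the two central ideas concrete, the following is a minimal NumPy sketch, not the FLaTEC implementation. It assumes a dense voxel feature grid of shape (C, X, Y, Z), uses mean pooling as a stand-in for the learned voxel-to-triplane projection, and uses a fixed average-pool low-pass filter in place of the learned frequency decomposition and modulation blocks. All function names are hypothetical.

```python
import numpy as np


def voxels_to_triplanes(vox):
    """Collapse a (C, X, Y, Z) feature grid into three axis-aligned planes.

    Mean pooling along one axis is an assumption made for illustration;
    the paper uses learned embeddings.
    """
    plane_xy = vox.mean(axis=3)  # (C, X, Y)
    plane_xz = vox.mean(axis=2)  # (C, X, Z)
    plane_yz = vox.mean(axis=1)  # (C, Y, Z)
    return plane_xy, plane_xz, plane_yz


def split_frequencies(plane, factor=2):
    """Split a (C, H, W) plane into a low-resolution low-frequency part
    and a full-resolution high-frequency residual."""
    c, h, w = plane.shape
    assert h % factor == 0 and w % factor == 0
    # Low-pass: average over non-overlapping factor x factor blocks.
    low = plane.reshape(c, h // factor, factor, w // factor, factor).mean(axis=(2, 4))
    # High-pass: what the upsampled low-frequency part fails to explain.
    up = low.repeat(factor, axis=1).repeat(factor, axis=2)
    high = plane - up
    return low, high


def merge_frequencies(low, high, factor=2):
    """Recover the full-spectrum plane (a fixed stand-in for learned modulation)."""
    up = low.repeat(factor, axis=1).repeat(factor, axis=2)
    return up + high


rng = np.random.default_rng(0)
vox = rng.standard_normal((4, 8, 8, 8))
planes = voxels_to_triplanes(vox)
low, high = split_frequencies(planes[0])
recon = merge_frequencies(low, high)
```

For an N x N x N grid, the three planes hold 3N^2 feature vectors instead of N^3, which is where the storage saving of a triplane proxy comes from. In this sketch the split is exactly invertible; in a real codec the high-frequency residual would be quantized and entropy-coded, so reconstruction would be lossy.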
