FLaTEC: Frequency-Disentangled Latent Triplanes for Efficient Compression of LiDAR Point Clouds

Xiaoge Zhang, Zijie Wu, Mingtao Feng, Zichen Geng, Mehwish Nasim, Saeed Anwar, Ajmal Mian

TL;DR

FLaTEC addresses LiDAR point cloud compression by disentangling frequency components and adopting a triplane latent proxy to reduce 3D complexity. It introduces a spectrum-aware pipeline with stage-wise frequency decomposition (FD) and frequency modulation (FM), complemented by a local spectrum attention mechanism to preserve details while aggressively reducing bitrate. The method achieves state-of-the-art rate-distortion results, with BD-rate gains up to $94.21\%$ on 40mILEN and $78.51\%$ on SemanticKITTI, while maintaining real-time performance and robust generalization. By leveraging a triplane representation and a locality-aware refinement module, FLaTEC offers a scalable, high-fidelity solution for large-scale LiDAR compression with flexible upsampling to arbitrary resolutions.
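For readers unfamiliar with the metric, BD-rate (Bjøntegaard delta rate) summarizes the average bitrate difference between two codecs at equal quality. The sketch below follows the commonly used cubic-fit formulation; it is not the evaluation script used for FLaTEC. The function name `bd_rate` and the sample rate/quality values are made up for illustration.

```python
import numpy as np

def bd_rate(anchor_rate, anchor_psnr, test_rate, test_psnr):
    """Average bitrate change (%) of `test` relative to `anchor` at equal quality.

    Negative values mean the test codec needs fewer bits.
    Each curve needs at least four (rate, quality) points for the cubic fit.
    """
    # Fit log10(rate) as a cubic polynomial of quality for each codec.
    poly_anchor = np.polyfit(anchor_psnr, np.log10(anchor_rate), 3)
    poly_test = np.polyfit(test_psnr, np.log10(test_rate), 3)

    # Compare only over the quality range covered by both curves.
    lo = max(min(anchor_psnr), min(test_psnr))
    hi = min(max(anchor_psnr), max(test_psnr))

    # Integrate each fitted curve over the shared range.
    int_anchor = np.polyint(poly_anchor)
    int_test = np.polyint(poly_test)
    area_anchor = np.polyval(int_anchor, hi) - np.polyval(int_anchor, lo)
    area_test = np.polyval(int_test, hi) - np.polyval(int_test, lo)

    # Mean log-rate gap, converted back to a percentage.
    avg_log_diff = (area_test - area_anchor) / (hi - lo)
    return (10 ** avg_log_diff - 1) * 100

# Hypothetical curves: the test codec uses half the bits at every quality level.
quality = [30.0, 33.0, 36.0, 39.0]
anchor = [1.0, 2.0, 4.0, 8.0]
test = [0.5, 1.0, 2.0, 4.0]
print(round(bd_rate(anchor, quality, test, quality), 3))  # → -50.0
```

Under this sign convention a bitrate saving appears as a negative number, so a reported "gain" of a given percentage corresponds to a BD-rate of the same magnitude with a minus sign.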

Abstract

Point cloud compression methods jointly optimize bitrate and reconstruction distortion. However, balancing compression ratio and reconstruction quality is difficult because low-frequency and high-frequency components contribute differently at the same resolution. To address this, we propose FLaTEC, a frequency-aware compression model that compresses a full scan at high compression ratios. Our approach introduces a frequency-aware mechanism that decouples low-frequency structures from high-frequency textures, while hybridizing latent triplanes as a compact proxy for the point cloud. Specifically, we convert voxelized embeddings into triplane representations to reduce sparsity, computational cost, and storage requirements. We then devise a frequency-disentangling technique that extracts compact low-frequency content while collecting high-frequency details across scales. The decoupled low-frequency and high-frequency components are stored in binary format. During decoding, full-spectrum signals are progressively recovered via a modulation block. Additionally, to compensate for the loss of 3D correlation, we introduce an efficient frequency-based attention mechanism that fosters local connectivity and outputs points at arbitrary resolution. Our method achieves state-of-the-art rate-distortion performance and outperforms the standard codecs by 78% and 94% in BD-rate on the SemanticKITTI and Ford datasets, respectively.
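The idea of separating a compact low-frequency part from high-frequency detail can be illustrated with a fixed, non-learned stand-in: average pooling acts as the low-pass filter, and the residual carries the detail. This is only a sketch of the general principle, not the learned decomposition and modulation blocks of FLaTEC; the function names and the pooling factor are assumptions.

```python
import numpy as np

def upsample(low: np.ndarray, factor: int = 2) -> np.ndarray:
    """Nearest-neighbour upsampling of a 2D plane by an integer factor."""
    return np.repeat(np.repeat(low, factor, axis=0), factor, axis=1)

def split_frequencies(plane: np.ndarray, factor: int = 2):
    """Split a 2D feature plane into a coarse low band and a full-resolution residual."""
    h, w = plane.shape
    if h % factor or w % factor:
        raise ValueError("plane size must be divisible by the pooling factor")
    # Low band: mean of each factor x factor block (a crude low-pass filter).
    low = plane.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
    # High band: whatever the upsampled low band does not explain.
    high = plane - upsample(low, factor)
    return low, high

def merge_frequencies(low: np.ndarray, high: np.ndarray, factor: int = 2) -> np.ndarray:
    """Recover the full-spectrum plane from the two bands."""
    return upsample(low, factor) + high

plane = np.arange(64, dtype=float).reshape(8, 8)
low, high = split_frequencies(plane)
print(low.shape, high.shape)                             # → (4, 4) (8, 8)
print(np.allclose(merge_frequencies(low, high), plane))  # → True
```

In this toy version the split is lossless; in a codec, savings would come from quantizing and entropy-coding the two bands with different bit budgets, which is the flexibility the abstract refers to.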

Paper Structure

This paper contains 13 sections, 4 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: Qualitative and quantitative comparison of point cloud compression methods. The proposed FLaTEC substantially reduces both the compressed file size and encoding/decoding time, while achieving comparable reconstruction quality to the baseline. The zoom-in region highlights vehicles on the road.
  • Figure 2: (a) Traditional deep learning methods encode all frequency components at the same resolution, resulting in a suboptimal trade-off for compression. (b) Our method disentangles high-frequency details from basic features, allowing for flexible bitrate allocation across different levels of detail.
  • Figure 3: Overview of our compression method. FD and FM refer to feature decomposition and frequency modulation. LSA represents local spectrum attention. HF is high frequency. Voxel features are initially projected onto three orthogonal views—top, front, and side—to reduce sparsity and storage costs. These projected triplane features are then processed through separate 2D encoders, which output global content and high-frequency priors. The encoded features are subsequently quantized and converted into a binary string. During decoding, 2D decoders reconstruct fine-grained triplane features guided by high-frequency priors. Finally, the voxel features are refined with spatial correlations before generating volumetric occupancy probability.
  • Figure 4: Module architectures. 1) The FD Block first performs frequency decomposition, then enhances global content with low-frequency structures and organizes high-frequency details into a hierarchical representation. 2) The FM Block aligns the high-frequency priors with the current-level base features, then reconstructs original details guided by the aligned priors. 3) The LSA-Enhancer refines local volumetric textures by modulating regional frequency components using a learnable attention mechanism.
  • Figure 5: Qualitative results on the 40mILEN dataset. E/D denotes encoding and decoding time (in seconds). Zoom in for details.
  • ...and 4 more figures
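
The projection step described in the Figure 3 caption can be sketched as follows. This is a minimal illustration that assumes mean pooling along each axis and a broadcast sum to lift the planes back to 3D; the actual projection and refinement in FLaTEC may be learned and may differ. The axis-to-view mapping (z as the vertical axis) and both function names are assumptions.

```python
import numpy as np

def voxel_to_triplane(voxels: np.ndarray):
    """Collapse a (C, X, Y, Z) voxel feature grid into three orthogonal 2D planes."""
    top = voxels.mean(axis=3)    # (C, X, Y): collapse the vertical axis z
    front = voxels.mean(axis=2)  # (C, X, Z): collapse y
    side = voxels.mean(axis=1)   # (C, Y, Z): collapse x
    return top, front, side

def triplane_to_voxel(top: np.ndarray, front: np.ndarray, side: np.ndarray) -> np.ndarray:
    """Lift the three planes back to a (C, X, Y, Z) grid by broadcasting and summing."""
    return top[:, :, :, None] + front[:, :, None, :] + side[:, None, :, :]

# Toy grid: 2 channels, 4 x 5 x 6 voxels, one occupied cell.
voxels = np.zeros((2, 4, 5, 6))
voxels[:, 1, 2, 3] = 1.0

top, front, side = voxel_to_triplane(voxels)
lifted = triplane_to_voxel(top, front, side)

print(top.shape, front.shape, side.shape)          # → (2, 4, 5) (2, 4, 6) (2, 5, 6)
print(top.size + front.size + side.size, voxels.size)  # → 148 240
print(np.unravel_index(lifted[0].argmax(), lifted[0].shape))
```

The three planes store C·(XY + XZ + YZ) values instead of C·XYZ, which is where the storage reduction comes from. The lifted grid peaks at the occupied cell but is only an approximation of the original volume, consistent with the caption's note that voxel features are refined with spatial correlations after decoding.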