LLM-PCGC: Large Language Model-based Point Cloud Geometry Compression

Yuqi Ye, Wei Gao

TL;DR

The paper tackles lossless point cloud geometry compression by replacing the traditional context model with a large language model (LLM). It proposes LLM-PCGC, which fine-tunes a pre-trained LLM with LoRA and uses clustering, a K-tree structure, and token mapping invariance so that the LLM can act as a point cloud compressor without any text data. The approach reduces bitrate by about 40% relative to MPEG G-PCC and by about 2% relative to the state-of-the-art SparsePCGC, demonstrating the viability of LLMs for 3D data compression. The work opens a new direction for cross-modal compression, though it notes challenges such as memory consumption and inference time, which point to avenues for optimization and broader applicability.

Abstract

The key to effective point cloud compression is to obtain a robust context model consistent with complex 3D data structures. Recently, the advancement of large language models (LLMs) has highlighted their capabilities not only as powerful generators for in-context learning and generation but also as effective compressors. These dual attributes of LLMs make them particularly well-suited to meet the demands of data compression. Therefore, this paper explores the potential of using LLM for compression tasks, focusing on lossless point cloud geometry compression (PCGC) experiments. However, applying LLM directly to PCGC tasks presents some significant challenges, i.e., LLM does not understand the structure of the point cloud well, and it is a difficult task to fill the gap between text and point cloud through text description, especially for large complicated and small shapeless point clouds. To address these problems, we introduce a novel architecture, namely the Large Language Model-based Point Cloud Geometry Compression (LLM-PCGC) method, using LLM to compress point cloud geometry information without any text description or aligning operation. By utilizing different adaptation techniques for cross-modality representation alignment and semantic consistency, including clustering, K-tree, token mapping invariance, and Low Rank Adaptation (LoRA), the proposed method can translate LLM to a compressor/generator for point cloud. To the best of our knowledge, this is the first structure to employ LLM as a compressor for point cloud data. Experiments demonstrate that the LLM-PCGC outperforms the other existing methods significantly, by achieving -40.213% bit rate reduction compared to the reference software of MPEG Geometry-based Point Cloud Compression (G-PCC) standard, and by achieving -2.267% bit rate reduction compared to the state-of-the-art learning-based method.

Paper Structure (10 sections, 4 figures, 2 tables)

This paper contains 10 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Comparison of training schemes between the proposed LLM-PCGC method and other learning-based methods for point cloud geometry compression. Different from existing methods adopting the end-to-end training manner, our method implements a point cloud compressor by fine-tuning a pre-trained text generator LLM to achieve efficient cross-modality representation alignment.
  • Figure 2: LLM-PCGC encoding pipeline. Given a 3D point cloud, the encoding pipeline starts with clustering, followed by normalization and K-Tree structuring. It then employs token mapping invariance for token conversion. Subsequently, a trained LoRA model with a frozen LLM is used to compute the probability distribution for the next token. These distributions are then fed into an arithmetic encoder, which generates the encoded bitstream.
  • Figure 3: LLM-PCGC decoding pipeline. In decoding, binary bits are split and converted to decimals, and the main bitstream is processed in parallel. An arithmetic decoder decodes the bitstream using the probabilities produced by LoRA and the LLM, and the decoded tokens are then mapped into point cloud patches. These patches are aligned and merged by offsets and indices. In the final decoding phase, big patches are structured into a K-Tree for clustered point cloud reconstruction. In the final post-reconstruction step, offsets are applied to rebuild the original point cloud.
  • Figure 4: Comparison of bpp among autoregressive methods and traditional method G-PCC.
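
The captions for Figures 2 and 3 describe next-token probabilities driving an arithmetic encoder and decoder. The following is a minimal sketch of that coupling, not the authors' implementation. It assumes a hypothetical `toy_model` (Laplace-smoothed token counts) in place of the LoRA-adapted LLM, a small integer vocabulary in place of the paper's token mapping, and exact rational arithmetic instead of a production range coder. The helper `bits_per_point` is also an assumption, included only to show how a bpp figure such as the one in Figure 4 is typically computed.

```python
from fractions import Fraction
from math import ceil


def toy_model(prefix, vocab_size):
    """Hypothetical stand-in for the LLM: next-token probabilities
    from Laplace-smoothed counts of the tokens seen so far."""
    counts = [1] * vocab_size
    for t in prefix:
        counts[t] += 1
    total = sum(counts)
    return [Fraction(c, total) for c in counts]


def encode(tokens, vocab_size):
    """Arithmetic-encode tokens; returns (code, nbits), i.e. an integer
    whose nbits-bit binary fraction lies inside the final interval."""
    low, width = Fraction(0), Fraction(1)
    for i, tok in enumerate(tokens):
        probs = toy_model(tokens[:i], vocab_size)
        cum = sum(probs[:tok], Fraction(0))  # mass of symbols below tok
        low += width * cum
        width *= probs[tok]
    # Smallest nbits with 2**-nbits <= width guarantees that
    # ceil(low * 2**nbits) / 2**nbits falls in [low, low + width).
    nbits = 0
    while Fraction(1, 2 ** nbits) > width:
        nbits += 1
    code = ceil(low * 2 ** nbits)
    return code, nbits


def decode(code, nbits, n_tokens, vocab_size):
    """Invert encode() by replaying the same model on the decoded prefix."""
    value = Fraction(code, 2 ** nbits)
    low, width = Fraction(0), Fraction(1)
    out = []
    for _ in range(n_tokens):
        probs = toy_model(out, vocab_size)
        cum = Fraction(0)
        for tok, p in enumerate(probs):
            # Stop at the symbol whose sub-interval contains value.
            if value < low + width * (cum + p):
                break
            cum += p
        out.append(tok)
        low += width * cum
        width *= p
    return out


def bits_per_point(total_bits, num_points):
    """bpp: total coded bits divided by the number of points."""
    return total_bits / num_points
```

Calling `encode(tokens, vocab_size)` and then `decode(code, nbits, len(tokens), vocab_size)` should return the original token list, which is the lossless property the pipeline relies on. The better the model predicts each next token, the wider the final interval and the fewer bits are needed, which is why a stronger context model (here, the LLM) lowers the bitrate.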