Semantic Compression of 3D Objects for Open and Collaborative Virtual Worlds

Jordan Dotzel, Tony Montes, Mohamed S. Abdelfattah, Zhiru Zhang

TL;DR

This paper reframes 3D object compression by prioritizing semantic content over exact geometry, using natural language descriptions to achieve extreme data reduction. It presents a pipeline that leverages public generative models, including diffusion-based generators and image-to-3D synthesis, with optional edge-based structural cues to reconstruct meshes and textures from semantic encodings. The approach reports compression ratios of up to five or six figures on some Objaverse objects, and it outperforms traditional methods in the high-compression regime, though automated metrics do not always align with human judgments. The work discusses the practical implications for open, collaborative virtual worlds and emphasizes the tradeoffs between semantic and structural information, calling for future research to improve reliability and integrate both paradigms for metaverse-scale deployments.

Abstract

Traditional methods for 3D object compression operate only on structural information within the object vertices, polygons, and textures. These methods are effective at compression rates up to 10x for standard object sizes but quickly deteriorate at higher compression rates with texture artifacts, low polygon counts, and mesh gaps. In contrast, semantic compression ignores structural information and operates directly on the core concepts to push to extreme levels of compression. In addition, it uses natural language as its storage format, which makes it natively human-readable and a natural fit for emerging applications built around large-scale, collaborative projects within augmented and virtual reality. It deprioritizes structural information like location, size, and orientation and predicts the missing information with state-of-the-art deep generative models. In this work, we construct a pipeline for 3D semantic compression from public generative models and explore the quality-compression frontier for 3D object compression. We apply this pipeline to achieve rates as high as $10^5\times$ for 3D objects taken from the Objaverse dataset and show that semantic compression can outperform traditional methods in the important quality-preserving region around 100x compression.
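
To make the scale of these ratios concrete, the following minimal sketch computes a compression ratio when the compressed form is just a natural language description. The function name, the object size, and the description text are hypothetical values chosen for illustration; they are not taken from the paper, and the sketch ignores any structural cues or model weights that a real pipeline might also store.

```python
def compression_ratio(original_bytes: int, description: str) -> float:
    """Ratio of the original object size to the UTF-8 size of its description."""
    compressed_bytes = len(description.encode("utf-8"))
    if compressed_bytes == 0:
        raise ValueError("description must be non-empty")
    return original_bytes / compressed_bytes


# Hypothetical example: a 5 MB mesh-plus-texture object (assumed size)
# stored only as a short text description (assumed wording).
mesh_and_texture_bytes = 5_000_000
description = "a weathered wooden rocking chair with a red cushion"

ratio = compression_ratio(mesh_and_texture_bytes, description)
print(f"approximate compression ratio: {ratio:,.0f}x")
```

With a description of a few dozen bytes and an object of a few megabytes, the ratio lands near five or six figures, which is the regime the abstract describes.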

Paper Structure

This paper contains 22 sections, 14 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: Semantic Compression: Extreme compression requires primarily storing semantic as opposed to structural information, such as point clouds, polygons, or frequencies. Semantic compression enables extreme ratios by preserving human-oriented semantics as opposed to structural information or its derivatives. In particular, the structured semantic compression outperforms traditional methods in the region around $10$-$100\times$ compression, beyond which only semantic-based methods can be applied.
  • Figure 2: Semantic Scaling: As the size of the input grows, with more objects and more complex objects (higher resolution, denser point clouds, or more polygons), semantic compression becomes more efficient. Traditional compression scales linearly with the input, while semantic compression scales sub-linearly with the semantic content. The initial memory for the compressed world model is amortized over many objects.
  • Figure 3: Semantic Factorization: Across most media formats, the structural and semantic dimensions can be factorized and compressed separately. This stores 3D point clouds or 2D projections in addition to a natural language description.
  • Figure 4: 3D Objects: 3D objects are composed of files for the texture and mesh, where the mesh breaks down into vertices and polygons between them. The mesh typically dominates the size of the object.
  • Figure 5: Semantic Compression: Further semantic compression loses more and more detail with the benefit of significant memory savings. Top row includes the original with additional structural information, and the bottom row is pure semantic compression. Compression ratios are listed under each object.
  • ...and 4 more figures
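
The amortization argument in the Figure 2 caption can be illustrated with a simplified storage model. All symbols here are assumptions introduced for illustration, not notation from the paper: $n$ objects with average original size $\bar{s}$, a traditional compression ratio $r$, an average description size $\bar{d}$, and a one-time cost $M$ for the world model. Both sides are linear in $n$, so this captures only the amortization of $M$, not the sub-linear scaling that the caption attributes to semantic content.

$$
S_{\text{trad}}(n) = \frac{n\,\bar{s}}{r}, \qquad S_{\text{sem}}(n) = M + n\,\bar{d}
$$

$$
S_{\text{sem}}(n) < S_{\text{trad}}(n) \iff n > \frac{M}{\bar{s}/r - \bar{d}} \qquad \text{(assuming } \bar{s}/r > \bar{d}\text{)}
$$

Under these assumptions, the fixed cost $M$ is paid once, so semantic storage becomes the smaller of the two once the collection exceeds the break-even count on the right-hand side.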