Semantic Compression of 3D Objects for Open and Collaborative Virtual Worlds
Jordan Dotzel, Tony Montes, Mohamed S. Abdelfattah, Zhiru Zhang
TL;DR
This paper reframes 3D object compression by prioritizing semantic content over exact geometry, using natural language descriptions to achieve extreme data reduction. It presents a pipeline that leverages public generative models, including diffusion-based generators and image-to-3D synthesis, with optional edge-based structural cues, to reconstruct meshes and textures from semantic encodings. The approach reports five- to six-figure compression ratios on some Objaverse objects and outperforms traditional methods in the high-compression regime, though automated metrics do not always align with human judgments. The work discusses the practical implications for open, collaborative virtual worlds and emphasizes the tradeoffs between semantic and structural information, calling for future research to improve reliability and integrate both paradigms for metaverse-scale deployments.
Abstract
Traditional methods for 3D object compression operate only on structural information in the object's vertices, polygons, and textures. These methods are effective at compression rates up to 10x for standard object sizes but quickly deteriorate at higher compression rates, producing texture artifacts, low polygon counts, and mesh gaps. In contrast, semantic compression ignores structural information and operates directly on the core concepts to push to extreme levels of compression. In addition, it uses natural language as its storage format, which makes it natively human-readable and a natural fit for emerging applications built around large-scale, collaborative projects within augmented and virtual reality. It deprioritizes structural information like location, size, and orientation and predicts the missing information with state-of-the-art deep generative models. In this work, we construct a pipeline for 3D semantic compression from public generative models and explore the quality-compression frontier for 3D object compression. We apply this pipeline to achieve rates as high as 10^5x for 3D objects taken from the Objaverse dataset and show that semantic compression can outperform traditional methods in the important quality-preserving region around 100x compression.
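To make the compression rates above concrete, the following is a minimal sketch of how such a ratio could be computed when the stored format is a natural-language description. The function name `compression_ratio` and the sizes used are illustrative assumptions, not the paper's measurement code: the ratio is taken here as the original asset size in bytes divided by the UTF-8 size of the description.

```python
def compression_ratio(original_bytes: int, description: str) -> float:
    """Original asset size divided by the UTF-8 size of its text description.

    Hypothetical helper for illustration; the paper's exact accounting
    (e.g. whether optional structural cues are counted) is not assumed here.
    """
    encoded = description.encode("utf-8")
    if not encoded:
        raise ValueError("description must be non-empty")
    return original_bytes / len(encoded)


if __name__ == "__main__":
    # Assumed example values: a 1.7 MB mesh-plus-texture asset and a short caption.
    caption = "a red vintage car"
    ratio = compression_ratio(1_700_000, caption)
    print(f"{len(caption.encode('utf-8'))} bytes stored, ratio = {ratio:.0f}x")
```

Under this accounting, a description of a few dozen bytes standing in for an asset of several megabytes lands in the five-figure range or higher, which is the regime the abstract refers to.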
