LoG3D: Ultra-High-Resolution 3D Shape Modeling via Local-to-Global Partitioning
Xinran Yang, Shuichang Lai, Jiangjing Lyu, Hongjie Li, Bowen Pan, Yuanqi Li, Jie Guo, Zhengkang Zhou, Yanwen Guo
TL;DR
This work tackles the challenge of generating high-fidelity 3D content with arbitrary topology by adopting Unsigned Distance Fields (UDFs) and introducing a Local-to-Global Variational Autoencoder (LoG-VAE). The model partitions high-resolution UDFs into uniform subvolumes called UBlocks and combines local 3D convolutions with global sparse transformers. A Pad-Average reassembly step smooths block boundaries, enabling scalable reconstruction at resolutions up to $2048^3$, with meshes extracted via Marching Cubes. Core contributions include the UBlock partitioning, which decouples resolution from model size; the hybrid LoG-VAE architecture, which preserves local detail and global coherence; and state-of-the-art results in reconstruction and conditional generation, including on non-watertight and complex geometries. The approach advances practical ultra-high-resolution 3D content creation and robust 3D generative modeling for applications in design, visualization, and robotics, and it outlines paths toward handling textures and improving computational efficiency in future work.
Abstract
Generating high-fidelity 3D content remains a fundamental challenge due to the complexity of representing arbitrary topologies, such as open surfaces and intricate internal structures, while preserving geometric details. Prevailing methods based on signed distance fields (SDFs) are hampered by costly watertight preprocessing and struggle with non-manifold geometries, while point-cloud representations often suffer from sampling artifacts and surface discontinuities. To overcome these limitations, we propose a novel 3D variational autoencoder (VAE) framework built upon unsigned distance fields (UDFs), a more robust and computationally efficient representation that naturally handles complex and incomplete shapes. Our core innovation is a local-to-global (LoG) architecture that processes the UDF by partitioning it into uniform subvolumes, termed UBlocks. This architecture couples 3D convolutions for capturing local detail with sparse transformers for enforcing global coherence. A Pad-Average strategy further ensures smooth transitions at subvolume boundaries during reconstruction. This modular design enables seamless scaling to ultra-high resolutions of up to $2048^3$, a regime previously unattainable for 3D VAEs. Experiments demonstrate state-of-the-art performance in both reconstruction accuracy and generative quality, yielding superior surface smoothness and geometric flexibility.
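The partition-and-reassemble idea can be illustrated with a small NumPy sketch. This is not the paper's implementation: the abstract does not specify how UBlocks are padded or how Pad-Average weights overlapping voxels, so the sketch assumes each block carries a fixed halo of `pad` voxels and that overlapping predictions are combined by a plain mean. The function names `partition` and `pad_average`, the edge-replicated border, and the toy sphere UDF are all hypothetical choices made for illustration.

```python
import numpy as np


def partition(volume, block, pad):
    """Split a cubic volume into uniform blocks, each with a halo of `pad` voxels.

    Assumption: the halo at the volume border is filled by edge replication.
    Returns a dict mapping the block's corner index (i, j, k) to its padded data.
    """
    n = volume.shape[0]
    assert volume.shape == (n, n, n) and n % block == 0
    padded = np.pad(volume, pad, mode="edge")
    size = block + 2 * pad
    blocks = {}
    for i in range(0, n, block):
        for j in range(0, n, block):
            for k in range(0, n, block):
                blocks[(i, j, k)] = padded[i:i + size, j:j + size, k:k + size].copy()
    return blocks


def pad_average(blocks, n, block, pad):
    """Reassemble padded blocks, averaging every voxel covered by more than one block.

    Assumption: a uniform (unweighted) mean over all overlapping blocks.
    """
    size = block + 2 * pad
    acc = np.zeros((n + 2 * pad,) * 3, dtype=np.float64)  # sum of contributions
    cnt = np.zeros_like(acc)                              # number of contributions
    for (i, j, k), data in blocks.items():
        acc[i:i + size, j:j + size, k:k + size] += data
        cnt[i:i + size, j:j + size, k:k + size] += 1.0
    out = acc / cnt
    # Crop the outer halo to recover the original n^3 extent.
    return out[pad:pad + n, pad:pad + n, pad:pad + n]


# Toy UDF: unsigned distance to a sphere of radius 0.5 on a 16^3 grid.
grid = np.linspace(-1.0, 1.0, 16)
x, y, z = np.meshgrid(grid, grid, grid, indexing="ij")
udf = np.abs(np.sqrt(x**2 + y**2 + z**2) - 0.5)

blocks = partition(udf, block=8, pad=2)
recon = pad_average(blocks, n=16, block=8, pad=2)
```

In this sketch the blocks are passed through unchanged, so the reassembled volume matches the input. In a learned pipeline each block would instead be decoded independently, and the averaging over the halo is what would blend slightly different predictions from neighbouring blocks at their shared boundary. Because every block has the same shape regardless of the full grid size, the per-block network cost stays fixed as the resolution grows, which is the sense in which partitioning decouples resolution from model size.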
