LoG3D: Ultra-High-Resolution 3D Shape Modeling via Local-to-Global Partitioning

Xinran Yang, Shuichang Lai, Jiangjing Lyu, Hongjie Li, Bowen Pan, Yuanqi Li, Jie Guo, Zhengkang Zhou, Yanwen Guo

TL;DR

This work tackles the challenge of generating high-fidelity 3D content with arbitrary topology by adopting Unsigned Distance Fields (UDFs) and introducing a Local-to-Global Variational Autoencoder (LoG-VAE). The model partitions high-resolution UDFs into uniform UBlocks and combines local 3D convolutions with global sparse transformers, aided by a Pad-Average reassembly step, enabling scalable reconstruction up to $2048^3$ and mesh extraction via Marching Cubes. Core contributions include UBlock partitioning, which decouples resolution from model size; the hybrid LoG-VAE architecture, which preserves local detail and global coherence; and state-of-the-art results in reconstruction and conditional generation, including non-watertight and complex geometries. The approach targets practical ultra-high-resolution 3D content creation and robust 3D generative modeling for applications in design, visualization, and robotics, and outlines paths to address textures and computational efficiency in future work.

Abstract

Generating high-fidelity 3D content remains a fundamental challenge due to the complexity of representing arbitrary topologies, such as open surfaces and intricate internal structures, while preserving geometric details. Prevailing methods based on signed distance fields (SDFs) are hampered by costly watertight preprocessing and struggle with non-manifold geometries, while point-cloud representations often suffer from sampling artifacts and surface discontinuities. To overcome these limitations, we propose a novel 3D variational autoencoder (VAE) framework built upon unsigned distance fields (UDFs), a more robust and computationally efficient representation that naturally handles complex and incomplete shapes. Our core innovation is a local-to-global (LoG) architecture that processes the UDF by partitioning it into uniform subvolumes, termed UBlocks. This architecture couples 3D convolutions for capturing local detail with sparse transformers for enforcing global coherence. A Pad-Average strategy further ensures smooth transitions at subvolume boundaries during reconstruction. This modular design enables seamless scaling to ultra-high resolutions up to $2048^3$, a regime previously unattainable for 3D VAEs. Experiments demonstrate state-of-the-art performance in both reconstruction accuracy and generative quality, yielding superior surface smoothness and geometric flexibility.
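To make the partitioning idea concrete, the following is a minimal NumPy sketch of how a dense UDF grid might be split into uniform subvolumes, keeping only those near the surface. It is an illustration under stated assumptions, not the authors' implementation: the function names `sphere_udf` and `partition_udf`, the block size of 8, and the "minimum distance below a threshold" criterion for discarding empty blocks are all hypothetical choices that the abstract does not specify.

```python
import numpy as np


def sphere_udf(n=32, radius=0.5):
    """Toy input: unsigned distance to a sphere, sampled on an n^3 grid over [-1, 1]^3."""
    axis = np.linspace(-1.0, 1.0, n)
    x, y, z = np.meshgrid(axis, axis, axis, indexing="ij")
    return np.abs(np.sqrt(x**2 + y**2 + z**2) - radius)


def partition_udf(udf, block=8, threshold=None):
    """Split a cubic UDF grid into non-overlapping block^3 subvolumes.

    Returns (coords, blocks): the integer block indices and the matching voxel
    data. If `threshold` is given, only blocks whose minimum distance is below
    it are kept (an assumed sparsity rule, not taken from the paper).
    """
    n = udf.shape[0]
    assert udf.shape == (n, n, n) and n % block == 0
    g = n // block  # number of blocks along each axis

    # (g, b, g, b, g, b) -> (g, g, g, b, b, b): one b^3 tile per block index
    tiles = udf.reshape(g, block, g, block, g, block).transpose(0, 2, 4, 1, 3, 5)

    if threshold is None:
        keep = np.ones((g, g, g), dtype=bool)
    else:
        keep = tiles.min(axis=(3, 4, 5)) < threshold

    coords = np.argwhere(keep)  # (K, 3) block indices, C order
    blocks = tiles[keep]        # (K, b, b, b) voxel data, same order
    return coords, blocks
```

Under this sketch, the per-block shape stays fixed regardless of the grid size, so a network operating on individual blocks would see the same input shape at $256^3$ as at $2048^3$; only the number of retained blocks changes.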

Paper Structure

This paper contains 17 sections, 7 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Pipeline of the LoG-VAE Framework. An input UDF is partitioned into sparse UBlocks, which are encoded into latent codes by a local-global encoder (3D Conv and sparse transformers). A symmetric decoder reconstructs the UBlocks, which are then reassembled into a full UDF field for final mesh extraction. The entire network is trained under the supervision of the UDF loss.
  • Figure 2: Illustration for our Pad-Average strategy.
  • Figure 3: Qualitative comparison of VAE reconstruction. Our approach demonstrates superior performance in reconstructing complex shapes, open surfaces, and even interior structures. Best viewed with zoom-in.
  • Figure 4: Qualitative comparison of single-image-to-3D generation. When comparing reconstructions at a $1024^3$ resolution, the generator trained with our LoG-VAE yields more detailed reconstructions than Direct3D-S2 [wu2025direct3d] and Sparc3D [li2025sparc3d].
  • Figure 5: Qualitative comparison of ablated models.
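
The Pad-Average strategy named in Figure 2 is not specified in detail here, so the sketch below shows only one plausible reading: each block is extracted with a few voxels of padding that overlap its neighbours, and at reassembly time every voxel takes the mean of all block predictions that cover it. The function names `extract_padded` and `pad_average`, the edge-replication padding at the grid border, and the NaN marking of uncovered voxels are assumptions made for illustration.

```python
import numpy as np


def extract_padded(udf, block, pad):
    """Cut a cubic grid into blocks of side block + 2*pad that overlap neighbours."""
    n = udf.shape[0]
    g = n // block
    padded = np.pad(udf, pad, mode="edge")  # replicate border values (assumed)
    coords, blocks = [], []
    for i in range(g):
        for j in range(g):
            for k in range(g):
                s = tuple(slice(c * block, c * block + block + 2 * pad)
                          for c in (i, j, k))
                coords.append((i, j, k))
                blocks.append(padded[s])
    return np.array(coords), np.stack(blocks)


def pad_average(coords, blocks, n, block, pad):
    """Reassemble padded blocks into an n^3 grid, averaging where they overlap."""
    size = n + 2 * pad
    total = np.zeros((size, size, size))
    count = np.zeros((size, size, size))
    for (i, j, k), b in zip(coords, blocks):
        s = tuple(slice(c * block, c * block + block + 2 * pad)
                  for c in (i, j, k))
        total[s] += b   # accumulate every prediction covering a voxel
        count[s] += 1   # and remember how many there were
    out = np.full((size, size, size), np.nan)  # NaN marks voxels no block covers
    np.divide(total, count, out=out, where=count > 0)
    return out[pad:pad + n, pad:pad + n, pad:pad + n]  # crop the outer padding
```

If the blocks are exact crops of the input, averaging reproduces the original field; when each block instead carries independent decoder error, the mean over overlapping predictions is what would smooth the seams between neighbouring blocks.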