Sparc3D: Sparse Representation and Construction for High-Resolution 3D Shapes Modeling
Zhihao Li, Yufei Wang, Heliang Zheng, Yihao Luo, Bihan Wen
TL;DR
The paper tackles the bottleneck of high-fidelity 3D generation by separating topology-preserving remeshing from modality-consistent latent encoding. It introduces Sparcubes, a sparse deformable marching cubes representation that converts raw, non-watertight meshes into watertight surfaces at $1024^3$ resolution with substantial speedups. It also presents Sparconv-VAE, a sparse-convolutional variational autoencoder with a self-pruning decoder that eliminates the input-output modality gap and enables efficient, near-lossless reconstruction. When integrated with latent diffusion models (e.g., TRELLIS), Sparc3D achieves state-of-the-art reconstruction fidelity and resolution for open surfaces, disconnected components, and intricate geometries, while reducing training costs. The work provides a scalable, topology-preserving foundation for high-fidelity 3D asset generation applicable to AR/VR, robotics, and high-detail 3D printing.
Abstract
High-fidelity 3D object synthesis remains significantly more challenging than 2D image generation due to the unstructured nature of mesh data and the cubic complexity of dense volumetric grids. Existing two-stage pipelines, which compress meshes with a VAE (using either 2D or 3D supervision) and then sample with latent diffusion, often suffer from severe detail loss caused by inefficient representations and modality mismatches introduced in the VAE. We introduce Sparc3D, a unified framework that combines a sparse deformable marching cubes representation, Sparcubes, with a novel encoder, Sparconv-VAE. Sparcubes converts raw meshes into high-resolution ($1024^3$) surfaces with arbitrary topology by scattering signed distance and deformation fields onto a sparse cube, allowing differentiable optimization. Sparconv-VAE is the first modality-consistent variational autoencoder built entirely upon sparse convolutional networks, enabling efficient and near-lossless 3D reconstruction suitable for high-resolution generative modeling through latent diffusion. Sparc3D achieves state-of-the-art reconstruction fidelity on challenging inputs, including open surfaces, disconnected components, and intricate geometry. It preserves fine-grained shape details, reduces training and inference cost, and integrates naturally with latent diffusion models for scalable, high-resolution 3D generation.
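To make the "cubic complexity of dense volumetric grids" concrete, the following minimal sketch compares how a dense grid and a near-surface sparse set of voxels grow with resolution. It is not the Sparcubes construction: the analytic sphere signed distance, the `radius_frac` and `band` parameters, and the function names are all illustrative assumptions chosen only to show the scaling behaviour that motivates a sparse representation.

```python
import math


def dense_voxel_count(res: int) -> int:
    """Number of cells in a dense res x res x res grid (grows as res**3)."""
    return res ** 3


def near_surface_voxel_count(res: int, radius_frac: float = 0.4,
                             band: float = 1.0) -> int:
    """Count voxels whose centre lies within `band` voxels of a sphere surface.

    The sphere is a stand-in shape (an assumption for illustration); its
    signed distance is |p - c| - r, so |sdf| <= band selects a thin shell.
    """
    center = (res - 1) / 2.0
    radius = radius_frac * res
    count = 0
    for x in range(res):
        for y in range(res):
            for z in range(res):
                dist = math.sqrt((x - center) ** 2
                                 + (y - center) ** 2
                                 + (z - center) ** 2)
                if abs(dist - radius) <= band:
                    count += 1
    return count


if __name__ == "__main__":
    # Small resolutions keep the pure-Python loop fast; the trend is the point.
    for res in (32, 64):
        dense = dense_voxel_count(res)
        sparse = near_surface_voxel_count(res)
        print(f"res={res}: dense={dense}, near-surface={sparse}, "
              f"occupied fraction={sparse / dense:.4f}")
```

Doubling the resolution multiplies the dense count by exactly 8, while the near-surface count grows by roughly 4, since a thin shell scales with surface area rather than volume. Under these assumptions the occupied fraction shrinks as resolution rises, which is the gap a sparse representation is meant to exploit at $1024^3$.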
