Sparc3D: Sparse Representation and Construction for High-Resolution 3D Shapes Modeling

Zhihao Li, Yufei Wang, Heliang Zheng, Yihao Luo, Bihan Wen

TL;DR

The paper tackles the bottleneck of high-fidelity 3D generation by separating topology-preserving remeshing from modality-consistent latent encoding. It introduces Sparcubes, a sparse deformable marching cubes representation that converts raw, non-watertight meshes into watertight surfaces at 1024^3 resolution with substantial speedups. It also presents Sparconv-VAE, a sparse-convolutional variational autoencoder with a self-pruning decoder that eliminates the input-output modality gap and enables efficient, near-lossless reconstruction. When integrated with latent diffusion models (e.g., TRELLIS), Sparc3D achieves state-of-the-art reconstruction fidelity and resolution for open surfaces, disconnected components, and intricate geometries, while reducing training costs. The work provides a scalable, topology-preserving foundation for high-fidelity 3D asset generation applicable to AR/VR, robotics, and high-detail 3D printing.

Abstract

High-fidelity 3D object synthesis remains significantly more challenging than 2D image generation due to the unstructured nature of mesh data and the cubic complexity of dense volumetric grids. Existing two-stage pipelines, which compress meshes with a VAE (using either 2D or 3D supervision) and then sample with latent diffusion, often suffer from severe detail loss caused by inefficient representations and modality mismatches introduced by the VAE. We introduce Sparc3D, a unified framework that combines a sparse deformable marching cubes representation, Sparcubes, with a novel encoder, Sparconv-VAE. Sparcubes converts raw meshes into high-resolution ($1024^3$) surfaces with arbitrary topology by scattering signed distance and deformation fields onto a sparse cube, allowing differentiable optimization. Sparconv-VAE is the first modality-consistent variational autoencoder built entirely upon sparse convolutional networks, enabling efficient and near-lossless 3D reconstruction suitable for high-resolution generative modeling through latent diffusion. Sparc3D achieves state-of-the-art reconstruction fidelity on challenging inputs, including open surfaces, disconnected components, and intricate geometry. It preserves fine-grained shape details, reduces training and inference cost, and integrates naturally with latent diffusion models for scalable, high-resolution 3D generation.
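To make the "cubic complexity of dense volumetric grids" concrete, the following is a minimal sketch (not the paper's code) that compares the number of cells in a dense grid with the number of voxels lying in a thin band around a surface. The sphere test shape, its radius, the one-voxel band width, and the function names are all assumptions chosen for illustration. For scale, a dense $1024^3$ grid of 32-bit floats already occupies 4 GiB for a single scalar field.

```python
import numpy as np


def dense_voxel_count(n: int) -> int:
    """Number of cells in a dense n x n x n grid (grows as n^3)."""
    return n ** 3


def surface_band_count(n: int, radius: float = 0.4, band_voxels: float = 1.0) -> int:
    """Count voxel centers within `band_voxels` voxel widths of a sphere surface.

    The sphere (hypothetical test shape) is centered in a unit cube that is
    discretized into n^3 voxels. Only these near-surface voxels would need to
    be stored by a sparse representation.
    """
    h = 1.0 / n                                   # voxel size
    centers = (np.arange(n) + 0.5) * h - 0.5      # voxel centers in [-0.5, 0.5]
    # sparse=True returns broadcastable 1D-like axes instead of three full grids
    x, y, z = np.meshgrid(centers, centers, centers, indexing="ij", sparse=True)
    signed_dist = np.sqrt(x**2 + y**2 + z**2) - radius
    return int(np.count_nonzero(np.abs(signed_dist) <= band_voxels * h))


for n in (32, 64, 128):
    dense = dense_voxel_count(n)
    band = surface_band_count(n)
    print(f"n={n:4d}  dense={dense:9d}  near-surface={band:7d}  ratio={band / dense:.4f}")
```

Doubling the resolution multiplies the dense count by 8 but the near-surface count by roughly 4, which is the gap that sparse representations of this kind exploit.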

Paper Structure

This paper contains 17 sections, 8 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Sparc3D Reconstruction Results. Leveraging our sparse deformable marching cubes (Sparcubes) representation and sparse convolutional VAE (Sparconv-VAE), our method achieves state-of-the-art reconstruction quality on challenging 3D inputs. It robustly handles open surfaces (automatically closed into watertight meshes), recovers hidden interior structures, and faithfully reconstructs highly complex geometries (see zoom-in views, top to bottom). All outputs are fully watertight and 3D-printable, demonstrating the potential of our framework for high-resolution 3D mesh generation. Best viewed with zoom-in.
  • Figure 2: Problems of the previous SDF extraction pipeline. The widely used SDF extraction workflow [dora, hunyuan2, craftsman] suffers from two critical failures: resolution degradation (shown as error) and missing geometry (circled on the right). Converting a UDF to an SDF by subtracting two voxel sizes effectively halves the spatial resolution. Moreover, the SDF extraction yields a double-layer mesh, from which only the largest connected component is retained, inadvertently discarding smaller but important components. Together, these two deficiencies substantially limit the upper-bound performance of downstream VAEs and generation models. Best viewed with zoom-in.
  • Figure 2: Quantitative comparison of VAE reconstruction across the ABO [collins2022abo], Objaverse [objaverse], and In-the-Wild datasets. Chamfer Distance (CD, $\times 10^4$), Absolute Normal Consistency (ANC, $\times 10^2$), and F1 score (F1, $\times 10^2$) are reported.
  • Figure 3: Illustration of our Sparcubes reconstruction pipeline for converting a raw mesh into a watertight mesh.
  • Figure 4: Qualitative comparison of watertight remeshing pipelines. We evaluate our Sparcubes remeshing pipeline against the widely used previous pipeline [dora, hunyuan2, craftsman], i.e., Dora-wt [dora], at voxel resolutions of 512 and 1024. Compared with the previous method, Sparcubes preserves crucial components (e.g., the car wheel) and recovers finer geometric details (e.g., the shelving frame). Our wt-512 result even outperforms the wt-1024 result remeshed by Dora-wt [dora]. Best viewed with zoom-in.
  • ...and 2 more figures
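
The caption on the previous SDF extraction pipeline states that converting a UDF to an SDF by subtracting two voxel sizes degrades resolution and produces a double-layer surface. A 1D toy sketch can illustrate why; it is not the cited pipeline itself, and the grid resolution of 64, the sheet positions, and the helper names are assumptions made for this example. A single thin sheet turns into two zero crossings, each displaced by two voxels, and two sheets closer than four voxels merge into one slab.

```python
import numpy as np

h = 1.0 / 64                       # voxel size (hypothetical resolution of 64)
x = (np.arange(64) + 0.5) * h      # voxel centers along one axis in [0, 1]


def pseudo_sdf(sheets):
    """Unsigned distance to the nearest sheet, shifted down by two voxel sizes."""
    udf = np.min(np.abs(x[:, None] - np.asarray(sheets)[None, :]), axis=1)
    return udf - 2.0 * h


def zero_crossings(f):
    """Linearly interpolated positions where f changes sign between grid samples."""
    sign = np.sign(f)
    idx = np.where(sign[:-1] * sign[1:] < 0)[0]
    t = f[idx] / (f[idx] - f[idx + 1])
    return x[idx] + t * (x[idx + 1] - x[idx])


# One thin sheet at 0.5: the extracted "surface" is two layers at 0.5 -/+ 2h.
single = zero_crossings(pseudo_sdf([0.5]))

# Two sheets 3 voxels apart: the gap between them is lost (2 crossings, not 4).
close_pair = zero_crossings(pseudo_sdf([0.5, 0.5 + 3 * h]))

# Two sheets 5 voxels apart: the gap survives (4 crossings).
wide_pair = zero_crossings(pseudo_sdf([0.5, 0.5 + 5 * h]))

print("single sheet  :", single / h, "(in voxel units)")
print("3-voxel gap   :", len(close_pair), "crossings")
print("5-voxel gap   :", len(wide_pair), "crossings")
```

In this toy setting, any gap narrower than four voxels disappears after the shift, and every thin sheet is extracted as an outer and an inner layer.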