SparseFlex: High-Resolution and Arbitrary-Topology 3D Shape Modeling

Xianglong He, Zi-Xin Zou, Chia-Hao Chen, Yuan-Chen Guo, Ding Liang, Chun Yuan, Wanli Ouyang, Yan-Pei Cao, Yangguang Li

TL;DR

SparseFlex introduces a sparse-voxel, differentiable isosurface representation that enables high-resolution (up to $1024^3$) mesh reconstruction from rendering losses while naturally handling open surfaces. A frustum-aware sectional voxel training strategy dramatically reduces memory usage, enabling interior reconstruction with rendering supervision. A complete pipeline combining a SparseFlex VAE and a rectified flow transformer achieves state-of-the-art reconstruction accuracy and high-quality image-to-3D generation for arbitrary topology. The work advances 3D shape modeling by marrying sparse, surface-focused computation with differentiable rendering and generative modeling capabilities.

Abstract

Creating high-fidelity 3D meshes with arbitrary topology, including open surfaces and complex interiors, remains a significant challenge. Existing implicit field methods often require costly and detail-degrading watertight conversion, while other approaches struggle with high resolutions. This paper introduces SparseFlex, a novel sparse-structured isosurface representation that enables differentiable mesh reconstruction at resolutions up to $1024^3$ directly from rendering losses. SparseFlex combines the accuracy of Flexicubes with a sparse voxel structure, focusing computation on surface-adjacent regions and efficiently handling open surfaces. Crucially, we introduce a frustum-aware sectional voxel training strategy that activates only relevant voxels during rendering, dramatically reducing memory consumption and enabling high-resolution training. This also allows, for the first time, the reconstruction of mesh interiors using only rendering supervision. Building upon this, we demonstrate a complete shape modeling pipeline by training a variational autoencoder (VAE) and a rectified flow transformer for high-quality 3D shape generation. Our experiments show state-of-the-art reconstruction accuracy, with a ~82% reduction in Chamfer Distance and a ~88% increase in F-score compared to previous methods, and demonstrate the generation of high-resolution, detailed 3D shapes with arbitrary topology. By enabling high-resolution, differentiable mesh reconstruction and generation with rendering losses, SparseFlex significantly advances the state of the art in 3D shape representation and modeling.
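The frustum-aware activation described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `look_at` and `frustum_mask`, the OpenGL-style camera convention, and the use of voxel centers (rather than full voxel extents) are all assumptions made for illustration. The sketch only shows the general idea of keeping the voxels that fall between the near and far planes and inside the field of view, so that mesh extraction and rendering touch a subset of the sparse grid.

```python
import numpy as np

def look_at(eye, target, up=(0.0, 0.0, 1.0)):
    """World-to-camera 4x4 matrix; the camera looks down its -z axis (assumed convention)."""
    eye = np.asarray(eye, dtype=np.float64)
    forward = np.asarray(target, dtype=np.float64) - eye
    forward /= np.linalg.norm(forward)
    right = np.cross(forward, np.asarray(up, dtype=np.float64))
    right /= np.linalg.norm(right)
    true_up = np.cross(right, forward)
    view = np.eye(4)
    view[0, :3], view[1, :3], view[2, :3] = right, true_up, -forward
    view[:3, 3] = -view[:3, :3] @ eye
    return view

def frustum_mask(centers, view, fov_y_deg, aspect, near, far):
    """Boolean mask over voxel centers (N, 3) that lie inside the view frustum."""
    homo = np.concatenate([centers, np.ones((len(centers), 1))], axis=1)
    cam = (view @ homo.T).T[:, :3]
    depth = -cam[:, 2]  # distance along the viewing direction
    tan_y = np.tan(np.radians(fov_y_deg) / 2.0)
    tan_x = tan_y * aspect
    return (
        (depth >= near) & (depth <= far)
        & (np.abs(cam[:, 0]) <= depth * tan_x)
        & (np.abs(cam[:, 1]) <= depth * tan_y)
    )

# Hypothetical voxel centers: three along the viewing axis, one far to the
# side, one behind the camera.
centers = np.array([
    [0.0, 0.0, 0.0],
    [0.0, -0.5, 0.0],
    [0.0, 0.5, 0.0],
    [5.0, 0.0, 0.0],
    [0.0, -4.0, 0.0],
])
view = look_at(eye=(0.0, -3.0, 0.0), target=(0.0, 0.0, 0.0))

# Full frustum: only voxels in front of the camera and inside the FOV are active.
full = frustum_mask(centers, view, fov_y_deg=60.0, aspect=1.0, near=0.1, far=10.0)

# Pushing the near plane into the object drops the voxels in front of it,
# which is one way a "sectional" view could expose interior structure.
section = frustum_mask(centers, view, fov_y_deg=60.0, aspect=1.0, near=2.75, far=10.0)
```

In this toy setup the side voxel and the voxel behind the camera are inactive in both cases, and raising `near` additionally deactivates the voxel closest to the camera. How the paper chooses its clipping planes and camera placement is not stated in the abstract, so the values above are placeholders.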

Paper Structure

This paper contains 36 sections, 5 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: SparseFlex VAE achieves high-fidelity reconstruction and generalization from point clouds. Benefiting from a sparse-structured differentiable isosurface representation and an efficient frustum-aware sectional voxel training strategy, our SparseFlex VAE demonstrates state-of-the-art performance on complex geometries (left), open surfaces (top right), and even interior structures (bottom right), facilitating high-quality image-to-3D generation with arbitrary topology.
  • Figure 2: Overview of the SparseFlex VAE pipeline. SparseFlex VAE takes point clouds sampled from a mesh as input, voxelizes them, and aggregates their features into each voxel. A sparse transformer encoder-decoder compresses the structured features into a more compact latent space, followed by self-pruning upsampling to a higher resolution. Finally, the structured features are decoded to SparseFlex through a linear layer. Using the frustum-aware sectional voxel training strategy, we can train the entire pipeline more efficiently with rendering losses.
  • Figure 3: Frustum-aware sectional voxel training. The previous mesh-based rendering training strategy (left) requires activating the entire dense grid to extract the mesh surface, even though only a few voxels are necessary during rendering. In contrast, our approach (right) adaptively activates the relevant voxels and enables the reconstruction of mesh interiors using only rendering supervision.
  • Figure 4: Qualitative comparison of VAE reconstruction between ours and other state-of-the-art baselines. Our approach demonstrates superior performance in reconstructing complex shapes, open surfaces, and even interior structures.
  • Figure 5: Qualitative comparison of VAE reconstruction quality between our method at different resolutions and TRELLIS.
  • ...and 1 more figure
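The first stage in the Figure 2 caption (voxelize the input point cloud and aggregate point features into each occupied voxel) can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's encoder: the function name `voxelize_mean` is hypothetical, the shape is assumed to be normalized to the cube $[-1, 1]^3$, and plain mean pooling stands in for whatever aggregation the authors actually use.

```python
import numpy as np

def voxelize_mean(points, features, resolution, lo=-1.0, hi=1.0):
    """Return occupied voxel coords (M, 3) and mean-pooled features (M, C).

    Only voxels that contain at least one point are returned, which is what
    makes the structure sparse.
    """
    # Map points from [lo, hi] to integer voxel indices in [0, resolution - 1].
    scaled = (points - lo) / (hi - lo) * resolution
    ijk = np.clip(np.floor(scaled).astype(np.int64), 0, resolution - 1)

    # Unique occupied voxels, plus the voxel index each point belongs to.
    coords, inverse = np.unique(ijk, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # shape differs across NumPy versions

    # Sum the features per voxel, then divide by the point count (mean pooling).
    sums = np.zeros((len(coords), features.shape[1]))
    np.add.at(sums, inverse, features)
    counts = np.bincount(inverse, minlength=len(coords))
    return coords, sums / counts[:, None]

# Toy input: two points share a voxel, one point sits in the opposite corner.
points = np.array([
    [-0.9, -0.9, -0.9],
    [-0.8, -0.8, -0.8],
    [0.9, 0.9, 0.9],
])
features = np.array([[1.0], [3.0], [10.0]])
coords, pooled = voxelize_mean(points, features, resolution=4)
```

Here a $4^3$ grid has 64 cells, but only the two occupied ones are stored. At $1024^3$ the same idea avoids allocating roughly a billion cells when the surface touches only a small fraction of them.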