
Dora: Sampling and Benchmarking for 3D Shape Variational Auto-Encoders

Rui Chen, Jianfeng Zhang, Yixun Liang, Guan Luo, Weiyu Li, Jiarui Liu, Xiu Li, Xiaoxiao Long, Jiashi Feng, Ping Tan

TL;DR

Proposes a sharp-edge sampling strategy (SES) and a dual cross-attention VAE (Dora-VAE) to address the loss of geometric detail in 3D shape encoding. Introduces Dora-Bench and the Sharp Normal Error (SNE) metric to benchmark reconstruction quality at salient geometric features. Shows that Dora-VAE achieves reconstruction quality comparable to XCube-VAE with a latent space at least 8× smaller (1,280 vs. >10,000 codes), and that it improves downstream diffusion-based 3D generation. Ablation studies identify SES and dual cross-attention (DCA) as the key contributors to detail preservation across shapes of varying geometric complexity.
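
The sharp-edge sampling (SES) idea can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation. It assumes that sharp edges are mesh edges whose two adjacent face normals differ by more than a hypothetical 30° threshold, that salient points are drawn uniformly along those edges (weighted by edge length), and that the remaining budget goes to ordinary area-weighted uniform surface sampling. The names `sharp_edges`, `sample_on_edges`, `sample_uniform` and `sharp_edge_sampling` are illustrative only.

```python
import math
import random
from collections import defaultdict


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def face_normal(verts, face):
    a, b, c = (verts[i] for i in face)
    n = cross(sub(b, a), sub(c, a))
    length = math.sqrt(dot(n, n))
    return tuple(x / length for x in n) if length > 0 else (0.0, 0.0, 0.0)


def sharp_edges(verts, faces, angle_deg=30.0):
    """Edges shared by two faces whose normals differ by more than angle_deg (hypothetical threshold)."""
    edge_faces = defaultdict(list)
    for f_idx, (a, b, c) in enumerate(faces):
        for i, j in ((a, b), (b, c), (c, a)):
            edge_faces[(min(i, j), max(i, j))].append(f_idx)
    normals = [face_normal(verts, f) for f in faces]
    cos_thresh = math.cos(math.radians(angle_deg))
    return [e for e, adj in edge_faces.items()
            if len(adj) == 2 and dot(normals[adj[0]], normals[adj[1]]) < cos_thresh]


def sample_on_edges(verts, edges, n, rng):
    """Points along sharp edges, choosing edges in proportion to their length."""
    lengths = [math.dist(verts[i], verts[j]) for i, j in edges]
    points = []
    for i, j in rng.choices(edges, weights=lengths, k=n):
        t = rng.random()
        points.append(tuple(p + t * (q - p) for p, q in zip(verts[i], verts[j])))
    return points


def sample_uniform(verts, faces, n, rng):
    """Area-weighted uniform sampling over the triangle surfaces."""
    areas = [math.sqrt(dot(cr, cr)) / 2 for cr in
             (cross(sub(verts[b], verts[a]), sub(verts[c], verts[a])) for a, b, c in faces)]
    points = []
    for a, b, c in rng.choices(faces, weights=areas, k=n):
        r1, r2 = rng.random(), rng.random()
        if r1 + r2 > 1:  # reflect so the sample stays inside the triangle
            r1, r2 = 1 - r1, 1 - r2
        points.append(tuple(pa + r1 * (pb - pa) + r2 * (pc - pa)
                            for pa, pb, pc in zip(verts[a], verts[b], verts[c])))
    return points


def sharp_edge_sampling(verts, faces, n_uniform, n_salient, angle_deg=30.0, seed=0):
    """Return (uniform_points, salient_points) for one mesh."""
    rng = random.Random(seed)
    uniform = sample_uniform(verts, faces, n_uniform, rng)
    edges = sharp_edges(verts, faces, angle_deg)
    # Assumption of this sketch: shapes without sharp edges fall back to uniform samples.
    salient = (sample_on_edges(verts, edges, n_salient, rng) if edges
               else sample_uniform(verts, faces, n_salient, rng))
    return uniform, salient


# Usage on a unit cube with consistent outward winding (two triangles per face).
CUBE_VERTS = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),
              (0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)]
CUBE_FACES = [(0, 3, 2), (0, 2, 1), (4, 5, 6), (4, 6, 7), (0, 1, 5), (0, 5, 4),
              (3, 7, 6), (3, 6, 2), (0, 4, 7), (0, 7, 3), (1, 2, 6), (1, 6, 5)]
print(len(sharp_edges(CUBE_VERTS, CUBE_FACES)))  # 12
uniform, salient = sharp_edge_sampling(CUBE_VERTS, CUBE_FACES, n_uniform=2048, n_salient=1024)
```

On the cube, the 12 cube edges are flagged, while the 6 face diagonals (each shared by two coplanar triangles) are not, so every salient point lies on a true crease. A sharp-edge count like the one `sharp_edges` returns is also the kind of quantity on which a density-based complexity score, such as the one Dora-Bench uses to group shapes, could be built; the benchmark's actual thresholds are not given here.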

Abstract

Recent 3D content generation pipelines commonly employ Variational Autoencoders (VAEs) to encode shapes into compact latent representations for diffusion-based generation. However, the widely adopted uniform point sampling strategy in Shape VAE training often leads to a significant loss of geometric detail, limiting the quality of shape reconstruction and downstream generation tasks. We present Dora-VAE, a novel approach that enhances VAE reconstruction through our proposed sharp edge sampling strategy and a dual cross-attention mechanism. By identifying and prioritizing regions with high geometric complexity during training, our method significantly improves the preservation of fine-grained shape features. Together, this sampling strategy and the dual attention mechanism enable the VAE to focus on crucial geometric details that uniform sampling typically misses. To systematically evaluate VAE reconstruction quality, we additionally propose Dora-bench, a benchmark that quantifies shape complexity through the density of sharp edges, and introduce a new metric focused on reconstruction accuracy at these salient geometric features. Extensive experiments on Dora-bench demonstrate that Dora-VAE achieves reconstruction quality comparable to the state-of-the-art dense XCube-VAE while requiring a latent space at least 8$\times$ smaller (1,280 vs. >10,000 codes).
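
The dual cross-attention mechanism can be sketched as two independent cross-attention passes that share one set of latent queries: one pass attends to features of uniformly sampled points, the other to features of salient (sharp-edge) points, and the two outputs are summed. This is a minimal sketch under those assumptions, using single-head attention with no learned projections and no positional embedding; how the real Dora-VAE forms its queries and combines the branches may differ. All function names are hypothetical.

```python
import math
import random


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention; learned Q/K/V projections are omitted."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out


def dual_cross_attention(queries, uniform_feats, salient_feats):
    """Attend to the uniform and the salient point sets in separate branches, then sum."""
    from_uniform = cross_attention(queries, uniform_feats, uniform_feats)
    from_salient = cross_attention(queries, salient_feats, salient_feats)
    return [[a + b for a, b in zip(ru, rs)] for ru, rs in zip(from_uniform, from_salient)]


rng = random.Random(0)


def stand_in_features(n, dim=8):
    """Random placeholder features; real inputs would be embedded point features."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


latents = dual_cross_attention(stand_in_features(16), stand_in_features(256), stand_in_features(64))
print(len(latents), len(latents[0]))  # 16 8
```

Keeping the branches separate means each softmax is normalized within its own point set, so the (typically much smaller) salient set always contributes a full convex combination of its own values. In a single softmax over the concatenated sets, its share would instead depend on how its scores compare with those of the many uniform points.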

Paper Structure

This paper contains 22 sections, 8 equations, 11 figures, 3 tables.

Figures (11)

  • Figure 1: Sampling strategy comparison. Given the ground-truth mesh shown in (a), we visualize point clouds produced by uniform sampling in (b) and by our proposed Sharp Edge Sampling (SES) in (c), at various sampling rates. In (d), we compare the reconstruction accuracy, measured by F-score, of models trained with SES and with uniform sampling. SES consistently outperforms uniform sampling across sampling rates, because the point clouds it generates capture the salient features of the object more effectively.
  • Figure 2:
  • Figure 3: Our proposed benchmark includes 3D shapes from the ABO collins2022abo, GSO downs2022google, Meta meta_dtc, and Objaverse objaverse datasets. (a) Histogram of each dataset across shape complexities. (b) Pie chart of total counts by shape complexity. (c) Sample shapes at each shape complexity.
  • Figure 4: The process of computing the sharp normal error (SNE). We compute the MSE loss within the sharp regions of the normal maps.
  • Figure 5: Qualitative comparison of VAE reconstruction results. $^\dag$ denotes a model fine-tuned on the same training data as ours.
  • ...and 6 more figures
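
The SNE computation summarized in Figure 4 can be approximated as follows. This sketch assumes ground-truth and reconstructed normal maps rendered from the same viewpoint (one unit normal per pixel). It also assumes the sharp region is found by thresholding the angle between neighbouring ground-truth normals at a hypothetical 30°. The paper's actual sharp-region detection may differ, and the function names are illustrative.

```python
import math


def angle_deg(n1, n2):
    """Angle in degrees between two unit normals."""
    d = max(-1.0, min(1.0, sum(a * b for a, b in zip(n1, n2))))
    return math.degrees(math.acos(d))


def sharp_mask(normal_map, threshold_deg=30.0):
    """Mark pixels whose normal differs from a right or lower neighbour by more than threshold_deg."""
    h, w = len(normal_map), len(normal_map[0])
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for ny, nx in ((y, x + 1), (y + 1, x)):
                if ny < h and nx < w and angle_deg(normal_map[y][x], normal_map[ny][nx]) > threshold_deg:
                    mask[y][x] = mask[ny][nx] = True
    return mask


def sharp_normal_error(gt_normals, pred_normals, threshold_deg=30.0):
    """MSE between normal maps, restricted to the sharp pixels of the ground truth."""
    mask = sharp_mask(gt_normals, threshold_deg)
    total, count = 0.0, 0
    for y, row in enumerate(mask):
        for x, is_sharp in enumerate(row):
            if is_sharp:
                g, p = gt_normals[y][x], pred_normals[y][x]
                total += sum((a - b) ** 2 for a, b in zip(g, p)) / len(g)
                count += 1
    return total / count if count else 0.0


# A 1x3 normal map: a flat pixel, then a 90-degree crease between pixels 1 and 2.
gt = [[(0.0, 0.0, 1.0), (0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]]
pred_flat_error = [[(1.0, 0.0, 0.0), (0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]]
pred_edge_error = [[(0.0, 0.0, 1.0), (0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]]
print(sharp_normal_error(gt, pred_flat_error))  # 0.0: the error is outside the sharp region
print(sharp_normal_error(gt, pred_edge_error))  # ~0.333: one of the two sharp pixels is wrong
```

Restricting the average to the mask is what separates this kind of metric from a whole-image normal error: a reconstruction that smooths away a crease is penalized in full, while errors on large flat areas do not dilute the score.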