A Mesh Is Worth 512 Numbers: Spectral-domain Diffusion Modeling for High-dimension Shape Generation

Jiajie Fan, Amal Trigui, Andrea Bonfanti, Felix Dietrich, Thomas Bäck, Hao Wang

TL;DR

SpoDify introduces a learning-free spectral-domain diffusion framework that encodes meshes via SVD into compact spectral features and a DWT basis, enabling diffusion in a low-dimensional latent space. The method combines clustering-based representation, SDF, and wavelet transforms to produce a compact $d$-dimensional latent, here $d=512$, and reconstructs meshes via inverse transforms and marching cubes. Empirical results on ShapeNet chairs and airplanes show competitive quality with significantly reduced training time and memory requirements, highlighting strong efficiency and scalability for high-dimensional mesh generation. The approach offers a practical path toward diffusion-based 3D generation with limited data and resources, and points to future work incorporating full wavelet details and multi-category extensions.

Abstract

Recent advancements in learning latent codes derived from high-dimensional shapes have demonstrated impressive outcomes in 3D generative modeling. Traditionally, these approaches employ a trained autoencoder to acquire a continuous implicit representation of source shapes, which can be computationally expensive. This paper introduces a novel framework, spectral-domain diffusion for high-quality shape generation (SpoDify), that utilizes singular value decomposition (SVD) for shape encoding. The resulting eigenvectors can be stored for subsequent decoding, while generative modeling is performed on the eigenfeatures. This approach efficiently encodes complex meshes into continuous implicit representations; for example, it encodes a 15k-vertex mesh into a 512-dimensional latent code without learning. Our method exhibits significant advantages in scenarios with limited samples or GPU resources. In mesh generation tasks, our approach produces high-quality shapes that are comparable to state-of-the-art methods.
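
The learning-free encoding described in the abstract can be illustrated with a truncated SVD. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names `svd_encode` and `svd_decode`, the toy matrix sizes, and the omission of any normalization step are all hypothetical choices made for illustration. Each row of `C` stands in for one shape's flattened coefficient vector.

```python
import numpy as np

def svd_encode(C, d):
    """Return (alpha, Vt_d): per-shape latent codes and the basis kept for decoding.

    C is an (n_shapes, n_coeffs) matrix, one flattened coefficient vector per row.
    Hypothetical helper; the paper's own pre-processing may differ.
    """
    # Thin SVD: C = U @ diag(S) @ Vt, singular values in descending order.
    U, S, Vt = np.linalg.svd(C, full_matrices=False)
    Vt_d = Vt[:d]              # (d, n_coeffs) truncated basis, stored once
    alpha = U[:, :d] * S[:d]   # (n_shapes, d) latent codes, equal to C @ Vt_d.T
    return alpha, Vt_d

def svd_decode(alpha, Vt_d):
    """Map latent codes back to coefficient space with the stored basis."""
    return alpha @ Vt_d

# Toy data (assumption): 20 "shapes" with 300 coefficients each, exactly rank 8.
rng = np.random.default_rng(0)
C = rng.standard_normal((20, 8)) @ rng.standard_normal((8, 300))

alpha, Vt_d = svd_encode(C, d=8)
C_rec = svd_decode(alpha, Vt_d)
rel_err = np.linalg.norm(C - C_rec) / np.linalg.norm(C)
```

In this toy setting the data has rank 8, so keeping `d=8` components reconstructs `C` up to floating-point error. Note that a thin SVD yields at most `min(n_shapes, n_coeffs)` components, so a latent size such as $d=512$ requires at least that many shapes and coefficients.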

Paper Structure

This paper contains 29 sections, 6 equations, 11 figures, 4 tables.

Figures (11)

  • Figure 1: A gallery of 3D meshes generated by SpoDify.
  • Figure 2: Diagram of SpoDify. We apply singular value decomposition to the set of coefficients obtained by applying a signed distance field and a discrete wavelet transform to the source meshes, which yields the dataset of spectral features. The basis $V_d^\top$ is stored for later generation; each row of the spectral features $\alpha$ serves as one sample for training the diffusion model. To generate a new mesh, the trained diffusion model generates a new $\alpha$ from random noise. The generated $\alpha$ is denormalized and then multiplied with the pre-computed, stored $V_d^\top$ to obtain new low-frequency coefficients $C_i$, which can be converted into a new mesh $M_i$.
  • Figure 3: Effect of Wavelet Decomposition Levels and the Dropping of High-Frequency Coefficients on Plane Mesh Reconstruction. (a) Original plane mesh; (b) Reconstructed plane after applying wavelet decomposition and reconstruction using all coefficients (both coarse coefficients and fine coefficients); (c) Reconstruction after a single level of wavelet decomposition, keeping only low-frequency coefficients (coarse coefficients) and setting others to zero; (d) Reconstruction after three levels of wavelet decomposition, keeping only low-frequency coefficients (coarse coefficients) and setting others to zero.
  • Figure 4: Qualitative comparison of chairs generated by different methods: (a) 3DShape2VecSet (Zhang et al., 2023), (b) NWD (Hui et al., 2022), and (c) our SpoDify.
  • Figure 5: Truncation level. Changing the reduced row length $d$ of $\alpha$ affects both the visual quality of the final results and the computational cost of training the generative model. Truncating the row length down to $d=512$ introduces no significant visual artifacts in the reconstructed meshes, whereas with $d=256$ the reconstructed meshes show structural errors.
  • ...and 6 more figures
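
The trade-off described in the Figure 3 caption (keeping only the coarse wavelet coefficients at one versus three decomposition levels) can be sketched on a toy signed distance field. This is a minimal illustration under stated assumptions: it uses a Haar wavelet for simplicity, which may not be the wavelet used in the paper, and the helper names `haar_coarse` and `haar_inverse_coarse_only`, the 32³ grid, and the sphere SDF are all hypothetical.

```python
import numpy as np

def haar_coarse(x, levels):
    """Low-frequency (approximation) band of a 3D Haar DWT after `levels` levels.

    Assumes every grid dimension is divisible by 2**levels.
    """
    for _ in range(levels):
        for axis in range(3):
            even = np.take(x, np.arange(0, x.shape[axis], 2), axis=axis)
            odd = np.take(x, np.arange(1, x.shape[axis], 2), axis=axis)
            x = (even + odd) / np.sqrt(2.0)  # orthonormal Haar low-pass filter
    return x

def haar_inverse_coarse_only(c, levels):
    """Inverse Haar DWT with every high-frequency (detail) band set to zero."""
    for _ in range(levels):
        for axis in range(3):
            # With zero details, both children equal parent / sqrt(2).
            c = np.repeat(c, 2, axis=axis) / np.sqrt(2.0)
    return c

# Toy SDF (assumption): a sphere of radius 0.5 sampled on a 32^3 grid.
n = 32
g = np.linspace(-1.0, 1.0, n)
X, Y, Z = np.meshgrid(g, g, g, indexing="ij")
sdf = np.sqrt(X**2 + Y**2 + Z**2) - 0.5

c1 = haar_coarse(sdf, levels=1)   # 16^3 coarse coefficients
c3 = haar_coarse(sdf, levels=3)   # 4^3 coarse coefficients
r1 = haar_inverse_coarse_only(c1, levels=1)
r3 = haar_inverse_coarse_only(c3, levels=3)
err1 = np.linalg.norm(sdf - r1)
err3 = np.linalg.norm(sdf - r3)
```

With Haar filters, dropping the detail bands amounts to replacing each block of voxels by its mean, so three levels discard more information than one and `err3` exceeds `err1`. A mesh would then be extracted from the reconstructed field with marching cubes, which this sketch leaves out.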