
A 3D Generation Framework from Cross Modality to Parameterized Primitive

Yiming Liang, Huan Yu, Zili Wang, Shuyou Zhang, Guodong Yi, Jin Wang, Jianrong Tan

TL;DR

This work tackles the challenge of generating high-surface-quality 3D models under storage constraints by introducing a zero-shot, cross-modal framework that assembles models from parameterized primitives. It combines a three-stage pipeline: (i) implicit-diffusion-guided multi-view depth synthesis and superquadric fitting in a TSDF volume; (ii) searching for similar parameterized primitives to replace superquadrics; and (iii) storing only primitive parameters to reconstruct the target shapes. The key contributions are a primitive-fitting and matching algorithm that enhances surface quality, a storage method that dramatically reduces model size, and a comprehensive three-stage, text-and-image-driven generation approach that generalizes to virtual and real scenes. The results show improved geometric fidelity (CD, VIoU, F1-Score, NC) and substantial storage savings, enabling rapid prototyping of simple 3D models in practical pipelines.
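Stage (iii) keeps only primitive parameters instead of a mesh. The sketch below illustrates why such files stay small. It is a minimal sketch under stated assumptions: the record layout (a type label, half-extents, translation, Euler rotation) and the names `Primitive`, `dumps_primitives`, and `loads_primitives` are hypothetical and are not the paper's storage format.

```python
import json
from dataclasses import dataclass, asdict
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Primitive:
    # Hypothetical record: enough numbers to rebuild one primitive analytically.
    kind: str          # primitive type label, e.g. "cuboid" or "cylinder" (assumed)
    size: Vec3         # half-extents a, b, c along the local x, y, z axes
    translation: Vec3  # primitive centre in world coordinates
    rotation: Vec3     # Euler angles in radians (assumed convention)


def dumps_primitives(prims: List[Primitive]) -> str:
    # Serialize parameters only; no vertices or faces are written.
    return json.dumps([asdict(p) for p in prims], separators=(",", ":"))


def loads_primitives(text: str) -> List[Primitive]:
    # JSON stores tuples as lists, so convert the vectors back to tuples.
    return [
        Primitive(
            kind=d["kind"],
            size=tuple(d["size"]),
            translation=tuple(d["translation"]),
            rotation=tuple(d["rotation"]),
        )
        for d in json.loads(text)
    ]
```

Twenty such records serialize to a few kilobytes of JSON at most; a mesh with high-quality surfaces would be regenerated from these parameters only when the model is displayed or exported.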

Abstract

Recent advances in AI-driven 3D model generation have leveraged cross-modal inputs, yet generating models with smooth surfaces while minimizing storage overhead remains a challenge. This paper introduces a novel multi-stage framework, guided by text and image inputs, for generating 3D models composed of parameterized primitives. Within the framework, we propose a model generation algorithm based on parameterized primitives that identifies the shape features of the model's constituent elements and replaces those elements with parameterized primitives that have high-quality surfaces. We also propose a corresponding model storage method that preserves the original surface quality of the model while retaining only the parameters of the parameterized primitives. Experiments on a virtual-scene dataset and a real-scene dataset demonstrate the effectiveness of our method, which achieves a Chamfer Distance (CD) of 0.003092, a VIoU of 0.545, an F1-Score of 0.9139, and a normal consistency (NC) of 0.8369, with primitive parameter files of approximately 6 KB. Our approach is particularly suitable for rapid prototyping of simple models.
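The abstract reports CD, F1-Score, and NC without defining them here. The following is a minimal NumPy sketch of common point-cloud definitions, not the paper's evaluation code: the squared-distance Chamfer convention, the F1 threshold `tau=0.01`, and the unoriented normal consistency are assumptions, since conventions vary between papers and this summary does not state which one is used. VIoU is omitted because it requires solid voxel occupancy rather than surface samples.

```python
import numpy as np


def _pairwise_dist(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    # (N, M) matrix of Euclidean distances between rows of a and rows of b.
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def chamfer_distance(pred: np.ndarray, gt: np.ndarray) -> float:
    # Squared nearest-neighbour distances, averaged in each direction, then summed.
    d = _pairwise_dist(pred, gt)
    return float((d.min(axis=1) ** 2).mean() + (d.min(axis=0) ** 2).mean())


def f1_score(pred: np.ndarray, gt: np.ndarray, tau: float = 0.01) -> float:
    # Precision: predicted points within tau of GT; recall: GT points within tau of prediction.
    d = _pairwise_dist(pred, gt)
    precision = float((d.min(axis=1) < tau).mean())
    recall = float((d.min(axis=0) < tau).mean())
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


def normal_consistency(pred, pred_n, gt, gt_n) -> float:
    # Mean |cosine| between each unit normal and the normal of its nearest
    # neighbour in the other cloud, averaged over both directions.
    d = _pairwise_dist(pred, gt)
    nn_gt = d.argmin(axis=1)    # nearest GT point for each predicted point
    nn_pred = d.argmin(axis=0)  # nearest predicted point for each GT point
    c1 = np.abs((pred_n * gt_n[nn_gt]).sum(axis=1)).mean()
    c2 = np.abs((gt_n * pred_n[nn_pred]).sum(axis=1)).mean()
    return float(0.5 * (c1 + c2))
```

Brute-force pairwise distances keep the sketch short but use O(NM) memory; for large point clouds a KD-tree nearest-neighbour query would be the usual substitute.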

Paper Structure

This paper contains 21 sections, 3 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: The framework of our method. In the first stage, an implicit diffusion model is introduced to synthesize multi-view depth images, and the target model is iteratively fitted with superquadrics. In the second stage, the parameterized primitive searching algorithm is executed to match each superquadric element in the target model with a corresponding parameterized primitive. In the third stage, the parameterized primitives are used to synthesize the target model, and the parameters of the model elements are saved.
  • Figure 2: Shape variation of superquadrics. As $\varepsilon_1$ and $\varepsilon_2$ change, the shape of the superquadric changes in the $z$-direction and in the $xy$-plane. The bottom and side edges defined in the text are shown in the figure.
  • Figure 3: Nine types of superquadrics and their similar parameterized primitive expressions. The centers of the superquadrics in the figure are all at the origin, and the maximum extents along the x-, y-, and z-axes are a, b, and c, respectively.
  • Figure 4: Qualitative experimental results based on image and text inputs. The first six columns are based on image input, while the remaining columns are based on text input.
  • Figure 5: Qualitative experimental results on the ShapeNet dataset. We present test-sample results for the bench, table, plane, cabinet, and bottle categories of ShapeNet.
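Figures 2 and 3 describe superquadrics whose shape is controlled by $\varepsilon_1$ and $\varepsilon_2$ and whose extents along x, y, and z are a, b, and c. As a sketch, assuming the standard Barr inside-outside function (this summary does not state which superquadric parameterization the paper uses), the code below evaluates F for points in a superquadric's local frame, with F < 1 inside, F = 1 on the surface, and F > 1 outside, and samples surface points from the matching parametric form. The function names are illustrative only.

```python
import numpy as np


def _signed_pow(base: np.ndarray, exp: float) -> np.ndarray:
    # Sign-preserving power used in the parametric superquadric surface.
    return np.sign(base) * np.abs(base) ** exp


def superquadric_F(points, a, b, c, eps1, eps2) -> np.ndarray:
    # Standard (assumed) inside-outside function in the superquadric's local frame:
    # F = ((|x/a|^(2/eps2) + |y/b|^(2/eps2))^(eps2/eps1)) + |z/c|^(2/eps1)
    x = np.abs(points[:, 0] / a)
    y = np.abs(points[:, 1] / b)
    z = np.abs(points[:, 2] / c)
    xy = (x ** (2.0 / eps2) + y ** (2.0 / eps2)) ** (eps2 / eps1)
    return xy + z ** (2.0 / eps1)


def superquadric_surface(a, b, c, eps1, eps2, n=32) -> np.ndarray:
    # Sample the surface over latitude eta in [-pi/2, pi/2] and longitude omega in [-pi, pi).
    eta = np.linspace(-np.pi / 2, np.pi / 2, n)
    omega = np.linspace(-np.pi, np.pi, 2 * n, endpoint=False)
    eta, omega = np.meshgrid(eta, omega, indexing="ij")
    x = a * _signed_pow(np.cos(eta), eps1) * _signed_pow(np.cos(omega), eps2)
    y = b * _signed_pow(np.cos(eta), eps1) * _signed_pow(np.sin(omega), eps2)
    z = c * _signed_pow(np.sin(eta), eps1)
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)
```

With $\varepsilon_1 = \varepsilon_2 = 1$ the function reduces to the ellipsoid $x^2/a^2 + y^2/b^2 + z^2/c^2$; exponents near 0 push the shape toward a box-like form, which is the kind of variation Figure 2 illustrates.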