A 3D Generation Framework from Cross Modality to Parameterized Primitive
Yiming Liang, Huan Yu, Zili Wang, Shuyou Zhang, Guodong Yi, Jin Wang, Jianrong Tan
TL;DR
This work tackles the challenge of generating 3D models with high surface quality under storage constraints by introducing a zero-shot, cross-modal framework that assembles models from parameterized primitives. The pipeline has three stages: (i) multi-view depth synthesis guided by implicit diffusion, followed by superquadric fitting in a truncated signed distance function (TSDF) volume; (ii) a search for similar parameterized primitives to replace the superquadrics; and (iii) storage of only the primitive parameters, from which the target shapes are reconstructed. The key contributions are a primitive-fitting and matching algorithm that enhances surface quality, a storage method that substantially reduces model size, and a complete three-stage generation approach, driven by text and images, that generalizes to virtual and real scenes. The results show improved geometric fidelity (CD, VIoU, F1-Score, NC) and substantial storage savings, enabling rapid prototyping of simple 3D models in practical pipelines.
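To make the idea of storing only primitive parameters concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a superquadric stands in for a parameterized primitive, uses the standard superquadric inside-outside function, and omits rotation for brevity. The dictionary keys (`scale`, `exponents`, `center`) and the function names are hypothetical and chosen only for illustration.

```python
import json
import math


def signed_pow(base, exponent):
    """Return sign(base) * |base|**exponent, the signed power used by superquadrics."""
    return math.copysign(abs(base) ** exponent, base)


def surface_point(prim, eta, omega):
    """Surface point for eta in [-pi/2, pi/2] and omega in [-pi, pi] (no rotation)."""
    a1, a2, a3 = prim["scale"]
    e1, e2 = prim["exponents"]
    cx, cy, cz = prim["center"]
    x = a1 * signed_pow(math.cos(eta), e1) * signed_pow(math.cos(omega), e2)
    y = a2 * signed_pow(math.cos(eta), e1) * signed_pow(math.sin(omega), e2)
    z = a3 * signed_pow(math.sin(eta), e1)
    return (x + cx, y + cy, z + cz)


def inside_outside(prim, point):
    """Standard superquadric function: < 1 inside, 1 on the surface, > 1 outside."""
    a1, a2, a3 = prim["scale"]
    e1, e2 = prim["exponents"]
    x, y, z = (p - c for p, c in zip(point, prim["center"]))
    xy = abs(x / a1) ** (2 / e2) + abs(y / a2) ** (2 / e2)
    return xy ** (e2 / e1) + abs(z / a3) ** (2 / e1)


# Hypothetical primitive: only these eight numbers are stored, not a mesh.
prim = {"scale": [0.5, 0.3, 0.8], "exponents": [0.4, 1.0], "center": [0.1, 0.0, -0.2]}

# Serialize the parameters, then rebuild surface samples from the stored text alone.
stored = json.dumps({"primitives": [prim]})
restored = json.loads(stored)["primitives"][0]
samples = [
    surface_point(restored, -1.2 + 0.3 * i, -3.0 + 0.5 * j)
    for i in range(9)
    for j in range(13)
]
```

Because the surface is regenerated from the parameters at any sampling density, the stored file size depends on the number of primitives rather than on mesh resolution.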
Abstract
Recent advances in AI-driven 3D model generation have leveraged cross-modal inputs, yet generating models with smooth surfaces while minimizing storage overhead remains a challenge. This paper introduces a novel multi-stage framework for generating 3D models composed of parameterized primitives, guided by textual and image inputs. Within the framework, we propose a model generation algorithm based on parameterized primitives that identifies the shape features of the model's constituent elements and replaces those elements with parameterized primitives that have high-quality surfaces. In addition, we propose a corresponding model storage method that preserves the original surface quality of the model while retaining only the parameters of the parameterized primitives. Experiments on a virtual-scene dataset and a real-scene dataset demonstrate the effectiveness of our method, which achieves a Chamfer Distance of 0.003092, a VIoU of 0.545, an F1-Score of 0.9139, and an NC of 0.8369, with primitive parameter files approximately 6 KB in size. Our approach is particularly suitable for rapid prototyping of simple models.
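For readers unfamiliar with the reported metrics, the sketch below shows one common way to compute Chamfer Distance and F1-Score between two point sets. It is an illustration under stated assumptions, not the evaluation code behind the numbers above: it assumes the squared-distance, mean-over-points form of Chamfer Distance and a distance-threshold F1-Score, and the exact conventions and threshold used in the paper are not given here. The function names are hypothetical, and the brute-force nearest-neighbor search is only practical for small point sets.

```python
import math


def nearest_distances(source, target):
    """For each point in source, the Euclidean distance to its nearest point in target."""
    return [min(math.dist(p, q) for q in target) for p in source]


def chamfer_distance(a, b):
    """Symmetric Chamfer Distance: mean squared nearest-neighbor distance, both directions."""
    d_ab = nearest_distances(a, b)
    d_ba = nearest_distances(b, a)
    return sum(d * d for d in d_ab) / len(a) + sum(d * d for d in d_ba) / len(b)


def f1_score(pred, gt, threshold):
    """Harmonic mean of precision and recall at a given distance threshold."""
    precision = sum(d <= threshold for d in nearest_distances(pred, gt)) / len(pred)
    recall = sum(d <= threshold for d in nearest_distances(gt, pred)) / len(gt)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Tiny hand-checkable example: two single-point sets one unit apart.
pred_points = [(0.0, 0.0, 0.0)]
gt_points = [(1.0, 0.0, 0.0)]
cd = chamfer_distance(pred_points, gt_points)          # 1.0 + 1.0
f1_far = f1_score(pred_points, gt_points, 0.5)         # no point within 0.5
f1_near = f1_score(pred_points, gt_points, 1.5)        # both points within 1.5
```

Lower Chamfer Distance and higher F1-Score both indicate that the generated surface lies closer to the reference surface.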
