Meta 3D Gen

Raphael Bensadoun, Tom Monnier, Yanir Kleiman, Filippos Kokkinos, Yawar Siddiqui, Mahendra Kariya, Omri Harosh, Roman Shapovalov, Benjamin Graham, Emilien Garreau, Animesh Karnewar, Ang Cao, Idan Azuri, Iurii Makarov, Eric-Tuan Le, Antoine Toisoul, David Novotny, Oran Gafni, Natalia Neverova, Andrea Vedaldi

TL;DR

Meta 3D Gen (3DGen) is a fast, two-stage pipeline for text-to-3D asset generation that produces high-quality 3D shapes and textures with PBR materials in under a minute. It combines Meta 3D AssetGen (text-to-3D) and Meta 3D TextureGen (text-to-texture) in a single framework, represents objects in view, volumetric, and UV spaces, and supports retexturing. Stage I generates the 3D geometry and an initial texture; Stage II refines the texture via diffusion-based texture generation and optional super-resolution. The two-stage result achieves a 68% win rate over the single-stage model, and 3DGen outperforms industry baselines in prompt fidelity and visual quality for complex prompts while being significantly faster. The same texture stage can retexture generated or artist-created meshes from new text prompts, which is relevant to games, AR/VR, and Metaverse content creation.

Abstract

We introduce Meta 3D Gen (3DGen), a new state-of-the-art, fast pipeline for text-to-3D asset generation. 3DGen offers 3D asset creation with high prompt fidelity and high-quality 3D shapes and textures in under a minute. It supports physically-based rendering (PBR), necessary for 3D asset relighting in real-world applications. Additionally, 3DGen supports generative retexturing of previously generated (or artist-created) 3D shapes using additional textual inputs provided by the user. 3DGen integrates key technical components, Meta 3D AssetGen and Meta 3D TextureGen, that we developed for text-to-3D and text-to-texture generation, respectively. By combining their strengths, 3DGen represents 3D objects simultaneously in three ways: in view space, in volumetric space, and in UV (or texture) space. The integration of these two techniques achieves a win rate of 68% with respect to the single-stage model. We compare 3DGen to numerous industry baselines, and show that it outperforms them in terms of prompt fidelity and visual quality for complex textual prompts, while being significantly faster.

Paper Structure

This paper contains 24 sections, 11 figures, and 3 tables.

Figures (11)

  • Figure 1: Meta 3D Gen integrates Meta's foundation models for text-to-3D (Meta 3D AssetGen; Siddiqui et al., 2024) and text-to-texture (Meta 3D TextureGen; Bensadoun et al., 2024) generation in a unified pipeline, enabling efficient, state-of-the-art creation and editing of diverse, high-quality textured 3D assets with PBR material maps.
  • Figure 2: Overview of Meta 3D Gen. The pipeline takes a text prompt as input and performs text-to-3D generation (Stage I; Siddiqui et al., 2024), followed by texture refinement (Stage II; Bensadoun et al., 2024). Stage II can also be used to retexture generated or artist-created meshes using new text prompts provided by the user.
  • Figure 3: User studies: analysis of prompt fidelity, visual quality, geometry, and texture as functions of scene complexity, as described by the text prompt (aggregated across all annotators). We report the win rate of 3DGen against each baseline and highlight the 50% threshold (dashed line), at which our method is judged to be as good as the baseline.
  • Figure 4: Visual comparison of text-to-3D generations obtained after Meta 3D Gen's Stage I (left) and Stage II (right). In our A/B user studies, the Stage II generations had a win rate of 68% in texture quality over the Stage I generations.
  • Figure 5: Qualitative results for text-to-3D generation. We show quality and diversity of text-to-3D generations produced by 3DGen, across different scene categories (single objects and compositions).
  • ...and 6 more figures
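
The win rates quoted in Figures 3 and 4 come from A/B user studies. As a minimal sketch of how such a number can be aggregated, the snippet below counts the share of annotator judgments that prefer one method. The function name `win_rate`, the vote labels, and the example votes are all hypothetical and chosen only for illustration; they are not the paper's evaluation code or data, and ties are not modeled.

```python
from collections import Counter


def win_rate(votes, method="3DGen"):
    """Return the fraction of A/B judgments in which `method` was preferred.

    `votes` is an iterable of labels, one per annotator judgment, naming
    the preferred method. Ties are not modeled in this sketch.
    """
    counts = Counter(votes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no votes to aggregate")
    # Counter returns 0 for a method that never won a comparison.
    return counts[method] / total


# Hypothetical votes, made up for illustration (not data from the paper).
example_votes = ["3DGen"] * 17 + ["baseline"] * 8

rate = win_rate(example_votes)          # 17 of 25 judgments
print(f"win rate: {rate:.0%}")
print("above parity:", rate > 0.5)      # 50% is the parity threshold
```

A win rate above 0.5 means the method was preferred more often than the alternative, which corresponds to points above the dashed line in Figure 3.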