Meta 3D AssetGen: Text-to-Mesh Generation with High-Quality Geometry, Texture, and PBR Materials

Yawar Siddiqui, Tom Monnier, Filippos Kokkinos, Mahendra Kariya, Yanir Kleiman, Emilien Garreau, Oran Gafni, Natalia Neverova, Andrea Vedaldi, Roman Shapovalov, David Novotny

TL;DR

Meta 3D AssetGen produces relightable 3D assets from text or images in under 30 seconds using a two-stage pipeline. A text-to-image stage first generates a grid of 4 views with shaded and albedo channels. A PBR-aware sparse-view reconstructor then turns this grid into an SDF-based surface with per-voxel PBR triplets, and a UV-space texture refiner sharpens the result. The approach combines a differentiable VolSDF renderer, a deferred shading loss, and a texture refinement transformer to produce high-quality geometry and materials (rho_0, gamma, alpha) suitable for realistic relighting. In few-view reconstruction it improves Chamfer Distance (CD) by 17% and LPIPS by 40% over the best concurrent work, and in text-to-3D generation it reaches a 72% human preference over the best industry competitors of comparable speed. For game, AR/VR, and design pipelines, this enables fast, controllable creation of 3D assets with full PBR materials from natural-language prompts.
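To make the "differentiable VolSDF renderer" above more concrete, here is a minimal sketch of the density mapping from the VolSDF formulation, where volume density is a scaled Laplace CDF of the negated signed distance. The function name `volsdf_density`, the default `beta`, and the choice of tying the scale to `1 / beta` are illustrative assumptions; this section does not state which values AssetGen uses.

```python
import math


def volsdf_density(sdf: float, beta: float = 0.1) -> float:
    """Map a signed distance (negative inside the shape) to a volume density.

    Follows the VolSDF idea: density = scale * LaplaceCDF_beta(-sdf).
    `beta` controls how sharply density falls off around the surface
    (assumed value, chosen only for illustration).
    """
    scale = 1.0 / beta  # assumption: scale tied to 1 / beta
    s = -sdf
    if s <= 0:
        # Outside the surface: density decays exponentially with distance.
        cdf = 0.5 * math.exp(s / beta)
    else:
        # Inside the surface: density saturates towards `scale`.
        cdf = 1.0 - 0.5 * math.exp(-s / beta)
    return scale * cdf


if __name__ == "__main__":
    for d in (-0.5, 0.0, 0.5):
        print(f"sdf={d:+.1f} -> density={volsdf_density(d):.4f}")
```

Because the density is a smooth function of the signed distance, gradients from rendered pixels can flow back to the SDF. A smaller `beta` concentrates the density closer to the zero level set.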

Abstract

We present Meta 3D AssetGen (AssetGen), a significant advancement in text-to-3D generation that produces faithful, high-quality meshes with texture and material control. Unlike works that bake shading into the 3D object's appearance, AssetGen outputs physically-based rendering (PBR) materials, supporting realistic relighting. AssetGen first generates several views of the object with factored shaded and albedo appearance channels, and then reconstructs colours, metalness and roughness in 3D, using a deferred shading loss for efficient supervision. It also uses a signed distance function to represent 3D shape more reliably and introduces a corresponding loss for direct shape supervision. This is implemented using fused kernels for high memory efficiency. After mesh extraction, a texture refinement transformer operating in UV space significantly improves sharpness and details. AssetGen achieves a 17% improvement in Chamfer Distance and 40% in LPIPS over the best concurrent work for few-view reconstruction, and a human preference of 72% over the best industry competitors of comparable speed, including those that support PBR. Project page with generated assets: https://assetgen.github.io
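For readers unfamiliar with the Chamfer Distance metric quoted above, the following is a small pure-Python sketch of one common variant: the symmetric sum of mean nearest-neighbour Euclidean distances between two point sets. Conventions differ across papers (squared vs. unsquared distances, sum vs. mean), and this section does not say which one AssetGen's evaluation uses, so treat the exact definition here as an assumption. The function name `chamfer_distance` is illustrative.

```python
import math
from typing import Sequence

Point = Sequence[float]


def chamfer_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Chamfer Distance between two non-empty point sets.

    Uses unsquared Euclidean distances averaged per set (one of several
    conventions; assumed here for illustration). Brute force, O(len(a) * len(b)).
    """

    def one_sided(src: Sequence[Point], dst: Sequence[Point]) -> float:
        # Mean distance from each source point to its nearest target point.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return one_sided(a, b) + one_sided(b, a)


if __name__ == "__main__":
    predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
    print(chamfer_distance(predicted, reference))
```

In practice the point sets are sampled from the predicted and ground-truth mesh surfaces, and a spatial index or GPU implementation replaces the brute-force search.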

Paper Structure

This paper contains 41 sections, 33 equations, 13 figures, and 4 tables.

Figures (13)

  • Figure 1: We present Meta 3D AssetGen, a novel text- or image-conditioned generator of 3D meshes with physically-based rendering materials (top). Meta 3D AssetGen produces meshes with detailed geometry and high-fidelity textures, and decomposes materials into albedo, metalness, and roughness (bottom left), which allows objects to be realistically relit in new environments (bottom right).
  • Figure 2: Overview. Given a text prompt, AssetGen generates a 3D mesh with PBR materials in two stages. The first text-to-image stage (blue) predicts a 6-channel image depicting 4 views of the object with shaded and albedo colors. The second image-to-3D stage includes two steps. First, a 3D reconstructor (dubbed MetaILRM) outputs a triplane-supported SDF field, which is converted into a mesh with textured PBR materials (orange). Then, the PBR materials are enhanced by our texture refiner, which recovers missing details from the input views (green).
  • Figure 3: Qualitative ablation on albedo generation. In text-to-3D, generating 4 views representing albedo colors alongside shaded RGB colors improves material estimation for our 3D reconstructor. With both inputs, the model accurately predicts the armor as metallic and smooth, while the bear's fur is rough.
  • Figure 4: Qualitative comparison for sparse-view reconstruction. AssetGen gives better geometry (shown in orange) and higher-fidelity texture (inset) than the state of the art. The SDF representation, together with the direct SDF loss, yields better geometry than the base LightplaneLRM model, which uses occupancy (rows 4 and 5). Furthermore, our texture refiner greatly enhances texture fidelity (rows 5 and 6).
  • Figure 5: Qualitative comparison for text-to-3D. We compare 3D meshes generated by Meta 3D AssetGen and state-of-the-art baselines. We include material decomposition for methods producing PBR materials (Luma Genie and our Meta 3D AssetGen). Our approach produces higher quality materials with better-defined metalness and roughness, and a more accurate decoupling of lighting effects in the albedo.
  • ...and 8 more figures
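
The captions above describe materials decomposed into albedo, metalness, and roughness. As a hedged illustration of why this decomposition supports relighting, the sketch below shows how a renderer following the widely used metallic-roughness convention (as in glTF 2.0) derives a diffuse colour and a specular reflectance at normal incidence (F0) from albedo and metalness. The function name `metal_rough_to_diffuse_f0` and the dielectric constant 0.04 come from that general convention; whether AssetGen's shading model uses exactly these values is not stated in this section.

```python
from typing import Tuple

RGB = Tuple[float, float, float]

# Assumed reflectance of non-metals at normal incidence (common convention).
DIELECTRIC_F0 = 0.04


def metal_rough_to_diffuse_f0(albedo: RGB, metalness: float) -> Tuple[RGB, RGB]:
    """Split an albedo colour into diffuse colour and specular F0.

    Metals have no diffuse term and tint their reflections with the albedo;
    dielectrics keep the albedo as diffuse colour and reflect a small,
    colourless amount of light.
    """
    m = min(max(metalness, 0.0), 1.0)  # clamp to the valid [0, 1] range
    diffuse = tuple(c * (1.0 - m) for c in albedo)
    f0 = tuple(DIELECTRIC_F0 * (1.0 - m) + c * m for c in albedo)
    return diffuse, f0


if __name__ == "__main__":
    gold = (1.0, 0.77, 0.34)
    print(metal_rough_to_diffuse_f0(gold, metalness=1.0))  # metallic surface
    print(metal_rough_to_diffuse_f0(gold, metalness=0.0))  # painted surface
```

Roughness does not appear here because it only shapes the specular lobe at shading time. Since none of these quantities depend on the lighting, the same asset can be re-shaded under any new environment.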