GraphicsDreamer: Image to 3D Generation with Physical Consistency

Pei Chen, Fudong Wang, Yixuan Tong, Jingdong Chen, Ming Yang, Minghui Yang

TL;DR

GraphicsDreamer generates production-ready 3D assets from a single image by coupling a six-domain PBR-conditioned diffusion model with a PBR-constrained inverse rendering stage. It jointly models color, geometry, and intrinsic materials (albedo, roughness, metallic) across six views and refines the output with a mixed implicit–explicit surface representation, topology optimization, and UV unwrapping. On the Google Scanned Objects dataset, the reported novel-view synthesis and surface reconstruction metrics are better than those of the compared baselines, and the method supports realistic relighting with environment maps. By delivering geometry, textures, and PBR maps in a single framework, it narrows the gap between automated 3D generation and production pipelines and yields assets that can be used directly in modern graphics engines.

Abstract

Recently, the surge of efficient and automated 3D AI-generated content (AIGC) methods has increasingly illuminated the path of transforming human imagination into complex 3D structures. However, the automated generation of 3D content still lags significantly in industrial application. This gap exists because 3D modeling demands high-quality assets with sharp geometry, exquisite topology, and physically based rendering (PBR), among other criteria. To narrow the disparity between generated results and artists' expectations, we introduce GraphicsDreamer, a method for creating highly usable 3D meshes from single images. To better capture geometry and material details, we integrate the PBR lighting equation into our cross-domain diffusion model, concurrently predicting multi-view color, normal, and depth images together with PBR materials. In the geometry fusion stage, we continue to enforce the PBR constraints, ensuring that the generated 3D objects possess reliable texture details and support realistic relighting. Furthermore, our method incorporates topology optimization and fast UV unwrapping, allowing the 3D products to be seamlessly imported into graphics engines. Extensive experiments demonstrate that our model can produce high-quality 3D assets at a reasonable time cost compared with previous methods.

Paper Structure

This paper contains 23 sections, 18 equations, 12 figures, 2 tables.

Figures (12)

  • Figure 1: GraphicsDreamer uses a two-stage generation approach, integrating PBR lighting conditions into both the multi-view synthesis and reconstruction processes. The 3D models produced by GraphicsDreamer possess clear geometry, clean topology, and complete PBR maps, allowing them to be directly imported and manipulated within graphics engines.
  • Figure 2: Our method consists of three phases. Given a single input image, we train a diffusion model (Sec. \ref{sec:sprseview_gen}) to generate multi-view images (6 views), including RGB color for the overall appearance, normal and depth as geometric information, and intrinsic materials for texture details, conditioned by a PBR approach (Sec. \ref{sec:pbr_render}). The generated images are then integrated into an inverse rendering reconstruction (Sec. \ref{sec:explit_iso}), also in conjunction with the PBR process, to guarantee consistency with the diffusion model. Finally, our method produces appealing 3D assets with artistically optimized topology and UV textures (Sec. \ref{sec:assts_3d}).
  • Figure 3: Novel-view synthesis results from GraphicsDreamer on objects with different materials. The model effectively separates the inherent colors of objects from the lighting, as seen in the textures on the train and coins in the albedo column. It also identifies metallic materials, such as the hammer and train wheels in the metallic column. This capability is essential for precisely conveying the material of objects and for successful relighting.
  • Figure 4: The qualitative comparisons with baseline methods on single view reconstruction. Due to the line width limitation, we do not show the untextured meshes for SF3D and 3DTopia-XL.
  • Figure 5: Mixed surface representation and physically based rendering implemented in our method.
  • ...and 7 more figures