MaPa: Text-driven Photorealistic Material Painting for 3D Shapes

Shangzan Zhang, Sida Peng, Tao Xu, Yuanbo Yang, Tianrun Chen, Nan Xue, Yujun Shen, Hujun Bao, Ruizhen Hu, Xiaowei Zhou

TL;DR

MaPa addresses text-driven material painting for 3D shapes by using a segment-wise procedural material graph representation and a segment-controlled diffusion bridge to connect text prompts with high-resolution, relightable materials. It segments the mesh, generates segment-aligned 2D images via a segment-conditioned diffusion model, then initializes and optimizes segment-level material graphs through a differentiable rendering pipeline, with iterative recovery and downstream editing. The approach achieves photorealistic, tileable materials with strong editability and outperforms strong baselines in both quantitative metrics (FID/KID) and user studies, while enabling diverse results and image-prompt appearance transfer. This work advances practical 3D asset creation by combining diffusion-based image synthesis, CLIP-based material retrieval, and differentiable graphics to produce controllable, high-quality materials for complex geometries.

Abstract

This paper aims to generate materials for 3D meshes from text descriptions. Unlike existing methods that synthesize texture maps, we propose to generate segment-wise procedural material graphs as the appearance representation, which supports high-quality rendering and provides substantial flexibility in editing. Instead of relying on extensive paired data, i.e., 3D meshes with material graphs and corresponding text descriptions, to train a material graph generative model, we propose to leverage the pre-trained 2D diffusion model as a bridge to connect the text and material graphs. Specifically, our approach decomposes a shape into a set of segments and designs a segment-controlled diffusion model to synthesize 2D images that are aligned with mesh parts. Based on generated images, we initialize parameters of material graphs and fine-tune them through the differentiable rendering module to produce materials in accordance with the textual description. Extensive experiments demonstrate the superior performance of our framework in photorealism, resolution, and editability over existing methods. Project page: https://zju3dv.github.io/MaPa
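
The "initialize, then fine-tune through differentiable rendering" step can be illustrated with a deliberately small toy. The sketch below is not the paper's implementation: it assumes a single RGB albedo in place of a procedural material graph, a fixed per-pixel shading map in place of a real differentiable renderer, and a hand-derived gradient of a mean squared error. The names `render` and `fit_albedo` are hypothetical.

```python
import numpy as np

def render(albedo, shading):
    """Toy Lambertian render: per-pixel shading (H, W) times one RGB albedo (3,)."""
    return shading[..., None] * albedo[None, None, :]

def fit_albedo(target, shading, mask, steps=200, lr=0.5):
    """Fit one RGB albedo to the masked pixels of `target` by gradient descent.

    Assumption: shading values lie in [0, 1], which keeps this step size stable.
    """
    albedo = np.full(3, 0.5)               # neutral initial guess
    n = max(int(mask.sum()), 1)            # number of pixels in the segment
    for _ in range(steps):
        residual = (render(albedo, shading) - target) * mask[..., None]
        # Gradient of the masked mean squared error with respect to the albedo.
        grad = 2.0 * (residual * shading[..., None]).sum(axis=(0, 1)) / n
        albedo = np.clip(albedo - lr * grad, 0.0, 1.0)
    return albedo

# Usage: recover a known albedo from a synthetic "generated image".
shading = np.linspace(0.2, 1.0, 64).reshape(8, 8)
true_albedo = np.array([0.8, 0.3, 0.1])
target = render(true_albedo, shading)
mask = np.zeros((8, 8))
mask[:, :4] = 1.0                          # pixels belonging to one segment
estimate = fit_albedo(target, shading, mask)
```

The mask plays the role of a segment: only pixels inside it contribute to the loss, so each material is fitted against the image region it is supposed to explain.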

Paper Structure

This paper contains 32 sections, 2 equations, 10 figures, 1 table.

Figures (10)

  • Figure 1: Illustration of our pipeline. Our pipeline consists of four main steps: a) Segment-controlled image generation. First, we decompose the input mesh into segments, project these segments onto 2D images, and then generate the corresponding images using the segment-controlled ControlNet. b) Material grouping. We group segments that share the same material and have a similar appearance into a material group. c) Material graph selection and optimization. For each material group, we select an appropriate material graph based on the generated images and then optimize this material graph. d) Iterative material recovery. We render additional views of the input mesh with the optimized material graphs, inpaint the missing regions in these rendered images, and repeat steps b) and c) until all segments are assigned material graphs.
  • Figure 2: Downstream editing. We perform material editing on the generated materials. The user can edit a material using textual prompts through GPT-4 and a set of predefined APIs.
  • Figure 3: Qualitative comparisons. The results generated by our method and all the baselines are rendered in the same CG environment for comparison. The prompts for the three objects are: "a photo of a wooden bedside table," "a photo of a toy rocket," and "a photo of a brand-new sword."
  • Figure 4: Diversity of our generated material. We show the diversity of results synthesized by our framework with the same prompt: "A photo of a toy airplane". Images in the first row are generated by diffusion models, and models in the second row are our painted meshes.
  • Figure 5: Appearance transfer. Our method can also take an image prompt as input, transferring the appearance of a reference image to the input object.
  • ...and 5 more figures
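
The control flow of steps a) to d) in Figure 1 can be sketched as a loop over views. This is only an outline under stated assumptions: the function `paint_materials` and its callbacks `visible_in`, `group_segments`, and `fit_graph` are hypothetical placeholders standing in for image generation and inpainting, material grouping, and material graph selection and optimization. None of them is taken from the authors' code.

```python
from typing import Callable, Dict, List, Set

def paint_materials(
    segments: Set[int],
    views: List[str],
    visible_in: Callable[[str], Set[int]],
    group_segments: Callable[[Set[int]], List[Set[int]]],
    fit_graph: Callable[[Set[int]], str],
) -> Dict[int, str]:
    """Assign a material to every segment, visiting one view at a time."""
    assigned: Dict[int, str] = {}
    for view in views:
        # Steps a) and d): segments seen in this view that still lack a material.
        pending = (visible_in(view) & segments) - set(assigned)
        # Step b): group segments that are expected to share one material.
        for group in group_segments(pending):
            # Step c): select and optimize one material per group (stubbed here).
            material = fit_graph(group)
            for seg in group:
                assigned[seg] = material
        if len(assigned) == len(segments):
            break                          # every segment has a material
    return assigned

# Usage with trivial stand-ins for the three callbacks.
visibility = {"front": {0, 1}, "back": {1, 2, 3}}
result = paint_materials(
    segments={0, 1, 2, 3},
    views=["front", "back"],
    visible_in=lambda view: visibility[view],
    group_segments=lambda segs: [{s} for s in sorted(segs)],
    fit_graph=lambda group: "wood" if min(group) % 2 == 0 else "metal",
)
```

Segments that already received a material in an earlier view are skipped, so later views only fill in regions that were previously unseen.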