MuMA: 3D PBR Texturing via Multi-Channel Multi-View Generation and Agentic Post-Processing

Lingting Zhu, Jingrui Ye, Runze Zhang, Zeyu Hu, Yingda Yin, Lanjiong Li, Jinnan Chen, Shengju Qian, Xin Wang, Qingmin Liao, Lequan Yu

TL;DR

MuMA tackles 3D PBR texturing by splitting the task into two parts. First, it generates multi-view shaded and albedo channels. Second, it applies intrinsic decomposition to the shaded views to obtain the remaining material channels. This yields high-fidelity, view-consistent textures that hold up under varying lighting. The approach uses SDXL with MV-Adapter for multi-view diffusion and feeds the shaded outputs to an intrinsic decomposition model (IDArb) to obtain metallic and roughness channels. It then runs an agentic post-processing loop in which an MLLM (GPT-4o) scores candidate albedo outputs and selects the best one, including Best-of-N selection. Extensive experiments on a large Objaverse-derived dataset show that MuMA outperforms baselines in appearance and material fidelity for text-conditioned mesh texturing. It also achieves competitive results in image-conditioned settings while dramatically reducing texture-generation time. The work demonstrates a practical, scalable pipeline for high-quality 3D textures, with implications for faster, more reliable 3D content creation and relighting across diverse lighting conditions.
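The two-stage flow above can be outlined in a short Python sketch. This is a hypothetical outline, not the authors' code. The callables `generate`, `decompose`, and `judge` stand in for SDXL + MV-Adapter, IDArb, and the GPT-4o scorer. The names `make_candidates`, `select_best`, and the toy stubs are illustrative inventions. Two choices here are assumptions rather than details confirmed by the source:

- every candidate is decomposed before selection;
- the judge scores only the albedo views.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical stand-in for one rendered view: a flattened single-channel image.
View = List[float]


@dataclass
class Candidate:
    """One sampled multi-view result plus the material channels derived from it."""
    seed: int
    shaded: List[View]
    albedo: List[View]
    metallic: List[View]
    roughness: List[View]


def make_candidates(
    generate: Callable[[int], Tuple[List[View], List[View]]],
    decompose: Callable[[List[View]], Tuple[List[View], List[View]]],
    n: int,
) -> List[Candidate]:
    """Stage (a): sample n multi-view results and decompose each shaded set."""
    candidates = []
    for seed in range(n):
        # Placeholder for multi-view diffusion (SDXL + MV-Adapter in the paper).
        shaded, albedo = generate(seed)
        # Placeholder for intrinsic decomposition (IDArb in the paper).
        metallic, roughness = decompose(shaded)
        candidates.append(Candidate(seed, shaded, albedo, metallic, roughness))
    return candidates


def select_best(
    candidates: List[Candidate], judge: Callable[[List[View]], float]
) -> Candidate:
    """Stage (b): Best-of-N selection, scoring each candidate's albedo views."""
    if not candidates:
        raise ValueError("need at least one candidate")
    # judge stands in for the MLLM scorer (GPT-4o in the paper); higher is better.
    return max(candidates, key=lambda c: judge(c.albedo))


# Toy stubs (hypothetical) so the sketch runs without any model; values are arbitrary.
def toy_generate(seed: int) -> Tuple[List[View], List[View]]:
    return [[0.1 * seed, 0.2]], [[0.1 * seed, 0.5]]


def toy_decompose(shaded: List[View]) -> Tuple[List[View], List[View]]:
    metallic = [[0.0] * len(v) for v in shaded]
    roughness = [[0.5] * len(v) for v in shaded]
    return metallic, roughness


def toy_judge(albedo: List[View]) -> float:
    pixels = [p for view in albedo for p in view]
    return -abs(sum(pixels) / len(pixels) - 0.35)  # prefer mean albedo near 0.35


if __name__ == "__main__":
    best = select_best(make_candidates(toy_generate, toy_decompose, n=4), toy_judge)
    print("selected seed:", best.seed)
```

Keeping the three models as injectable callables means a real pipeline would only replace the stubs. It also makes it easy to compare N = 1 against larger N without changing the rest of the flow.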

Abstract

Current methods for 3D generation still fall short in physically based rendering (PBR) texturing, primarily due to limited data and challenges in modeling multi-channel materials. In this work, we propose MuMA, a method for 3D PBR texturing through Multi-channel Multi-view generation and Agentic post-processing. Our approach features two key innovations: 1) We opt to model shaded and albedo appearance channels, where the shaded channels enable the integration of intrinsic decomposition modules for estimating material properties. 2) Leveraging multimodal large language models, we emulate artists' techniques for material assessment and selection. Experiments demonstrate that MuMA achieves superior results in visual quality and material fidelity compared to existing methods.
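The classic Lambertian intrinsic-image model helps show why pairing a shaded channel with an albedo channel is useful. That model treats a shaded image as albedo times shading, pixel by pixel (I = A · S), so the shading can be read off as I / A. The sketch below makes two simplifying assumptions: single-channel flattened views and a purely diffuse surface. It is a simplified illustration, not MuMA's or IDArb's model. A diffuse-only model cannot recover metallic or roughness, which is one reason a learned decomposition module is used for those channels. The function names are hypothetical.

```python
from typing import List, Sequence


def compose_shaded(albedo: Sequence[float], shading: Sequence[float]) -> List[float]:
    """Forward Lambertian intrinsic model: I = A * S, pixel by pixel."""
    if len(albedo) != len(shading):
        raise ValueError("albedo and shading must have the same length")
    return [a * s for a, s in zip(albedo, shading)]


def recover_shading(
    shaded: Sequence[float], albedo: Sequence[float], eps: float = 1e-6
) -> List[float]:
    """Inverse step: S = I / A, clamping A to eps to avoid division by zero."""
    if len(shaded) != len(albedo):
        raise ValueError("shaded and albedo must have the same length")
    return [i / max(a, eps) for i, a in zip(shaded, albedo)]
```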

Paper Structure

This paper contains 25 sections, 4 equations, 14 figures, 9 tables.

Figures (14)

  • Figure 1: Illustration of our method. Given an untextured mesh, our method produces PBR textures from user inputs via multi-channel multi-view generation and agentic post-processing.
  • Figure 2: Illustration of different PBR modeling strategies. Unlike (a) Joint Modeling, which faces training challenges and data imbalance, and (b) Separate Modeling, which suffers from inconsistency between models, (c) our approach models albedo and shaded images so that an intrinsic decomposition model can be connected.
  • Figure 3: Overview of MuMA. (a) Multi-Channel Multi-View Generation: given an untextured mesh and its description, we opt to model shaded and albedo images with multi-view diffusion. (b) Agentic Post-Processing: after producing the candidate material channels, we integrate multimodal language models for scoring and selection, finally outputting the multi-view materials for texturing.
  • Figure 4: Qualitative comparison on text-conditioned generation. For methods that produce materials, we present the albedo, metallic, and roughness maps on the right of each case. The last two rows showcase the comparison results on generated meshes.
  • Figure 5: Qualitative comparison on image-conditioned generation. Given the input mesh, along with the control text and image prompts, we synthesize high-fidelity PBR materials.
  • ...and 9 more figures