Jointly Generating Multi-view Consistent PBR Textures using Collaborative Control

Shimon Vainer, Konstantin Kutsy, Dante De Nigris, Ciara Rowles, Slava Elizarov, Simon Donné

TL;DR

We discuss the design decisions involved in making a Collaborative Control PBR Text-to-Texture model multi-view consistent, and demonstrate the effectiveness of the approach in ablation studies and in practical applications.

Abstract

Multi-view consistency remains a challenge for image diffusion models. Even within the Text-to-Texture problem, where perfect geometric correspondences are known a priori, many methods fail to yield aligned predictions across views, necessitating non-trivial fusion methods to incorporate the results onto the original mesh. We explore this issue for a Collaborative Control workflow, specifically in PBR Text-to-Texture. Collaborative Control directly models PBR image probability distributions, including normal bump maps; to our knowledge, it is the only diffusion model to directly output full PBR stacks. We discuss the design decisions involved in making this model multi-view consistent, and demonstrate the effectiveness of our approach in ablation studies as well as in practical applications.

Paper Structure

This paper contains 24 sections, 2 equations, 14 figures, 1 algorithm.

Figures (14)

  • Figure 1: We propose an end-to-end pipeline for generating graphics-ready PBR textures based only on a mesh and a text prompt. Conditioned on a single-view PBR stack, our proposed approach directly and jointly diffuses multi-view PBR images in view space. These are multi-view consistent enough that we can naively fuse them into the mesh texture. This includes linear albedo, roughness and metallic maps, as well as normal bump maps.
  • Figure 2: Existing RGB-based pipelines (left) bake lighting artifacts into the output albedo maps (taken from Fantasia3D's Figure 4 [chen2023fantasia3d]). Collaborative Control outputs (middle, from their appendix), like artist-created albedo maps (right, [sphericalshipalbedo]), do not exhibit this.
  • Figure 3: Example albedo, roughness, metallic, bump map, and rendered images created by an expert artist [PBRexamplesketchfab]. Note how the PBR channels are view-independent, but the renders are not.
  • Figure 4: (a) Structure of the existing attention blocks in vanilla Stable Diffusion 2.1 and Collaborative Control, as well as the multi-view communication we introduce. (b) The components of the multi-view communication block.
  • Figure 5: The point-wise attention mechanism.
  • ...and 9 more figures