PacTure: Efficient PBR Texture Generation on Packed Views with Visual Autoregressive Models

Fan Fei, Jiajun Tang, Fei-Peng Tian, Boxin Shi, Ping Tan

TL;DR

PacTure tackles efficient, high-fidelity PBR texture generation for untextured meshes by combining view packing with a visual autoregressive backbone. It packs six view maps into a single atlas to boost per-view resolution without increasing inference cost, and employs a two-stage, single-view-to-multi-view generation strategy with domain embeddings to achieve multi-view, multi-domain outputs that back-project reliably to UV space. The approach yields state-of-the-art texture quality and faster inference compared to baselines, demonstrating strong practical potential for scalable 3D asset creation. Limitations include occluded regions not captured by the canonical views, which are filled via UV-space extrapolation and may introduce minor inconsistencies. Overall, PacTure offers a significant advance in efficient, controllable PBR texturing with broad applicability in games, movies, and virtual/augmented reality workflows.

Abstract

We present PacTure, a novel framework for generating physically-based rendering (PBR) material textures from an untextured 3D mesh, a text description, and an optional image prompt. Early 2D generation-based texturing approaches generate textures sequentially from different views, resulting in long inference times and globally inconsistent textures. More recent approaches adopt multi-view generation with cross-view attention to enhance global consistency, which, however, limits the resolution for each view. In response to these weaknesses, we first introduce view packing, a novel technique that significantly increases the effective resolution for each view during multi-view generation without imposing additional inference cost, by formulating the arrangement of multi-view maps as a 2D rectangle bin packing problem. In contrast to UV mapping, it preserves the spatial proximity essential for image generation and maintains full compatibility with current 2D generative models. To further reduce the inference cost, we enable fine-grained control and multi-domain generation within the next-scale prediction autoregressive framework to create an efficient multi-view multi-domain generative backbone. Extensive experiments show that PacTure outperforms state-of-the-art methods in both quality of generated PBR textures and efficiency in training and inference.
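The view-packing idea in the abstract can be illustrated with a small sketch. It treats each view's foreground bounding box as a rectangle, finds the largest common scale at which all rectangles fit on a fixed-size atlas, and compares that with regular tiling. The shelf heuristic, the function names, the atlas size, and the example boxes are all assumptions made for illustration; the abstract only states that the arrangement is formulated as a 2D rectangle bin packing problem and does not say which solver is used.

```python
from typing import List, Optional, Tuple

Rect = Tuple[float, float]                      # (width, height) of one view's foreground box
Placement = Tuple[float, float, float, float]   # (x, y, width, height) on the atlas


def shelf_pack(rects: List[Rect], atlas_w: float, atlas_h: float) -> Optional[List[Placement]]:
    """Shelf heuristic (an assumed stand-in): tallest first, left to right; None if no fit."""
    order = sorted(range(len(rects)), key=lambda i: rects[i][1], reverse=True)
    placed: List[Placement] = [(0.0, 0.0, 0.0, 0.0)] * len(rects)
    x = y = shelf_h = 0.0
    for i in order:
        w, h = rects[i]
        if x + w > atlas_w:            # row is full: open a new shelf below
            x, y, shelf_h = 0.0, y + shelf_h, 0.0
        if w > atlas_w or y + h > atlas_h:
            return None
        placed[i] = (x, y, w, h)
        x += w
        shelf_h = max(shelf_h, h)
    return placed


def max_uniform_scale(rects: List[Rect], atlas_w: float, atlas_h: float, iters: int = 50) -> float:
    """Binary-search the largest common scale at which shelf_pack still succeeds."""
    lo = 0.0
    hi = min(atlas_w / max(w for w, _ in rects), atlas_h / max(h for _, h in rects))
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        scaled = [(w * mid, h * mid) for w, h in rects]
        if shelf_pack(scaled, atlas_w, atlas_h) is not None:
            lo = mid
        else:
            hi = mid
    return lo


# Hypothetical foreground boxes (fractions of a unit-square render) for a tall, thin
# object: four side views and two small top/bottom views.
views = [(0.3, 0.9)] * 4 + [(0.3, 0.3)] * 2
ATLAS_W, ATLAS_H = 1536, 1024

tiled_scale = min(ATLAS_W / 3, ATLAS_H / 2)      # regular 3x2 tiling: one square tile per view
packed_scale = max_uniform_scale(views, ATLAS_W, ATLAS_H)
layout = shelf_pack([(w * packed_scale, h * packed_scale) for w, h in views], ATLAS_W, ATLAS_H)

print(f"pixels per unit view, tiled:  {tiled_scale:.0f}")
print(f"pixels per unit view, packed: {packed_scale:.0f}")
```

The atlas size, and therefore the number of pixels the generative model has to produce, is the same in both cases. The gain comes only from spending fewer atlas pixels on background. A stronger packing solver could do better than this shelf heuristic.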

Paper Structure

This paper contains 23 sections, 9 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: We propose view packing to compactly pack multi-view maps onto the atlas as the condition and target maps for image generative models used in texturing. This technique significantly increases the effective resolution for each view without increasing generation cost by reducing background pixels that do not contribute to the final texture. Unlike the compact but hard-to-read UV maps, the packed views retain the spatial proximity and thus are well-suited for 2D generative models.
  • Figure 2: An overview of our pipeline comprising the following steps. 1) Given the input untextured mesh, text description, and optional image prompt, we first render geometry condition maps, such as position and surface normal, from six fixed viewpoints. 2) We then use an off-the-shelf controllable generative model [SD22, ControlNet23] with intrinsic decomposition [IA24] to generate a single-view albedo map. 3) The geometry conditions and the single-view albedo map are packed compactly onto the atlas as the control images for the subsequent multi-view generation. 4) The next-scale prediction-based autoregressive model, Infinity [Infinity25], is adopted for multi-view generation. The control images are encoded by its VAE encoder to multi-scale control token maps, which are then added to the image token maps after a projection layer, together with the domain embedding, to generate multi-view multi-domain images. 5) The generated multi-view PBR material maps are back-projected to UV space and post-processed, resulting in the textured mesh with UV space PBR material maps.
  • Figure 3: Comparison of our proposed view packing and the traditional regular view tiling.
  • Figure 4: Qualitative comparison between PacTure and baseline texturing methods. We show the shaded image for all methods. For methods that generate PBR textures, we also show albedo to the right of the shaded image and roughness/metallic to the bottom of the albedo.
  • Figure E: The overview of our back-projection process.
  • ...and 2 more figures
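
Step 4 of the Figure 2 caption describes how control token maps are projected and added to the image token maps together with a domain embedding. A minimal sketch of that addition at each scale is given below. It assumes NumPy is available. The channel widths, scale sizes, domain names, and the linear projection are hypothetical, and random arrays stand in for the VAE-encoded control maps. This is not the Infinity model's code or API.

```python
import numpy as np

rng = np.random.default_rng(0)
D_CTRL, D_MODEL = 32, 64                      # hypothetical channel widths
SCALES = [1, 2, 4, 8]                         # hypothetical token-map side lengths, coarse to fine
DOMAINS = ["albedo", "roughness_metallic"]    # assumed domain names

proj = rng.normal(scale=0.02, size=(D_CTRL, D_MODEL))                 # stand-in projection layer
domain_emb = {d: rng.normal(scale=0.02, size=D_MODEL) for d in DOMAINS}


def condition(image_tokens: np.ndarray, control_tokens: np.ndarray, domain: str) -> np.ndarray:
    """Add projected control tokens and a domain embedding to one scale's image tokens."""
    return image_tokens + control_tokens @ proj + domain_emb[domain]


for s in SCALES:
    image_tokens = rng.normal(size=(s * s, D_MODEL))      # placeholder image token map
    control_tokens = rng.normal(size=(s * s, D_CTRL))     # placeholder encoded control map
    out = {d: condition(image_tokens, control_tokens, d) for d in DOMAINS}
    print(s, {d: v.shape for d, v in out.items()})
```

In this sketch the same control tokens condition every domain, and only the added domain embedding tells the backbone which material map to produce.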