Pandora3D: A Comprehensive Framework for High-Quality 3D Shape and Texture Generation

Jiayu Yang, Taizhang Shang, Weixuan Sun, Xibin Song, Ziang Cheng, Senbo Wang, Shenzhou Chen, Weizhe Liu, Hongdong Li, Pan Ji

TL;DR

Pandora3D addresses the challenge of generating high-quality 3D shapes and textures from diverse prompts, including images and text. It combines a VAE-based implicit-geometry encoder with a diffusion model conditioned on global and local visual features, and introduces an extended VAE with a centroid-based linear-attention Q-former to scale to large point clouds. An alternative Artist-Created Mesh pathway provides a token-based autoregressive route with mesh compression to improve topology control and efficiency. The texture pipeline progresses from frontal and multi-view RGB generation to PBR mapping and high-resolution refinement, with a pixel-wise consistency scheduler that keeps textures coherent across views. Together with a data-processing and rendering framework and a public release of code and weights, Pandora3D offers a practical, end-to-end solution for production-ready 3D content generation.

Abstract

This report presents a comprehensive framework for generating high-quality 3D shapes and textures from diverse input prompts, including single images, multi-view images, and text descriptions. The framework consists of 3D shape generation and texture generation. (1) The 3D shape generation pipeline employs a Variational Autoencoder (VAE) to encode implicit 3D geometries into a latent space and a diffusion network to generate latents conditioned on input prompts, with modifications to enhance model capacity. An alternative Artist-Created Mesh (AM) generation approach is also explored, yielding promising results for simpler geometries. (2) Texture generation is a multi-stage process that starts with frontal image generation, followed by multi-view image generation, RGB-to-PBR texture conversion, and high-resolution multi-view texture refinement. A consistency scheduler is plugged into every stage to enforce pixel-wise consistency among multi-view textures during inference, ensuring seamless integration. The pipeline handles diverse input formats effectively, leveraging advanced neural architectures and novel methodologies to produce high-quality 3D content. This report details the system architecture, experimental results, and potential future directions to improve and expand the framework. The source code and pretrained weights are released at: https://github.com/Tencent/Tencent-XR-3DGen.
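
To make the idea of pixel-wise consistency among multi-view textures concrete, the following is a minimal sketch and not the released implementation. It assumes that every pixel of every view has already been mapped to a surface texel index (for example by rasterizing the mesh), and it simply averages all pixels that observe the same texel and writes the average back to each view. The function name `enforce_pixel_consistency`, its arguments, and the plain averaging rule are illustrative assumptions; the abstract does not specify how the scheduler aggregates views or how it interacts with the diffusion steps.

```python
import numpy as np

def enforce_pixel_consistency(views, texel_ids, num_texels):
    """Average pixels that see the same surface texel (illustrative only).

    views:      (V, P, C) array, C-channel values for P pixels in V views.
    texel_ids:  (V, P) int array, texel index seen by each pixel, -1 = background.
    num_texels: number of distinct texels on the surface.
    All names here are hypothetical and not taken from the Pandora3D code.
    """
    views = np.asarray(views, dtype=np.float64)
    texel_ids = np.asarray(texel_ids)
    channels = views.shape[-1]

    valid = texel_ids >= 0                      # ignore background pixels
    sums = np.zeros((num_texels, channels))
    counts = np.zeros(num_texels)

    # Accumulate every observation of each texel across all views.
    np.add.at(sums, texel_ids[valid], views[valid])
    np.add.at(counts, texel_ids[valid], 1.0)

    # Mean value per texel; unobserved texels stay at zero.
    mean = sums / np.maximum(counts, 1.0)[:, None]

    # Write the shared value back so all views agree on each texel.
    out = views.copy()
    out[valid] = mean[texel_ids[valid]]
    return out

# Two views, two pixels each, one channel.
views = [[[1.0], [5.0]],
         [[3.0], [7.0]]]
texel_ids = [[0, 1],
             [0, -1]]   # second pixel of the second view is background
result = enforce_pixel_consistency(views, texel_ids, num_texels=2)
```

In this toy input, texel 0 is seen by both views, so both pixels take the shared average, while the background pixel is left untouched. A scheduler applied during inference would presumably run a step of this kind on intermediate predictions rather than on final images, but that detail is an assumption here.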

Paper Structure

This paper contains 26 sections, 4 equations, 13 figures.

Figures (13)

  • Figure 1: 3D geometry variational autoencoder. (A) Our base VAE for 3D geometry compression. (B) Extended VAE for efficient 3D geometry compression.
  • Figure 2: Diffusion pipeline. During training of the diffusion model, the DinoV2, CLIP, and VAE Decoder components are kept frozen.
  • Figure 3: Pipeline for Artist-Created Mesh Generation. Initially, meshes are encoded into discrete token sequences. These sequences are then processed through a decoder-only autoregressive model that utilizes a Transformer network architecture. To enforce multi-modality condition control, a pretrained condition encoder network is employed. This network effectively integrates diverse modalities, ensuring that the generated meshes adhere to specified conditions.
  • Figure 4: Example Meshes Generated by Our Artist-Created Mesh Generation Model. The meshes produced by our model demonstrate superior performance in maintaining topological consistency, showcasing the effectiveness of our approach in generating high-quality artistic meshes.
  • Figure 5: Texture Generation Pipeline (input image and mesh from Trellis3D).
  • ...and 8 more figures