MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D

Wei Cheng, Juncheng Mu, Xianfang Zeng, Xin Chen, Anqi Pang, Chi Zhang, Zhibin Wang, Bin Fu, Gang Yu, Ziwei Liu, Liang Pan

TL;DR

This work proposes MVPaint, a generation-refinement 3D texturing framework that generates high-resolution, seamless textures with an emphasis on multi-view consistency and outperforms existing state-of-the-art texturing methods.

Abstract

Texturing is a crucial step in the 3D asset production workflow, which enhances the visual appeal and diversity of 3D assets. Despite recent advancements in Text-to-Texture (T2T) generation, existing methods often yield subpar results, primarily due to local discontinuities, inconsistencies across multiple views, and their heavy dependence on UV unwrapping outcomes. To tackle these challenges, we propose a novel generation-refinement 3D texturing framework called MVPaint, which can generate high-resolution, seamless textures while emphasizing multi-view consistency. MVPaint consists of three key modules. 1) Synchronized Multi-view Generation (SMG). Given a 3D mesh model, MVPaint first generates multi-view images simultaneously with an SMG model, which yields coarse texturing results with unpainted parts due to missing observations. 2) Spatial-aware 3D Inpainting (S3I). To ensure complete 3D texturing, we introduce the S3I method, specifically designed to texture previously unobserved areas. 3) UV Refinement (UVR). Furthermore, MVPaint employs a UVR module to improve the texture quality in the UV space, which first performs UV-space Super-Resolution and then applies a Spatial-aware Seam-Smoothing algorithm to correct spatial texturing discontinuities caused by UV unwrapping. Moreover, we establish two T2T evaluation benchmarks: the Objaverse T2T benchmark and the GSO T2T benchmark, based on selected high-quality 3D meshes from the Objaverse dataset and the entire GSO dataset, respectively. Extensive experimental results demonstrate that MVPaint surpasses existing state-of-the-art methods. Notably, MVPaint can generate high-fidelity textures with minimal Janus issues and substantially improved cross-view consistency.
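
To make the idea of texturing unobserved areas in 3D space more concrete, here is a minimal sketch. It is not the authors' S3I method, whose details the abstract does not give. It assumes the simplest possible stand-in: each unpainted surface point takes an inverse-distance-weighted average of the colors of its k nearest painted points, with distances measured in 3D rather than in UV space. The function name `inpaint_point_colors` and its parameters are hypothetical.

```python
import math


def inpaint_point_colors(points, colors, k=3, eps=1e-8):
    """Fill missing colors (None) from the k nearest painted points in 3D.

    points: list of (x, y, z) tuples sampled on the mesh surface.
    colors: list of (r, g, b) tuples, or None where a point is unpainted.
    Returns a new list in which every point has a color.

    Assumption: plain inverse-distance weighting, chosen for illustration only.
    """
    painted = [i for i, c in enumerate(colors) if c is not None]
    if not painted:
        raise ValueError("at least one painted point is required")

    filled = list(colors)
    for i, color in enumerate(colors):
        if color is not None:
            continue  # already observed in some view
        # Rank painted points by Euclidean distance in 3D, not in UV space.
        nearest = sorted(
            painted, key=lambda j: math.dist(points[i], points[j])
        )[:k]
        weights = [
            1.0 / (math.dist(points[i], points[j]) + eps) for j in nearest
        ]
        total = sum(weights)
        filled[i] = tuple(
            sum(w * colors[j][ch] for w, j in zip(weights, nearest)) / total
            for ch in range(3)
        )
    return filled
```

Measuring distance in 3D matters because two points that are adjacent on the surface can land far apart in the UV map once the mesh is unwrapped.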

Paper Structure

This paper contains 27 sections, 7 equations, 13 figures, 7 tables, 2 algorithms.

Figures (13)

  • Figure 1: MVPaint generates multi-view consistent textures with arbitrary UV unwrapping and high generation versatility.
  • Figure 2: The Framework Overview of MVPaint. Given an input mesh, Stage 1 of MVPaint utilizes a synchronized multi-view generation (SMG) model, consisting of a control-based T2MV model and an I2I model, for 3D texture initialization. In Stage 2, the synchronized views are reprojected back to UV space, where inpainting is performed on the 3D point cloud to fill the holes (shown in red dots), hence completing the UV map. In Stage 3, the completed UV map undergoes super-resolution, adding finer details, followed by seam detection and 3D-aware smoothing to achieve a complete, seamless, and multi-view consistent 3D texture.
  • Figure 3: The Effectiveness of Synchronization on Multi-view Image Generation. Although T2MV models generate Janus-problem-free results, they still suffer from texture misalignment from different views. In contrast, the proposed SMG model can effectively enforce multi-view consistency for T2MV generation.
  • Figure 4: Spatial-aware 3D Inpainting can effectively complete textures for 3D structures with complex geometries and large unobserved areas.
  • Figure 5: The Spatial-aware Seam-smoothing Algorithm can correct texture seams from 2D UV unwrapping by smoothing color vectors using their 3D neighbors.
  • ...and 8 more figures
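
The seam-smoothing idea in the Figure 5 caption (smoothing color vectors using their 3D neighbors) can be illustrated with a short sketch. It is not the paper's algorithm. It assumes that each texel already carries the 3D position of the surface point it maps to, that seam texels have been flagged beforehand, and that a plain unweighted average over a fixed 3D radius is sufficient. The function `smooth_seam_colors` and the `radius` parameter are hypothetical.

```python
import math


def smooth_seam_colors(positions, colors, is_seam, radius=0.1):
    """Replace each seam texel's color with the mean color of its 3D neighbors.

    positions: list of (x, y, z) surface positions, one per texel.
    colors:    list of (r, g, b) tuples, one per texel.
    is_seam:   list of bools marking texels that lie on a UV seam.
    radius:    neighborhood radius in 3D (assumed value, scene dependent).
    """
    smoothed = list(colors)
    for i, flagged in enumerate(is_seam):
        if not flagged:
            continue  # texels away from seams keep their color
        # Neighbors are found in 3D, so texels on opposite sides of a UV
        # seam that touch on the surface are averaged together. The texel
        # itself is always included (distance 0).
        neighbors = [
            j
            for j, p in enumerate(positions)
            if math.dist(positions[i], p) <= radius
        ]
        smoothed[i] = tuple(
            sum(colors[j][ch] for j in neighbors) / len(neighbors)
            for ch in range(3)
        )
    return smoothed
```

The averages are read from the original `colors` list rather than from the partially updated output, so the result does not depend on the order in which texels are visited.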