An Optimization Framework to Enforce Multi-View Consistency for Texturing 3D Meshes
Zhengyi Zhao, Chen Song, Xiaodong Gu, Yuan Dong, Qi Zuo, Weihao Yuan, Liefeng Bo, Zilong Dong, Qixing Huang
TL;DR
The paper tackles the challenge of enforcing multi-view consistency when texturing 3D meshes from text prompts. It introduces a four-stage optimization framework: Stage I generates an over-complete set of RGB-D views with a multi-view-consistent (MV-consistent) diffusion process; Stage II selects a mutually consistent subset of views via a sequential SDP relaxation while ensuring full mesh coverage; Stage III applies joint non-rigid alignment (color adjustment on a sparse FFD lattice followed by dense warping using SIFT Flow); and Stage IV stitches the textures by solving a second-order MRF that assigns mesh faces to views, iterating with Stage III to refine alignment around the stitching cuts. The approach yields significant qualitative and quantitative gains over state-of-the-art methods (e.g., improved photorealism and lower FID), validated on Objaverse models and supported by user studies and ablations. Limitations include incomplete modeling of illumination factors, partial decoupling between the pairwise and joint alignments, and higher computational cost, suggesting avenues for end-to-end integration and latent-parameter conditioning in future work.
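To make Stage II concrete, view selection can be posed as a binary quadratic program: choose views (x_i ∈ {0,1}) that minimize the summed pairwise inconsistency between selected views, subject to every face being seen by at least one selected view. This formulation, the inconsistency scores, and the coverage sets below are illustrative assumptions, not the authors' exact objective. An SDP relaxation would replace the rank-one matrix x xᵀ with a positive semidefinite matrix; this sketch skips that step and instead enumerates all subsets exhaustively, which is tractable only for a handful of views but shows the combinatorial problem the relaxation approximates. The function name `select_views` is hypothetical.

```python
from itertools import combinations


def select_views(inconsistency, coverage, num_faces):
    """Exhaustively solve a toy view-selection problem (hypothetical formulation).

    inconsistency[i][j]: nonnegative disagreement between views i and j on
        their overlap (0 if they do not overlap).
    coverage[i]: set of face indices visible in view i.
    Returns (subset, cost) for the covering subset with the lowest summed
    pairwise inconsistency, or (None, inf) if no subset covers every face.
    """
    n = len(coverage)
    all_faces = set(range(num_faces))
    best, best_cost = None, float("inf")
    # Smaller subsets are tried first and only strict improvements are kept,
    # so ties are broken in favor of fewer views.
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            covered = set().union(*(coverage[i] for i in subset))
            if covered != all_faces:
                continue  # coverage constraint violated
            cost = sum(inconsistency[i][j] for i, j in combinations(subset, 2))
            if cost < best_cost:
                best, best_cost = subset, cost
    return best, best_cost


# Toy instance (hypothetical numbers): 4 views over a 4-face mesh.
coverage = [{0, 1, 2}, {1, 2, 3}, {0, 3}, {0, 2, 3}]
inconsistency = [
    [0.0, 0.9, 0.1, 0.6],
    [0.9, 0.0, 0.7, 0.3],
    [0.1, 0.7, 0.0, 0.2],
    [0.6, 0.3, 0.2, 0.0],
]
subset, cost = select_views(inconsistency, coverage, num_faces=4)
print(subset, cost)  # (0, 2) 0.1
```

No single view covers all four faces, so the best choice is the covering pair with the smallest disagreement on its overlap.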
Abstract
A fundamental problem in texturing 3D meshes with pre-trained text-to-image models is ensuring multi-view consistency. State-of-the-art approaches typically use diffusion models to aggregate multi-view inputs, and common issues include blurriness caused by the averaging operation in the aggregation step and inconsistencies in local features. This paper introduces an optimization framework that achieves multi-view consistency in four stages. The first stage generates an over-complete set of 2D textures from a predefined set of viewpoints using a multi-view-consistent (MV-consistent) diffusion process. The second stage selects a subset of views that are mutually consistent while jointly covering the underlying 3D model; we show how to achieve this by solving semidefinite programs. The third stage performs non-rigid alignment of the selected views across their overlapping regions. The fourth stage solves an MRF problem to associate each mesh face with a selected view. The third and fourth stages are iterated, with the cuts obtained in the fourth stage encouraging the non-rigid alignment in the third stage to focus on regions close to those cuts. Experimental results show that our approach significantly outperforms baseline approaches both qualitatively and quantitatively. Project page: https://aigc3d.github.io/ConsistenTex.
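The fourth stage's face-to-view assignment is a labeling problem on the mesh's face-adjacency graph. The sketch below assumes a Potts-style pairwise MRF, with a per-face, per-view unary cost plus a constant `seam_weight` penalty whenever adjacent faces take different views, and minimizes it with iterated conditional modes (ICM), a simple local solver swapped in here for illustration rather than the authors' optimizer. The function names and cost values are hypothetical. The returned seam edges play the role of the cuts that steer the next round of non-rigid alignment.

```python
def label_faces(unary, adjacency, seam_weight, max_iters=10):
    """Assign each face a view by approximately minimizing a Potts MRF energy with ICM.

    unary[f][v]: hypothetical cost of texturing face f from view v.
    adjacency: (f, g) pairs of faces that share an edge.
    seam_weight: penalty paid for each adjacent pair with different views.
    """
    num_faces, num_views = len(unary), len(unary[0])
    neighbors = [[] for _ in range(num_faces)]
    for f, g in adjacency:
        neighbors[f].append(g)
        neighbors[g].append(f)

    # Start from the cheapest view per face, ignoring seams.
    labels = [min(range(num_views), key=lambda v: unary[f][v]) for f in range(num_faces)]

    for _ in range(max_iters):
        changed = False
        for f in range(num_faces):
            # Local energy of face f taking view v, with its neighbors held fixed.
            def local_cost(v):
                return unary[f][v] + seam_weight * sum(labels[g] != v for g in neighbors[f])

            best = min(range(num_views), key=local_cost)
            # Accept only strict improvements, so the total energy never increases.
            if local_cost(best) < local_cost(labels[f]):
                labels[f] = best
                changed = True
        if not changed:
            break
    return labels


def seam_edges(labels, adjacency):
    """Adjacent face pairs whose views differ, i.e. the stitching cuts."""
    return [(f, g) for f, g in adjacency if labels[f] != labels[g]]


# Toy strip of 6 faces seen from 2 views (hypothetical costs).
unary = [[0.0, 1.0], [0.5, 0.4], [0.0, 1.0], [0.2, 0.8], [1.0, 0.0], [1.0, 0.0]]
adjacency = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
labels = label_faces(unary, adjacency, seam_weight=0.3)
print(labels, seam_edges(labels, adjacency))  # [0, 0, 0, 0, 1, 1] [(3, 4)]
```

With `seam_weight=0.3`, the isolated view-1 label that face 1 receives at initialization is smoothed away, because its two seams cost 0.6 while its unary advantage is only 0.1, leaving a single cut between faces 3 and 4. ICM only reaches a local minimum; for Potts energies, graph-cut moves such as alpha-expansion are the usual stronger choice.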
