Consistency^2: Consistent and Fast 3D Painting with Latent Consistency Models

Tianfu Wang, Anton Obukhov, Konrad Schindler

TL;DR

Consistency^2 presents a fast 3D mesh painting pipeline that leverages Latent Consistency Models (LCMs) to perform few-step, view-consistent denoising across multiple camera views. By separating noise and color textures and employing a variance-preserving interpolation plus a multi-view fusion sampler, the method achieves high-resolution textures with only four denoising steps per view, under two minutes per mesh on consumer GPUs. Quantitatively, it outperforms the prior Text2Tex approach on Objaverse data in FID and KID metrics and delivers about a 7.5x speedup, while qualitatively avoiding common artifacts such as seams and the Janus problem. The work demonstrates a practical, scalable path to interactive 3D texture painting, enabling rapid design iteration and asset recycling with strong multi-view consistency.

Abstract

Generative 3D Painting is among the top productivity boosters in high-resolution 3D asset management and recycling. Ever since text-to-image models became accessible for inference on consumer hardware, the performance of 3D Painting methods has consistently improved and is currently close to plateauing. At the core of most such models lies denoising diffusion in the latent space, an inherently time-consuming iterative process. Multiple techniques have been developed recently to accelerate generation and reduce sampling iterations by orders of magnitude. Designed for 2D generative imaging, these techniques do not come with recipes for lifting them into 3D. In this paper, we address this shortcoming by proposing a Latent Consistency Model (LCM) adaptation for the task at hand. We analyze the strengths and weaknesses of the proposed model and evaluate it quantitatively and qualitatively. In a study on samples from the Objaverse dataset, our 3D painting method is strongly preferred in all evaluations. Source code is available at https://github.com/kongdai123/consistency2.

Paper Structure (13 sections, 1 equation, 4 figures, 1 table)


Figures (4)

  • Figure 1: Selected painting results of Objaverse [deitke2023objaverse] meshes using Consistency^2. Our method paints detailed high-resolution textures with very few denoising diffusion steps and allows for free camera pose selection for view painting.
  • Figure 2: We project a mesh textured with latent noise using various texture interpolation methods. The naive bilinear interpolation (left) disturbs the latent probability distribution and gives poor results. Our variance-preserving noise rendering (middle) results in a similar denoised quality of the mesh compared to just using a noise latent without any projection as a reference (right). VarInit denotes the variance of the initial projected noise.
  • Figure 3: Comparison between TexFusion's Sequential Interlaced Multi-view Sampler (SIMS) [cao2023texfusion] (left) and our method (right) in accommodating high-resolution textures. TexFusion's SIMS has rigid requirements on texture resolution and completely loses view consistency with a high-resolution texture. Our formulation of separating noise and color textures enables high texture resolutions without sacrificing view consistency.
  • Figure 4: Qualitative comparison of Objaverse [deitke2023objaverse] mesh painting. The sequential nature of Text2Tex [chen2023text2tex] makes it susceptible to irrecoverable artifacts such as seams and coherence issues. Our method is free of these limitations.
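
The Figure 2 caption contrasts naive bilinear interpolation of a noise texture with a variance-preserving alternative. The following minimal sketch illustrates why the naive version disturbs the noise distribution. It is not the authors' implementation: it assumes i.i.d. unit-variance Gaussian texels and one common way to preserve variance, namely dividing the blended value by the L2 norm of the interpolation weights. The function names (`bilinear_weights`, `interpolate`, `sample_variance`) are hypothetical and chosen only for this illustration.

```python
import math
import random


def bilinear_weights(u, v):
    """Weights of the four texels around a fractional offset (u, v) in [0, 1]."""
    return [(1 - u) * (1 - v), u * (1 - v), (1 - u) * v, u * v]


def interpolate(texels, weights, preserve_variance):
    """Blend four noise texels; optionally rescale to keep unit variance."""
    value = sum(w * t for w, t in zip(weights, texels))
    if preserve_variance:
        # Assumption: for i.i.d. unit-variance texels,
        # Var(sum_i w_i * t_i) = sum_i w_i^2, so dividing by
        # sqrt(sum_i w_i^2) restores unit variance.
        value /= math.sqrt(sum(w * w for w in weights))
    return value


def sample_variance(preserve_variance, n=20000, seed=0):
    """Empirical variance of interpolated noise over random sub-texel offsets."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        texels = [rng.gauss(0.0, 1.0) for _ in range(4)]
        weights = bilinear_weights(rng.random(), rng.random())
        samples.append(interpolate(texels, weights, preserve_variance))
    mean = sum(samples) / n
    return sum((s - mean) ** 2 for s in samples) / n


naive_var = sample_variance(preserve_variance=False)
fixed_var = sample_variance(preserve_variance=True)
print(f"naive bilinear variance:      {naive_var:.3f}")
print(f"variance-preserving variance: {fixed_var:.3f}")
```

Under these assumptions, the naive blend yields a variance well below 1 (the bilinear weights sum to 1, but their squares sum to less than 1 away from texel centers), while the rescaled blend stays close to 1. A denoiser that expects unit-variance latent noise would therefore receive out-of-distribution input in the naive case, which is consistent with the degraded result the caption describes.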