LaFiTe: A Generative Latent Field for 3D Native Texturing

Chia-Hao Chen, Zi-Xin Zou, Yan-Pei Cao, Ze Yuan, Guan Luo, Xiaojuan Qi, Ding Liang, Song-Hai Zhang, Yuan-Chen Guo

TL;DR

3D-native texturing remains limited by topology- and UV-dependent representations. LaFiTe introduces a sparse latent color field learned with a VAE from densely sampled colored point clouds, plus a geometry latent derived from monochrome input to condition generation. A conditional rectified-flow model then synthesizes high-quality textures that are coherent across complex geometries and can be baked into UV maps for rendering, while enabling PBR materials and local refinement. Results show >10 dB PSNR gains over prior native approaches and strong performance against multi-view projection baselines, highlighting the practical potential for scalable 3D content creation workflows.

Abstract

Generating high-fidelity, seamless textures directly on 3D surfaces, what we term 3D-native texturing, remains a fundamental open challenge, with the potential to overcome long-standing limitations of UV-based and multi-view projection methods. However, existing native approaches are constrained by the absence of a powerful and versatile latent representation, which severely limits the fidelity and generality of their generated textures. We identify this representation gap as the principal barrier to further progress. We introduce LaFiTe, a framework that addresses this challenge by learning to generate textures as a 3D generative sparse latent color field. At its core, LaFiTe employs a variational autoencoder (VAE) to encode complex surface appearance into a sparse, structured latent space, which is subsequently decoded into a continuous color field. This representation achieves unprecedented fidelity, exceeding state-of-the-art methods by >10 dB PSNR in reconstruction, by effectively disentangling texture appearance from mesh topology and UV parameterization. Building upon this strong representation, a conditional rectified-flow model synthesizes high-quality, coherent textures across diverse styles and geometries. Extensive experiments demonstrate that LaFiTe not only sets a new benchmark for 3D-native texturing but also enables flexible downstream applications such as material synthesis and texture super-resolution, paving the way for the next generation of 3D content creation workflows.
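To put the ">10 dB PSNR" figure in perspective, the sketch below computes PSNR with its standard definition, 10 * log10(peak^2 / MSE), and shows that a 10 dB gain corresponds to a tenfold reduction in mean squared error. The color samples, noise levels, and the names `psnr`, `coarse`, and `fine` are illustrative assumptions; none of this is the paper's data or evaluation code.

```python
import math
import random


def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length sequences."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)


random.seed(0)
# Hypothetical flattened ground-truth color samples and a shared error pattern.
colors = [random.uniform(0.2, 0.8) for _ in range(3000)]
noise = [random.gauss(0.0, 1.0) for _ in colors]

# Two hypothetical reconstructions: `fine` has the same error pattern as
# `coarse`, scaled so that its mean squared error is 10x lower.
coarse = [c + 0.05 * n for c, n in zip(colors, noise)]
fine = [c + 0.05 / math.sqrt(10.0) * n for c, n in zip(colors, noise)]

gain_db = psnr(colors, fine) - psnr(colors, coarse)
print(f"PSNR gain: {gain_db:.2f} dB")
```

Because PSNR is logarithmic in the MSE, every additional 10 dB divides the squared error by another factor of ten, so a gap of this size reflects a large difference in reconstruction error, not a marginal one.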

Paper Structure

This paper contains 19 sections, 6 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: A gallery of diverse 3D assets textured by our 3D-native framework, LaFiTe, demonstrating high-fidelity, seamless textures across a wide range of visual styles.
  • Figure 2: The LaFiTe Pipeline. (Top) A VAE learns a sparse latent color field representation by encoding a colored point cloud sampled from the mesh surface. (Bottom) For generation, the VAE encoder first extracts a geometry latent from a monochrome point cloud. This geometry latent then conditions a rectified flow model to synthesize a texture latent, which is decoded to the 3D texture.
  • Figure 3: Robustness to self-occlusion. LaFiTe's 3D-native formulation generates complete and coherent textures even in highly occluded regions where projection methods fail.
  • Figure 4: Visual comparison of VAE reconstruction quality. Our method reconstructs sharper, more detailed textures, avoiding the blur and artifacts of the TRELLIS baseline.
  • Figure 5: Qualitative comparison of image-conditioned 3D texture generation. Textures generated by LaFiTe are more faithfully aligned to both the reference image and the given geometry, and are free from seams or inconsistencies.
  • ...and 3 more figures
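
The generation stage described in the Figure 2 caption (a geometry latent conditioning a rectified flow model that produces a texture latent) can be illustrated with a minimal sampling loop. This is a sketch under stated assumptions, not the paper's implementation: `toy_velocity` is a hypothetical closed-form stand-in for the learned velocity network, the latent is a flat list of eight numbers, and time is assumed to run from t=0 (noise) to t=1 (sample), which may differ from the convention the authors use.

```python
import math
import random


def toy_velocity(x, t, geometry_latent):
    """Hypothetical stand-in for a learned, geometry-conditioned velocity network.

    It points from x toward a target determined only by the condition, which
    yields the straight-line paths that rectified flow training aims for.
    """
    target = [math.tanh(g) for g in geometry_latent]  # toy "texture latent"
    return [(y - xi) / (1.0 - t) for y, xi in zip(target, x)]


def sample_texture_latent(velocity_fn, geometry_latent, num_steps=20, seed=0):
    """Euler integration of dx/dt = v(x, t, cond) from t=0 to t=1."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in geometry_latent]  # start from Gaussian noise
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1, so 1 - t is never zero
        v = velocity_fn(x, t, geometry_latent)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Hypothetical geometry latent with eight channels.
geometry_latent = [-1.0 + 2.0 * i / 7 for i in range(8)]
texture_latent = sample_texture_latent(toy_velocity, geometry_latent)
```

In a real system the velocity function would be a trained network and the resulting texture latent would be passed to the VAE decoder to obtain the color field; here the toy velocity simply transports the noise to a condition-dependent target so that the integration loop can be checked in isolation.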