Single Mesh Diffusion Models with Field Latents for Texture Generation

Thomas W. Mitchel; Carlos Esteves; Ameesh Makadia

TL;DR

This work develops intrinsic diffusion models that operate directly on 3D mesh surfaces to synthesize high-fidelity textures from a single example. It introduces Field Latents (FLs), tangent-vector texture representations at mesh vertices, and Field Latent Diffusion Models (FLDMs) that denoise diffusion processes in FL space using isometry-equivariant field convolutions. The approach achieves state-of-the-art fidelity for single-textured-mesh texture generation and enables user-controlled editing, label-guided generation, inpainting, and generative texture transfer across similar geometries. By ensuring isometry-equivariance, the method supports seamless texture transfer between locally similar regions and across different topologies, with practical implications for texture synthesis in low-data 3D scenarios and content creation pipelines.

Abstract

We introduce a framework for intrinsic latent diffusion models operating directly on the surfaces of 3D shapes, with the goal of synthesizing high-quality textures. Our approach is underpinned by two contributions: field latents, a latent representation encoding textures as discrete vector fields on the mesh vertices, and field latent diffusion models, which learn to denoise a diffusion process in the learned latent space on the surface. We consider a single-textured-mesh paradigm, where our models are trained to generate variations of a given texture on a mesh. We show the synthesized textures are of superior fidelity compared to those from existing single-textured-mesh generative models. Our models can also be adapted for user-controlled editing tasks such as inpainting and label-guided generation. The efficacy of our approach is due in part to the equivariance of our proposed framework under isometries, allowing our models to seamlessly reproduce details across locally similar regions and opening the door to a notion of generative texture transfer.
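To make "denoising a diffusion process in the learned latent space" more concrete, the following is a minimal sketch of the forward (noising) half of such a process applied to per-vertex latents. It assumes a standard DDPM-style linear variance schedule and a latent layout of shape (vertices, channels, 2), where the last axis holds the two tangent-plane coefficients of each vector feature. The schedule values, the array layout, and the names `make_alpha_bar` and `noise_latents` are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    # Cumulative signal-retention factors for an assumed linear beta schedule.
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def noise_latents(z0, t, alpha_bar, rng):
    # Closed-form forward step: z_t = sqrt(a_t) * z_0 + sqrt(1 - a_t) * eps.
    eps = rng.standard_normal(z0.shape)
    a = alpha_bar[t]
    zt = np.sqrt(a) * z0 + np.sqrt(1.0 - a) * eps
    return zt, eps

rng = np.random.default_rng(0)
num_vertices, num_channels = 500, 8  # hypothetical sizes
# Stand-in for encoded latents: one 2D tangent vector per vertex and channel.
z0 = rng.standard_normal((num_vertices, num_channels, 2))
alpha_bar = make_alpha_bar()
zt, eps = noise_latents(z0, t=100, alpha_bar=alpha_bar, rng=rng)
```

A denoising network would be trained to predict `eps` (or `z0`) from `zt` and `t`. In the approach described above that network is built from isometry-equivariant field convolutions, which this sketch does not attempt to reproduce.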

Paper Structure (37 sections, 45 equations, 11 figures, 4 tables)

This paper contains 37 sections, 45 equations, 11 figures, 4 tables.

Introduction
Related Work
Method Overview
Importance of Isometry-Equivariance
Field Latents
FL-VAE Encoder
Equivariance
Field Latent Diffusion Models
Equivariance
Denoising with Field Convolutions
Experiments
Texture Compression and Reconstruction
Unconditional (Label-Free) Generation
Label-Guided Generation
Inpainting
...and 22 more sections

Figures (11)

Figure 1: Our latent diffusion models operate directly on the surfaces of 3D shapes, synthesizing new high-quality textures (center) after training on a single example (left). Both our novel latent representation and diffusion models are isometry-equivariant, facilitating a notion of generative texture transfer by sampling pre-trained models on new geometries (right). Above, our models are conditioned on coarse semantic labels reflecting a subjective distribution of content, which delineate the sole and interior of the shoes, the eyes and mouths of the skulls, and the decals on the bottles.
Figure 2: Visual comparison of textures compressed and reconstructed with the FL-VAE and INF (Koestler et al., 2022) on 30K-vertex meshes. Compared to barycentric coordinates, our proposed logarithmic coordinate function more richly extends latent features across the mesh, enabling the reconstruction of finer details. Zoom in to view.
Figure 3: Visual comparison of unconditionally (label-free) generated textures from FLDMs and Sin3DM (Wu et al., 2023). Our FLDM's isometry-equivariant construction allows for replication of textural details across locally similar regions. In contrast, Sin3DM's extrinsic approach associates textural features with 3D space; synthesized details appear as repetitions or extrusions of previous patterns along the major axes of the mesh, and we observe that novel textures cannot be created without modifying geometry. Zoom in to compare.
Figure 4: Label-guided generation. FLDMs can be conditioned on a subjective user-input labeling which generated textures will reflect. See Figure 1 for more examples. Zoom in to view.
Figure 5: Inpainting with FLDMs. The input texture is preserved in the masked regions and new content is synthesized elsewhere with agreement on the boundaries. Zoom in to view.
...and 6 more figures
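
The inpainting behaviour described in the Figure 5 caption (input preserved in the masked regions, new content elsewhere) can be illustrated with a per-vertex blending step. This sketch assumes a common mask-based scheme for diffusion inpainting, in which the known latents overwrite the sample at the kept vertices on each denoising step. Whether FLDMs use this scheme is not stated here, and the function name, mask, and array shapes are hypothetical.

```python
import numpy as np

def blend_known_and_generated(z_generated, z_known, keep_mask):
    # keep_mask is a boolean array over vertices: True where the input is preserved.
    m = keep_mask[:, None, None].astype(z_generated.dtype)
    # Kept vertices take the known latents; all others take the generated ones.
    return m * z_known + (1.0 - m) * z_generated

rng = np.random.default_rng(0)
num_vertices, num_channels = 6, 4  # hypothetical sizes
z_known = rng.standard_normal((num_vertices, num_channels, 2))
z_gen = rng.standard_normal((num_vertices, num_channels, 2))
keep_mask = np.array([True, True, False, False, True, False])
z_mixed = blend_known_and_generated(z_gen, z_known, keep_mask)
```

In a full sampler, `z_known` would first be noised to the current timestep before blending. The blend alone does not ensure agreement at the boundaries, which depends on the denoiser seeing both regions.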

Theorems & Definitions (4)

Claim 1
proof
Claim 2
proof
