LATO: 3D Mesh Flow Matching with Structured TOpology Preserving LAtents

Tianhao Zhao, Youjia Zhang, Hang Long, Jinshen Zhang, Wenbing Li, Yang Yang, Gongbo Zhang, Jozef Hladký, Matthias Nießner, Wei Yang

TL;DR

LATO represents a mesh as a Vertex Displacement Field anchored on the mesh surface and uses a sparse voxel Variational Autoencoder to compress this explicit signal into a structured, topology-aware voxel latent.

Abstract

In this paper, we introduce LATO, a novel topology-preserving latent representation that enables scalable, flow matching-based synthesis of explicit 3D meshes. LATO represents a mesh as a Vertex Displacement Field (VDF) anchored on the mesh surface and uses a sparse voxel Variational Autoencoder (VAE) to compress this explicit signal into a structured, topology-aware voxel latent. To reconstruct the mesh, the VAE decoder progressively subdivides and prunes latent voxels to instantiate precise vertex locations. Finally, a dedicated connection head queries the voxel latent to predict edge connectivity between vertex pairs directly, allowing mesh topology to be recovered without isosurface extraction or heuristic meshing. For generative modeling, LATO adopts a two-stage flow matching process, first synthesizing the structure voxels and then refining the voxel-wise topology features. Compared to prior isosurface/triangle-based diffusion models and autoregressive generation approaches, LATO generates meshes with complex geometry and well-formed topology while remaining highly efficient at inference.
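
To make the flow matching step mentioned in the abstract more concrete, the following is a minimal, generic sketch of flow matching sampling: a sample is drawn from noise and transported to data by integrating a velocity field from t = 0 to t = 1 with forward Euler steps. It is not LATO's implementation. The function names `euler_flow_sample` and `make_toy_velocity` are hypothetical, the straight-line probability path is an assumption, and the analytic velocity field stands in for a learned network whose architecture and conditioning are not described in this section.

```python
import numpy as np


def euler_flow_sample(velocity, x0, num_steps=50):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with forward Euler."""
    x = np.asarray(x0, dtype=np.float64).copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1, so (1 - t) never hits zero
        x = x + dt * velocity(x, t)
    return x


def make_toy_velocity(target):
    """Hypothetical stand-in for a learned velocity network.

    For the straight-line path x_t = (1 - t) * x0 + t * target, the exact
    velocity at (x_t, t) is (target - x_t) / (1 - t).
    """
    target = np.asarray(target, dtype=np.float64)

    def velocity(x, t):
        return (target - x) / (1.0 - t)

    return velocity


# Toy "latent features": two 3-dimensional vectors (assumed shapes).
rng = np.random.default_rng(0)
target = np.array([[0.5, -1.0, 2.0], [1.5, 0.0, -0.5]])
noise = rng.standard_normal(target.shape)

sample = euler_flow_sample(make_toy_velocity(target), noise, num_steps=20)
print(np.abs(sample - target).max())
```

In a two-stage setup such as the one the abstract describes, a sampler of this kind would presumably run once to produce the structure voxels and a second time, conditioned on that structure, to produce the voxel-wise features. That reading is an interpretation of the abstract, not a detail stated in it.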

Paper Structure

This paper contains 14 sections, 8 equations, 15 figures, and 4 tables.

Figures (15)

  • Figure 1: LATO vs. Existing Paradigms. Mainstream topology-agnostic approaches use vecset or voxel latents decoded into implicit fields (e.g., SDFs) and rely on Marching Cubes for mesh extraction. Conversely, explicit mesh generation methods adopt per-face latents via autoregressive or diffusion models, but suffer from severe memory bottlenecks. LATO introduces T-Voxels, latents that explicitly model topology, enabling the direct generation of artist-friendly meshes.
  • Figure 2: Overview of the LATO pipeline. We explicitly encode mesh topology by sampling surface points and augmenting each with its relative displacements to the vertices of its enclosing face (Vertex Displacement Field, VDF). These dense features are aggregated and compressed via a sparse voxel VAE into a structured latent representation, termed T-Voxels. To reconstruct the mesh, the T-Voxels undergo hierarchical subdivision and learnable pruning to precisely instantiate high-resolution vertex locations. Simultaneously, a connection head predicts edge existence between vertex pairs, directly recovering the explicit mesh topology.
  • Figure 3: Geometry-conditioned generation comparison. We compare our method against state-of-the-art baselines. Existing methods produce either incomplete reconstructions or excessively dense, irregular topology. In contrast, LATO generates hole-free meshes with well-formed topology suitable for downstream applications.
  • Figure 4: Topology comparison with implicit foundation models. We qualitatively compare against industrial-scale implicit baselines. While these foundation models have advantages in scale and training resources, their reliance on implicit fields yields dense, irregular triangulation. In contrast, LATO generates artist-friendly edge flows.
  • Figure 5: Image to 3D generation results.
  • ...and 10 more figures
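
The Vertex Displacement Field described in the Figure 2 caption can be illustrated with a small sketch: sample points on each triangle and store, for every sample, the offsets to the three vertices of the face that contains it. This is only an illustration under assumptions. The function name `sample_vdf`, the fixed number of samples per face, and the uniform barycentric sampling are hypothetical choices, and the paper's exact feature definition may differ.

```python
import numpy as np


def sample_vdf(vertices, faces, samples_per_face=4, seed=0):
    """Sample surface points and their displacements to the enclosing face's vertices.

    Returns:
        points:        (K, 3) sampled surface points
        displacements: (K, 3, 3) offsets from each point to its face's 3 vertices
        face_ids:      (K,) index of the face each point was sampled from
    """
    rng = np.random.default_rng(seed)
    vertices = np.asarray(vertices, dtype=np.float64)
    faces = np.asarray(faces, dtype=np.int64)

    face_ids = np.repeat(np.arange(len(faces)), samples_per_face)

    # Uniform barycentric coordinates on a triangle (square-root trick).
    r1 = np.sqrt(rng.random(len(face_ids)))
    r2 = rng.random(len(face_ids))
    bary = np.stack([1.0 - r1, r1 * (1.0 - r2), r1 * r2], axis=1)  # (K, 3)

    corners = vertices[faces[face_ids]]               # (K, 3, 3)
    points = np.einsum("kc,kcd->kd", bary, corners)   # (K, 3)
    displacements = corners - points[:, None, :]      # (K, 3, 3)
    return points, displacements, face_ids


# A unit square split into two triangles (toy mesh).
vertices = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]], dtype=float)
faces = np.array([[0, 1, 2], [0, 2, 3]])

points, disp, face_ids = sample_vdf(vertices, faces)

# Adding the stored displacements back to each point recovers the face vertices,
# which is why such a field carries explicit vertex information.
recovered = points[:, None, :] + disp
print(np.allclose(recovered, vertices[faces[face_ids]]))
```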