MeshRipple: Structured Autoregressive Generation of Artist-Meshes

Junkai Lin, Hang Long, Huipeng Guo, Jielei Zhang, JiaYi Yang, Tianle Guo, Yang Yang, Jianwen Li, Wenxiao Zhang, Matthias Nießner, Wei Yang

TL;DR

MeshRipple tackles topology-context mismatch in autoregressive mesh generation by introducing Ripple Tokenization, which uses a frontier-aware BFS to keep the next-face context near the sequence tail. The method couples a structured transformer with Frontier Attention and Native Sparse Contextual Attention to access long-range cues while maintaining tractable memory, and employs an expansive prediction strategy that jointly selects the next face and the next frontier root. This combination yields improved topological fidelity and surface completeness, outperforming recent baselines on artist meshes and remaining competitive on dense meshes with lower compute. The approach enables scalable, artist-friendly mesh synthesis and paves the way for further improvements in topology control and multi-component scene generation.

Abstract

Meshes serve as a primary representation for 3D assets. Autoregressive mesh generators serialize faces into sequences and train on truncated segments with sliding-window inference to cope with memory limits. However, truncation creates a mismatch between the context the model sees and the surface topology it must extend, which breaks long-range geometric dependencies, producing holes and fragmented components. To address this limitation, we introduce MeshRipple, which expands a mesh outward from an active generation frontier, akin to a ripple on a surface. MeshRipple rests on three key innovations: a frontier-aware BFS tokenization that aligns the generation order with surface topology; an expansive prediction strategy that maintains coherent, connected surface growth; and a sparse-attention global memory that provides an effectively unbounded receptive field to resolve long-range topological dependencies. This integrated design enables MeshRipple to generate meshes with high surface fidelity and topological completeness, outperforming strong recent baselines.

Paper Structure

This paper contains 23 sections, 3 equations, 16 figures, 7 tables, 1 algorithm.

Figures (16)

  • Figure 1: Gallery of results generated by MeshRipple. Our method handles diverse mesh styles and topologies, producing artist-style meshes with optimized topology (airplane, train, totem) as well as dense smooth-surface meshes (bears, unicorn, dinosaur), all with coherent geometry and high-fidelity details.
  • Figure 2: Overview of MeshRipple. The input mesh is first serialized into a token sequence via Ripple Tokenization, which is then truncated into fixed-length segments as input to a structured autoregressive model. The model employs hourglass layers at both ends to convert between vertex and face tokens. The core consists of a stack of $2\times N$ identical blocks, each comprising a Frontier-Attention layer, a self-attention layer, a cross-attention layer for point-cloud conditioning (omitted for clarity), and a Native Sparse Contextual Attention layer that attends to the full mesh sequence under a causal mask. The middle hidden states are additionally fed into a lightweight head to predict the next root face to expand.
  • Figure 3: Illustration of Ripple Tokenization. At each step, the current root face expands following the counterclockwise half-edge order. Faces that still have unvisited neighbors remain in the FIFO frontier queue; once all neighbors are visited, the face is popped from the queue.
  • Figure 4: Qualitative comparison of point cloud-conditioned generation between MeshRipple and baselines. Baselines inevitably produce broken surfaces and holes, whereas MeshRipple yields more complete and coherent geometries.
  • Figure 5: Image-conditioned generation results. Our method can generate high-fidelity meshes aligned with the input images.
  • ...and 11 more figures
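
The frontier expansion described in the Figure 3 caption can be illustrated with a small sketch. It is not the authors' implementation: the function name `ripple_order`, the `edge_to_faces` map, and the choice to start each component from its lowest-index face are assumptions made for illustration. The sketch only orders faces. It expands every unvisited neighbor of the current root before popping it, and it does not produce vertex or coordinate tokens or model the learned next-root prediction.

```python
from collections import deque


def ripple_order(faces):
    """Order triangle faces by a frontier-aware BFS (illustrative sketch).

    `faces` is a list of (a, b, c) vertex-index triples, assumed to be
    wound counterclockwise. Returns (root, face) pairs, where `root` is
    the frontier face that exposed `face`, or None for a seed face.
    """
    # Map each undirected edge to the faces that contain it.
    edge_to_faces = {}
    for f, (a, b, c) in enumerate(faces):
        for u, v in ((a, b), (b, c), (c, a)):
            edge_to_faces.setdefault(frozenset((u, v)), []).append(f)

    visited = [False] * len(faces)
    order = []
    for seed in range(len(faces)):
        if visited[seed]:
            continue  # already reached from an earlier component seed
        visited[seed] = True
        order.append((None, seed))
        frontier = deque([seed])  # FIFO frontier queue
        while frontier:
            root = frontier[0]
            a, b, c = faces[root]
            # Walk the root's half-edges in counterclockwise order.
            for u, v in ((a, b), (b, c), (c, a)):
                for nb in edge_to_faces[frozenset((u, v))]:
                    if not visited[nb]:
                        visited[nb] = True
                        order.append((root, nb))
                        frontier.append(nb)
            # All neighbors of the root are now visited, so pop it.
            frontier.popleft()
    return order


if __name__ == "__main__":
    # A three-triangle strip plus one disconnected triangle.
    strip = [(0, 1, 2), (2, 1, 3), (2, 3, 4), (5, 6, 7)]
    for root, face in ripple_order(strip):
        print(root, face)
```

Because each new face is appended right after the root that exposed it, faces that are adjacent on the surface tend to sit close together in the sequence, which is the property the tokenization relies on when the sequence is later truncated into fixed-length segments.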