Mesh-Pro: Asynchronous Advantage-guided Ranking Preference Optimization for Artist-style Quadrilateral Mesh Generation

Zhen Zhou; Jian Liu; Biwen Lei; Jing Xu; Haohan Weng; Yiling Zhu; Zhuo Chen; Junfeng Fan; Yunkai Ma; Dazhao Du; Song Guo; Fengshui Jing; Chunchao Guo

TL;DR

This work presents the first asynchronous online RL framework designed to make post-training for 3D mesh generation more efficient, and proposes Advantage-guided Ranking Preference Optimization (ARPO), a novel RL algorithm that achieves a better trade-off between training efficiency and generalization than existing RL algorithms for 3D mesh generation.

Abstract

Reinforcement learning (RL) has demonstrated remarkable success in text and image generation, yet its potential in 3D generation remains largely unexplored. Existing attempts typically rely on offline direct preference optimization (DPO), which suffers from low training efficiency and limited generalization. In this work, we aim to improve both the training efficiency and the generation quality of RL for 3D mesh generation. Specifically, (1) we design the first asynchronous online RL framework tailored to efficient post-training for 3D mesh generation, which is 3.75$\times$ faster than synchronous RL. (2) We propose Advantage-guided Ranking Preference Optimization (ARPO), a novel RL algorithm that achieves a better trade-off between training efficiency and generalization than current RL algorithms for 3D mesh generation, such as DPO and group relative policy optimization (GRPO). (3) Building on asynchronous ARPO, we propose Mesh-Pro, which additionally introduces a novel diagonal-aware mixed triangular-quadrilateral tokenization for mesh representation and a ray-based reward for geometric integrity. Mesh-Pro achieves state-of-the-art performance on artistic and dense meshes.

Paper Structure (44 sections, 18 equations, 17 figures, 8 tables, 6 algorithms)

Introduction
Related Work
3D Mesh Generation
Reinforcement Learning in Mesh Generation
Mesh-Pro
Mesh Generation Pre-Training
Asynchronous Online RL Framework
ARPO
Reward Design
Experiments
Dataset
Implementation Details
Baselines and Evaluation Metrics
Qualitative Results
Quantitative Results
...and 29 more sections

Figures (17)

Figure 1: Mesh-Pro generates artist-style quadrilateral-dominated meshes with diversity, high fidelity, and topological quality.
Figure 2: Architecture Overview. Mesh-Pro begins by sampling point clouds from the input dense and artist meshes. The features from the point cloud encoder are then passed to an auto-regressive Hourglass Transformer (Hao et al., 2024) for mesh decoding. This decoder is trained with truncation to output triangle-quad tokens. The pre-training objective is to reconstruct the input mesh. Subsequently, asynchronous ARPO is used for RL post-training to generate high-quality, well-structured meshes, guided by ray and topological rewards.
Figure 3: Diagonal-Aware Mesh Tokenization. "P" denotes vertex tokens. In each face, the minimum vertex (the one with the lowest coordinates) always appears first. Triangles use padding tokens ("S") at the end, while quads encode diagonal information in the fourth vertex via an offset of $\mathit{flag} \times 2^{n_{\mathrm{bits}}}$ ($\mathit{flag} \in \{0,1,2\}$ for Diagonals 1, 2, 3). Any of the three edges of the first triangle (in green) can serve as the diagonal. This defers the triangle-vs-quad decision to the last position, reducing prediction pressure.
Figure 4: Asynchronous Online RL Framework.
Figure 5: Qualitative comparison of Mesh-Pro with other methods.
...and 12 more figures
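
The offset scheme described in the Figure 3 caption can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes each vertex is a single integer id in $[0, 2^{n_{\mathrm{bits}}})$, that the last slot of a face holds either one padding token or the offset fourth vertex, and that the padding token takes the next free id after the three flag ranges. The names `N_BITS`, `PAD`, `encode_last_slot`, and `decode_last_slot` are hypothetical.

```python
# Sketch of the "flag * 2^n_bits" offset from the Figure 3 caption.
# Assumptions (not from the paper): one integer id per vertex, one
# padding id for triangles, and the concrete values chosen below.
N_BITS = 7                 # assumed number of bits per vertex id
BASE = 1 << N_BITS         # 2^n_bits, the size of one flag range
PAD = 3 * BASE             # hypothetical id for the padding token "S"


def encode_last_slot(vertex_id=None, flag=0):
    """Return the token for the last slot of a face.

    vertex_id=None marks a triangle (padding); otherwise the face is a
    quad and `flag` in {0, 1, 2} selects which edge is the diagonal.
    """
    if vertex_id is None:
        return PAD
    if not 0 <= vertex_id < BASE:
        raise ValueError("vertex_id out of range")
    if flag not in (0, 1, 2):
        raise ValueError("flag must be 0, 1, or 2")
    return vertex_id + flag * BASE


def decode_last_slot(token):
    """Invert encode_last_slot: None for a triangle, else (vertex_id, flag)."""
    if token == PAD:
        return None
    if not 0 <= token < PAD:
        raise ValueError("token out of range")
    flag, vertex_id = divmod(token, BASE)
    return vertex_id, flag
```

Under these assumptions, the first three slots of a triangle and a quad look identical, and only the last token reveals the face type and the diagonal choice, which matches the caption's point about deferring the triangle-vs-quad decision.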
