ArticFlow: Generative Simulation of Articulated Mechanisms

Jiong Lin, Jinchen Ruan, Hod Lipson

TL;DR

ArticFlow, a two-stage flow matching framework that learns a controllable velocity field from noise to target point sets under explicit action control, shows that action-conditioned flow matching is a practical route to controllable and high-quality articulated mechanism generation.

Abstract

Recent advances in generative models have produced strong results for static 3D shapes, whereas articulated 3D generation remains challenging due to action-dependent deformations and limited datasets. We introduce ArticFlow, a two-stage flow matching framework that learns a controllable velocity field from noise to target point sets under explicit action control. ArticFlow couples (i) a latent flow that transports noise to a shape-prior code and (ii) a point flow that transports points conditioned on the action and the shape prior, enabling a single model to represent diverse articulated categories and generalize across actions. On MuJoCo Menagerie, ArticFlow functions both as a generative model and as a neural simulator: it predicts action-conditioned kinematics from a compact prior and synthesizes novel morphologies via latent interpolation. Compared with object-specific simulators and an action-conditioned variant of static point-cloud generators, ArticFlow achieves higher kinematic accuracy and better shape quality. Results show that action-conditioned flow matching is a practical route to controllable and high-quality articulated mechanism generation.
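
The two flows described in the abstract can be written as a pair of regression objectives. The block below is a sketch reconstructed from the straight-line targets named in the Figure 3 caption ($X_1 - X_0$ for the point flow, $Z_x - y_0$ for the latent flow); it is not copied from the paper's own equations. The uniform sampling of $t$ and the names $\mathcal{L}_{\text{point}}$ and $\mathcal{L}_{\text{latent}}$ are assumptions made for illustration.

```latex
\begin{aligned}
X_t &= (1-t)\,X_0 + t\,X_1, && X_0 \sim p_0,\quad t \sim \mathcal{U}[0,1],\\
\mathcal{L}_{\text{point}}(\theta) &= \mathbb{E}\,\bigl\| u_\theta(X_t, t \mid Z_x, Z_a) - (X_1 - X_0) \bigr\|^2,\\[4pt]
y_t &= (1-t)\,y_0 + t\,Z_x, && y_0 \sim q_0,\\
\mathcal{L}_{\text{latent}}(\psi) &= \mathbb{E}\,\bigl\| v_\psi(y_t, t) - (Z_x - y_0) \bigr\|^2 .
\end{aligned}
```

Here $Z_x$ is the shape-prior code, $Z_a$ is the action embedding, $u_\theta$ is the point flow, and $v_\psi$ is the latent flow. Because each interpolation path is a straight line, its time derivative is the constant difference between the two endpoints, which is why the regression targets do not depend on $t$.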

Paper Structure

This paper contains 15 sections, 8 equations, 7 figures, 4 tables, 2 algorithms.

Figures (7)

  • Figure 1: ArticFlow models a category of diverse articulated objects with a single point-flow field, conditioned on shape prior and joint actions. In this figure, columns interpolate the shape-prior condition (novel instances); rows sweep the action control (joint angles). Examples include eyeglasses from PartNet-Mobility [xiang2020sapien] and robot arms from MuJoCo Menagerie [tassa2018deepmind]. The same field produces coherent deformations for one category of articulated shapes, enabling interpolation and controllable 3D generation.
  • Figure 2: Action-conditioned velocity fields. With the shape prior $Z_x$ fixed, changing the action embedding from $Z_{a_1}$ (left) to $Z_{a_2}$ (right) modulates the velocity fields $v(x,t \mid Z_a, Z_x)$. Each field transports states from the Gaussian prior ($x_0\sim\mathcal{N}(0, I)$) to an action-specific manifold $\mathcal{M}_{a}$ along the path $x_t$, producing distinct kinematic deformations. In our case, the target manifold consists of articulated rigid bodies represented as point clouds.
  • Figure 3: Two-stage conditioned flow matching. (A) For each training pair $(X_1, A)$, we encode the point cloud with PointNet [qi2017pointnet] to obtain the shape latent code $Z_x$ and encode the joint angles (Fourier layers [tancik2020fourier] + MLP) to obtain $Z_a$. We jointly optimize two flow-matching models. The point flow $u_\theta$ predicts a conditional velocity in point space that transports points from a prior $p_0$ to the target set $X_1$ using straight-line pairings (target point-set velocity $X_1-X_0$). The concatenated condition $(Z_x \oplus Z_a)$, added to the time embedding, is injected using FiLM [perez2018film] layers. The latent flow $v_\psi$ transports latent noise $y_0$ to the shape code $Z_x$, with target velocity $Z_x-y_0$. (B) At sampling time, we first draw latent noise $\hat{y}_{0}\sim q_{0}$ and an action set $(\hat{a}_1, \hat{a}_2)$. Integrating the latent ODE $\dot y_{t}=v_\psi$ forward from $t{=}0$ to $t{=}1$ yields a sampled shape latent code $\hat{Z}_{x}$. We then form the condition with $\hat{Z}_{x}$ and $\hat{Z}_{a}$, draw a point prior $\hat{X}_{0}\sim p_{0}$, and integrate the point ODE $\dot X_{t}=u_\theta$ to obtain the generated point cloud $\hat{X}$. This produces novel shapes with controllable articulation under arbitrary actions.
  • Figure 4: Action-conditioned flow: qualitative results. For a fixed object and varying joint actions, we compare the deformed point clouds predicted by VSM, ArticFlow-MLP, ArticFlow-PVCNN, and the ground truth. Conditioned only on the action latent, ArticFlow produces shapes whose articulation more closely matches the target kinematics than VSM.
  • Figure 5: Action and shape-prior conditioned flow, qualitative results. When conditioned on both the shape latent and the action latent, we compare our method (ArticFlow) with the action-conditioned PointFlow baseline. Columns vary the shape prior (different object instances) and rows sweep the action condition. ArticFlow produces higher-quality surfaces and kinematically valid deformations. For the PointFlow-Action baseline, simple 1-DoF objects show only minor deformation across angle conditions, and more complex kinematics lose fine shape details, such as the robot legs.
  • ...and 2 more figures
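
The sampling procedure in the Figure 3 caption (integrate the latent ODE to get a shape code, then integrate the point ODE under that code and an action) can be sketched in a few lines of NumPy. This is a toy illustration, not the authors' implementation. The functions `toy_latent_velocity` and `toy_point_velocity` are hypothetical closed-form stand-ins for the trained networks $v_\psi$ and $u_\theta$. The single-link "arm" target, the raw joint angle used in place of the Fourier-feature embedding $Z_a$, and the forward-Euler solver are all assumptions made for the example.

```python
import numpy as np


def euler_integrate(velocity, x0, steps=20):
    """Forward-Euler integration of dx/dt = velocity(x, t) from t=0 to t=1."""
    x = np.asarray(x0, dtype=float).copy()
    dt = 1.0 / steps
    for k in range(steps):
        x = x + dt * velocity(x, k * dt)
    return x


def toy_latent_velocity(y, t, z_target):
    """Hypothetical stand-in for v_psi: straight-line field toward a fixed code."""
    return (z_target - y) / (1.0 - t)


def toy_arm_points(z_x, angle, n_points):
    """Hypothetical target manifold: one planar link whose length depends on
    the shape code and whose orientation depends on the joint angle."""
    length = 1.0 + abs(z_x[0])
    s = np.linspace(0.0, 1.0, n_points)[:, None]  # position along the link
    direction = np.array([np.cos(angle), np.sin(angle), 0.0])
    return s * length * direction


def toy_point_velocity(x, t, z_x, angle):
    """Hypothetical stand-in for u_theta, conditioned on shape code and action."""
    target = toy_arm_points(z_x, angle, len(x))
    return (target - x) / (1.0 - t)


def sample(angle, n_points=64, latent_dim=4, seed=0):
    """Two-stage sampling: latent flow first, then the conditioned point flow."""
    rng = np.random.default_rng(seed)
    z_target = np.full(latent_dim, 0.5)  # assumed "learned" shape code

    # Stage 1: transport latent noise y_0 to a shape code.
    y0 = rng.standard_normal(latent_dim)
    z_x = euler_integrate(lambda y, t: toy_latent_velocity(y, t, z_target), y0)

    # Stage 2: transport Gaussian points to the action-specific point set.
    x0 = rng.standard_normal((n_points, 3))
    x_hat = euler_integrate(lambda x, t: toy_point_velocity(x, t, z_x, angle), x0)
    return z_x, x_hat


if __name__ == "__main__":
    z_x, cloud = sample(angle=np.pi / 2)
    print(z_x.shape, cloud.shape)
```

Keeping the shape code fixed and sweeping `angle` mimics the rows of Figure 1 (same instance, different joint action). Changing `z_target` mimics the columns (different instances under the same action). In a trained model both velocity fields would be neural networks, and a higher-order ODE solver might replace the Euler loop.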