MotionAnymesh: Physics-Grounded Articulation for Simulation-Ready Digital Twins

WenBo Xu, Liu Liu, Li Zhang, Dan Guo, RuoNan Liu

Abstract

Converting static 3D meshes into interactable articulated assets is crucial for embodied AI and robotic simulation. However, existing zero-shot pipelines struggle with complex assets due to a critical lack of physical grounding. Specifically, ungrounded Vision-Language Models (VLMs) frequently suffer from kinematic hallucinations, while unconstrained joint estimation inevitably leads to catastrophic mesh inter-penetration during physical simulation. To bridge this gap, we propose MotionAnymesh, an automated zero-shot framework that seamlessly transforms unstructured static meshes into simulation-ready digital twins. Our method features a kinematic-aware part segmentation module that grounds VLM reasoning with explicit SP4D physical priors, effectively eradicating kinematic hallucinations. Furthermore, we introduce a geometry-physics joint estimation pipeline that combines robust type-aware initialization with physics-constrained trajectory optimization to rigorously guarantee collision-free articulation. Extensive experiments demonstrate that MotionAnymesh significantly outperforms state-of-the-art baselines in both geometric precision and dynamic physical executability, providing highly reliable assets for downstream applications.

Paper Structure (14 sections, 2 equations, 5 figures, 3 tables)

Figures (5)

  • Figure 1: MotionAnymesh: Physics-Grounded Articulation of Static 3D Assets. While current state-of-the-art methods (e.g., Articulate-AnyMesh) rely on ungrounded semantics and often suffer from severe inter-penetration, MotionAnymesh ensures physically plausible articulation. Through kinematic-aware perception and physics-constrained optimization, our zero-shot framework transforms static meshes into collision-free, simulation-ready URDF digital twins for direct deployment in embodied AI tasks.
  • Figure 2: Overview of the MotionAnymesh framework. Our pipeline consists of three integrated stages: (1) Kinematic-Aware Part Segmentation, which extracts 3D-native primitives and clusters them using SP4D kinematic priors and VLM reasoning; (2) Joint Estimation and Optimization, which combines type-aware geometric initialization with physics-constrained trajectory refinement to ensure collision-free articulation; and (3) Simulation-Ready Asset Finalization, which outputs URDF models with preserved textures.
  • Figure 3: Qualitative comparison of articulated object modeling. Compared with Articulate-Anything and Articulate-AnyMesh, MotionAnymesh produces cleaner geometric boundaries and more accurate kinematic structures, especially for complex mechanical objects like the robot arm and multi-part lamps.
  • Figure 4: Versatile articulation across diverse domains. MotionAnymesh successfully processes handcrafted assets and AI-generated surface meshes, transforming unstructured geometry into interactive digital twins without manual intervention.
  • Figure 5: Real-to-Sim-to-Real application. (Left) A static mesh is reconstructed from a single image. (Middle) MotionAnymesh generates a collision-free URDF for policy learning in simulation. (Right) The learned manipulation policy is deployed onto a physical robot, validating the sim-to-real fidelity of our estimated parameters.
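
The captions above describe the pipeline's final output as a URDF model with estimated joint parameters. As a rough illustration of what such an output might contain, the following minimal sketch writes a two-link URDF with a single revolute joint using Python's standard library. The function name `make_urdf`, the link and joint names, and all numeric values are hypothetical placeholders chosen for illustration; they are not taken from the paper and do not represent the authors' implementation.

```python
import xml.etree.ElementTree as ET


def make_urdf(name, origin, axis, lower, upper):
    """Build a minimal URDF string with one revolute joint.

    All names and values here are illustrative assumptions,
    not outputs of MotionAnymesh.
    """
    robot = ET.Element("robot", name=name)

    # Two links: a static base part and one movable part.
    ET.SubElement(robot, "link", name="base")
    ET.SubElement(robot, "link", name="door")

    # A revolute joint connecting the movable part to the base.
    joint = ET.SubElement(robot, "joint", name="door_hinge", type="revolute")
    ET.SubElement(joint, "parent", link="base")
    ET.SubElement(joint, "child", link="door")

    # Joint pivot (in the parent frame) and rotation axis.
    ET.SubElement(joint, "origin",
                  xyz=" ".join(str(v) for v in origin), rpy="0 0 0")
    ET.SubElement(joint, "axis", xyz=" ".join(str(v) for v in axis))

    # Motion limits in radians; effort and velocity are placeholder values.
    ET.SubElement(joint, "limit", lower=str(lower), upper=str(upper),
                  effort="10", velocity="1")

    return ET.tostring(robot, encoding="unicode")


if __name__ == "__main__":
    print(make_urdf("cabinet", origin=(0.3, 0.0, 0.0),
                    axis=(0, 0, 1), lower=0.0, upper=1.57))
```

In a pipeline like the one described, the pivot, axis, and limits would come from the joint estimation stage, and each link would also reference its segmented, textured part mesh through `visual` and `collision` elements, which this sketch omits.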