DragMesh: Interactive 3D Generation Made Easy

Tianshan Zhang, Zeyu Zhang, Hao Tang

TL;DR

DragMesh addresses the challenge of real-time, physically plausible 3D articulation by decoupling kinematic reasoning from motion generation. It introduces a lightweight Dual Quaternion VAE (DQ-VAE) guided by FiLM-conditioned priors and a Kinematics Prediction Network (KPP-Net) to infer joint type, axis, and origin, enabling annotation-free articulation on unseen meshes. The approach achieves real-time performance with strong generalization and a comprehensive loss design that enforces geometric fidelity and physical constraints, demonstrated through ablations and comparative studies. This work bridges static 3D generation and embodied simulation, offering a practical path toward generative 3D intelligence and interactive design tools.

Abstract

While generative models have excelled at creating static 3D content, the pursuit of systems that understand how objects move and respond to interactions remains a fundamental challenge. Current methods for articulated motion lie at a crossroads: they are either physically consistent but too slow for real-time use, or generative but violate basic kinematic constraints. We present DragMesh, a robust framework for real-time interactive 3D articulation built around a lightweight motion generation core. Our core contribution is a novel decoupled kinematic reasoning and motion generation framework. First, we infer the latent joint parameters by decoupling semantic intent reasoning, which determines the joint type, from geometric regression, which determines the axis and origin using our Kinematics Prediction Network (KPP-Net). Second, to leverage the compact, continuous, and singularity-free properties of dual quaternions for representing rigid body motion, we develop a novel Dual Quaternion VAE (DQ-VAE). This DQ-VAE receives these predicted priors, along with the original user drag, to generate a complete, plausible motion trajectory. To ensure strict adherence to kinematics, we inject the joint priors at every layer of the DQ-VAE's non-autoregressive Transformer decoder using FiLM (Feature-wise Linear Modulation) conditioning. This persistent, multi-scale guidance is complemented by a numerically stable cross-product loss to guarantee axis alignment. This decoupled design allows DragMesh to achieve real-time performance and enables plausible, generative articulation on novel objects without retraining, offering a practical step toward generative 3D intelligence. Code: https://github.com/AIGeeksGroup/DragMesh. Website: https://aigeeksgroup.github.io/DragMesh.
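To make two of the ingredients named in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It shows FiLM as a per-channel scale and shift, and one plausible form of a cross-product axis loss. The function names (`film`, `axis_alignment_loss`), the squared-norm form of the loss, and the idea that `gamma` and `beta` would be predicted from the joint priors by a small network are all assumptions made for illustration; the abstract does not specify these details.

```python
import math

def film(features, gamma, beta):
    """FiLM: scale and shift each feature channel by conditioning values.

    In a real model, gamma and beta would be produced by a learned layer
    from the conditioning signal (here, assumed to be the joint priors).
    """
    return [g * x + b for x, g, b in zip(features, gamma, beta)]

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v, eps=1e-8):
    """Unit vector; eps guards against division by zero."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / max(n, eps) for c in v)

def axis_alignment_loss(pred_axis, target_axis):
    """Assumed loss form: squared norm of the cross product of unit axes.

    This equals sin^2 of the angle between the axes, so it is zero when
    they are parallel (or antiparallel) and needs no acos, whose gradient
    is unbounded near perfect alignment.
    """
    c = cross(normalize(pred_axis), normalize(target_axis))
    return sum(x * x for x in c)

# Usage: modulate two channels, then score two axis predictions.
modulated = film([1.0, 2.0], gamma=[2.0, 0.5], beta=[0.0, 1.0])
aligned = axis_alignment_loss((0.0, 0.0, 1.0), (0.0, 0.0, 2.0))
orthogonal = axis_alignment_loss((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
```

Note that this form of the loss treats an axis and its negation as equally good, which may or may not match the convention used in the paper.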

Paper Structure

This paper contains 40 sections, 8 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Results by DragMesh: Our method translates intuitive drag-and-drop actions into accurate joint motion. It correctly infers and generates the motion of rotational joints (e.g., microwave, bucket, door) and translational joints (e.g., desk drawer, oven rack). For each object group, the images from left to right typically depict the initial state, an intermediate generated motion, and the final articulated state. Code: https://github.com/AIGeeksGroup/DragMesh. Website: https://aigeeksgroup.github.io/DragMesh.
  • Figure 2:
  • Figure 3: The DragMesh pipeline fuses point cloud, joint, and drag inputs through a VAE and Transformer architecture to predict a physically-corrected dual quaternion.
  • Figure 4: The DragMesh Annotation-Free Inference Pipeline. Given a raw mesh and drag, a segmentation model identifies the movable part while a VLM predicts the joint type. Our KPP-Net then regresses the precise axis and origin, enabling the final DragMesh model to generate the Dual Quaternion animation.
  • Figure 5: Qualitative Comparison. Our method (Ours) generates plausible interactions across all categories. Blank spaces for baselines (e.g., ArtGS, PartRm, DragApart) represent generation failures, where results are omitted due to unrecognizable outlines.
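The captions for Figures 3 and 4 describe the output as a dual quaternion driven by a joint type, axis, and origin. As a rough illustration of what that representation encodes, the sketch below builds the dual quaternion for a rotational or translational joint analytically and applies it to a point. This is standard rigid-motion algebra, not the learned DQ-VAE; the function names and the `"rotational"` / `"translational"` string labels are assumptions chosen for this example.

```python
import math

def qmul(a, b):
    """Hamilton product of quaternions given as (w, x, y, z)."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return (w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
            w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
            w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
            w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2)

def qconj(q):
    w, x, y, z = q
    return (w, -x, -y, -z)

def rotate(q, v):
    """Rotate 3-vector v by unit quaternion q."""
    _, x, y, z = qmul(qmul(q, (0.0,) + tuple(v)), qconj(q))
    return (x, y, z)

def joint_dual_quaternion(joint_type, axis, origin, amount):
    """Return (real, dual) parts for a single-axis joint motion.

    amount is an angle in radians for a rotational joint and a distance
    for a translational joint. axis must be nonzero.
    """
    n = math.sqrt(sum(c * c for c in axis))
    a = tuple(c / n for c in axis)
    if joint_type == "rotational":
        h = 0.5 * amount
        real = (math.cos(h),) + tuple(math.sin(h) * c for c in a)
        # Rotating about a pivot: translation is origin - R(origin).
        rp = rotate(real, origin)
        t = tuple(o - r for o, r in zip(origin, rp))
    elif joint_type == "translational":
        real = (1.0, 0.0, 0.0, 0.0)
        t = tuple(amount * c for c in a)
    else:
        raise ValueError(f"unknown joint type: {joint_type}")
    # Dual part encodes translation: dual = 0.5 * t * real.
    dual = tuple(0.5 * c for c in qmul((0.0,) + t, real))
    return real, dual

def transform_point(dq, p):
    """Apply the rigid motion encoded by a unit dual quaternion to p."""
    real, dual = dq
    _, tx, ty, tz = qmul(dual, qconj(real))  # equals t / 2
    r = rotate(real, p)
    return (r[0] + 2 * tx, r[1] + 2 * ty, r[2] + 2 * tz)

# Usage: a door-like hinge along z through (1, 0, 0), opened by 90 degrees.
hinge = joint_dual_quaternion("rotational", (0, 0, 1), (1.0, 0.0, 0.0), math.pi / 2)
moved = transform_point(hinge, (2.0, 0.0, 0.0))
```

Sampling `amount` over a range yields a trajectory of dual quaternions. Points on the hinge axis stay fixed under a rotational joint, which is one way to sanity-check a predicted axis and origin.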