Articulated-Body Dynamics Network: Dynamics-Grounded Prior for Robot Learning

Sangwoo Shin; Kunzhao Ren; Xiaobin Xiong; Josiah Hanna


Abstract

Recent work in reinforcement learning has shown that incorporating structural priors for articulated robots, such as link connectivity, into policy networks improves learning efficiency. However, dynamics properties, despite their fundamental role in determining how forces and motion propagate through the body, remain largely underexplored as an inductive bias for policy learning. To address this gap, we present the Articulated-Body Dynamics Network (ABD-Net), a novel graph neural network architecture grounded in the computational structure of forward dynamics. Specifically, we adapt the inertia propagation mechanism from the Articulated Body Algorithm, systematically aggregating inertial quantities from child to parent links in a tree-structured manner, while replacing physical quantities with learnable parameters. Embedding ABD-Net into the policy actor enables dynamics-informed representations that capture how actions propagate through the body, leading to efficient and robust policy learning. Through experiments with simulated humanoid, quadruped, and hopper robots, our approach demonstrates increased sample efficiency and generalization to dynamics shifts compared to transformer-based and GNN baselines. We further validate the learned policy on real Unitree G1 and Go2 robots, state-of-the-art humanoid and quadruped platforms, generating dynamic, versatile, and robust locomotion behaviors through sim-to-real transfer with real-time inference.
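
For reference, the inertia propagation step that the abstract refers to can be written as the standard articulated-body inertia recursion of the Articulated Body Algorithm. The notation below follows the common textbook form and is an assumption, not necessarily the paper's own: $I_i$ is the rigid-body spatial inertia of link $i$, $S_j$ is the motion subspace of joint $j$, ${}^{j}X_i$ is the spatial motion transform from link $i$ to link $j$ coordinates, and ${}^{i}X_j^{*}$ is the corresponding force transform back to link $i$. The recursion runs from the leaves of the kinematic tree to the root:

$$
I_i^A = I_i + \sum_{j:\,\textsc{pa}(j)=i} {}^{i}X_j^{*}\, I_j^a\, {}^{j}X_i,
\qquad
I_j^a = I_j^A - I_j^A S_j \left(S_j^\top I_j^A S_j\right)^{-1} S_j^\top I_j^A .
$$

Each parent thus accumulates a contribution $I_j^a$ from every child $j$. According to the abstract, ABD-Net keeps this child-to-parent aggregation pattern and replaces the physical quantities with learnable parameters.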


Paper Structure (16 sections, 9 equations, 10 figures, 4 tables, 1 algorithm)


Introduction
Related Works
Exploiting Body Structure for Policy Learning
Physics-Informed Reinforcement Learning
Preliminaries
Reinforcement Learning
Robot Morphology and Graph Neural Networks
Forward Dynamics
Our Approach
Link-wise Observation Encoding
Dynamics-Informed Message Passing
Action Decoding
Experiments
Simulation Experiments
Hardware Experiments
...and 1 more section

Figures (10)

Figure 1: Different approaches to computing node features for articulated robots. Conventional methods use link connectivity to define information flow via adjacency or attention mask, leaving the network to learn how to form node features from scratch. Our approach encodes the computational structure of forward dynamics by performing dynamics-inspired message passing, where learnable inertia-related quantities (denoted as $I^a$) are propagated and aggregated from children to parents to form node features.
Figure 2: Overview of $\textsc{ABD-Net}$ on a quadruped robot. (a) Observation Encoding: Each link $i$ has its own projection $\phi_i$ that transforms the observation $\mathbf{s}$ into a link-wise observation embedding $\mathbf{z}_i$ (e.g., FR-Thigh, FR-Calf, FR-Foot for the front-right leg). (b) Dynamics-Informed Message Passing: Each link $i$ is associated with learnable parameters $(\mathbf{W}_i, \mathbf{B}_i)$. Link $j$ computes a contribution $\mathbf{v}^a_j$ using $\mathbf{W}_j$ and sends it to its parent, which aggregates contributions from all children to form its link representation $\mathbf{v}_i$ ($\sigma$ denotes softplus). (c) Action Decoding: For joint $j$ connecting $\textsc{pa}(j)$ and link $j$, the action $\mathbf{a}_j$ is computed from the parent's representation $\mathbf{v}_{\textsc{pa}(j)}$.
Figure 3: Learning curves comparing ABD-Net (ours) and baselines: mean return versus the number of environment steps. All methods are trained with 5 seeds. Shaded regions indicate 95% confidence intervals.
Figure 4: Recovery behavior under 2$\times$ mass on Hopper Hop. Top: $\textsc{ABD-Net}$ applies stronger torque to recover from the downward tilt. Bottom: SWAT fails to compensate and falls.
Figure 5: Learned link representations on Go2 during trot gait. Left: Feature norm time series of hip joints (FL: front-left, RR: rear-right, etc.). Right: Correlation rank matrix between hip joint features, where lower rank indicates higher correlation.
...and 5 more figures
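
The child-to-parent message passing described in the Figure 2 caption can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The captions do not give the exact update rule, so the form used here (each link starts from its embedding $\mathbf{z}_i$ scaled by an elementwise gate $\sigma(\mathbf{B}_i)$, then adds $\mathbf{W}_j \mathbf{v}_j$ from each child $j$) is hypothetical. The helper names `softplus`, `matvec`, and `propagate`, and the treatment of $\mathbf{B}_i$ as a vector, are likewise assumptions made for illustration.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def matvec(M, v):
    # Dense matrix-vector product on nested lists.
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def propagate(parent, z, W, B):
    """Leaf-to-root aggregation over a kinematic tree (hypothetical update rule).

    parent[i] is the parent index of link i (-1 for the root, which must be
    link 0). Links are numbered so that parent[i] < i.
    z[i] is the observation embedding of link i, W[i] a square matrix,
    and B[i] a vector passed through softplus to form a positive gate.
    """
    n = len(parent)
    # Assumed initialization: gate each link's own embedding elementwise.
    v = [[softplus(b) * x for b, x in zip(B[i], z[i])] for i in range(n)]
    # Reverse index order visits every child before its parent.
    for j in range(n - 1, 0, -1):
        contribution = matvec(W[j], v[j])  # plays the role of v^a_j
        p = parent[j]
        v[p] = [a + c for a, c in zip(v[p], contribution)]
    return v


# Usage: a trunk (link 0) with a three-link leg chain, embedding size 2.
identity = [[1.0, 0.0], [0.0, 1.0]]
parent = [-1, 0, 1, 2]
z = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [2.0, -1.0]]
W = [identity] * 4
B = [[0.0, 0.0]] * 4
v = propagate(parent, z, W, B)
```

Because links are processed from the highest index down, each link's representation is complete before it is sent to its parent, which mirrors the backward pass of the Articulated Body Algorithm. An action decoder in the spirit of panel (c) would then read $\mathbf{a}_j$ from `v[parent[j]]`.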
