HEIR: Learning Graph-Based Motion Hierarchies
Cheng Zheng, William Koch, Baiang Li, Felix Heide
TL;DR
HEIR learns interpretable motion hierarchies without hand-crafted primitives by representing observed motions with a learnable DAG $H$ over motion elements and decomposing $\Delta^t$ into parent-inherited and residual components. It learns a proximity-based candidate graph $G_0$ and uses a graph neural encoder to predict edge weights and local dynamics; $H$ is then sampled differentiably, and a decoder reconstructs $\hat{\Delta}^t$. The method extends to rotations via polar components and to 3D scene editing via ARAP-based deformation of Gaussian splats. Experiments on 1D toy hierarchies, a rotational planetary dataset, and dynamic Gaussian splatting scenes demonstrate accurate hierarchy recovery, improved perceptual realism, and coherent edits compared to baselines. The approach provides a general, data-driven framework for motion modeling that adapts to diverse tasks while offering interpretable structure for downstream control and analysis.
Abstract
Hierarchical structures of motion exist across research fields, including computer vision, graphics, and robotics, where complex dynamics typically arise from coordinated interactions among simpler motion components. Existing methods for modeling such dynamics generally rely on manually defined or heuristic hierarchies with fixed motion primitives, which limits their generalizability across tasks. In this work, we propose a general hierarchical motion modeling method that learns structured, interpretable motion relationships directly from data. Our method represents observed motions using graph-based hierarchies, explicitly decomposing global absolute motions into parent-inherited patterns and local motion residuals. We formulate hierarchy inference as a differentiable graph learning problem, where vertices represent elemental motions and directed edges capture learned parent-child dependencies through graph neural networks. We evaluate our hierarchical reconstruction approach on three examples: 1D translational motion, 2D rotational motion, and dynamic 3D scene deformation via Gaussian splatting. Experimental results show that our method reconstructs the intrinsic motion hierarchy in 1D and 2D cases, and produces more realistic and interpretable deformations compared to the baseline on dynamic 3D Gaussian splatting scenes. By providing an adaptable, data-driven hierarchical modeling paradigm, our method offers a formulation applicable to a broad range of motion-centric tasks. Project Page: https://light.princeton.edu/HEIR/
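The decomposition of absolute motions into parent-inherited patterns and local residuals can be illustrated with a minimal sketch. It is not the learned model: the hierarchy is given by hand rather than inferred, each element is assumed to have at most one parent, and motion is assumed purely translational. The names `decompose`, `reconstruct`, `abs_motion`, and `parent` are hypothetical and chosen only for this illustration.

```python
from typing import List, Optional, Sequence

Vector = List[float]


def decompose(abs_motion: Sequence[Vector],
              parent: Sequence[Optional[int]]) -> List[Vector]:
    """Split each absolute motion into the part inherited from the parent
    and a local residual; return the residuals.

    Assumption (illustrative only): residual_i = abs_i - abs_parent(i),
    and a root element (parent None) keeps its absolute motion as residual.
    """
    residuals = []
    for i, delta in enumerate(abs_motion):
        p = parent[i]
        if p is None:
            residuals.append(list(delta))
        else:
            residuals.append([a - b for a, b in zip(delta, abs_motion[p])])
    return residuals


def reconstruct(residuals: Sequence[Vector],
                parent: Sequence[Optional[int]]) -> List[Vector]:
    """Recover absolute motions by summing residuals along the path to the root."""
    absolute: List[Optional[Vector]] = [None] * len(residuals)

    def resolve(i: int) -> Vector:
        if absolute[i] is None:
            p = parent[i]
            if p is None:
                absolute[i] = list(residuals[i])
            else:
                inherited = resolve(p)  # motion inherited from the parent
                absolute[i] = [r + b for r, b in zip(residuals[i], inherited)]
        return absolute[i]

    return [resolve(i) for i in range(len(residuals))]


# A hand-specified 1D chain: element 0 is the root, 1 follows 0, 2 follows 1.
parent = [None, 0, 1]
abs_motion = [[1.0], [1.5], [1.25]]

residuals = decompose(abs_motion, parent)
print(residuals)                       # [[1.0], [0.5], [-0.25]]
print(reconstruct(residuals, parent))  # [[1.0], [1.5], [1.25]]
```

In this toy chain the residuals are small relative to the absolute motions. That is the intuition behind preferring a hierarchy in which children mostly inherit their motion from their parents.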
