HEIR: Learning Graph-Based Motion Hierarchies

Cheng Zheng, William Koch, Baiang Li, Felix Heide

TL;DR

HEIR learns interpretable motion hierarchies without hand-crafted primitives. It represents observed motions with a learnable DAG $H$ over motion elements and decomposes each absolute motion $\Delta^t$ into a parent-inherited component and a local residual. It builds a proximity-based candidate graph $G_0$ and uses a graph neural encoder to predict edge weights and local dynamics; $H$ is then sampled differentiably, and a decoder reconstructs $\hat{\Delta}^t$. Extensions cover rotations via polar components and 3D scene editing via ARAP-based deformation of Gaussian splats. Experiments on 1D toy hierarchies, a rotational planetary dataset, and dynamic Gaussian splatting scenes demonstrate accurate hierarchy recovery, improved perceptual realism, and coherent edits compared to baselines. The approach provides a general, data-driven framework for motion modeling that adapts to diverse tasks while offering interpretable structure for downstream control and analysis.

Abstract

Hierarchical structures of motion exist across research fields, including computer vision, graphics, and robotics, where complex dynamics typically arise from coordinated interactions among simpler motion components. Existing methods for modeling such dynamics rely on manually defined or heuristic hierarchies with fixed motion primitives, limiting their generalizability across different tasks. In this work, we propose a general hierarchical motion modeling method that learns structured, interpretable motion relationships directly from data. Our method represents observed motions using graph-based hierarchies, explicitly decomposing global absolute motions into parent-inherited patterns and local motion residuals. We formulate hierarchy inference as a differentiable graph learning problem, where vertices represent elemental motions and directed edges capture learned parent-child dependencies through graph neural networks. We evaluate our hierarchical reconstruction approach on three examples: 1D translational motion, 2D rotational motion, and dynamic 3D scene deformation via Gaussian splatting. Experimental results show that our method reconstructs the intrinsic motion hierarchy in 1D and 2D cases, and produces more realistic and interpretable deformations compared to the baseline on dynamic 3D Gaussian splatting scenes. By providing an adaptable, data-driven hierarchical modeling paradigm, our method offers a formulation applicable to a broad range of motion-centric tasks. Project Page: https://light.princeton.edu/HEIR/
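The decomposition of absolute motion into a parent-inherited part and a local residual can be illustrated with a small sketch for the translational case. This is not the authors' implementation: the function names, the `parent` array encoding (`-1` marks a root), and the finite-difference convention $\Delta^t = X^{t+1} - X^t$ are assumptions made for illustration, and the hierarchy is taken as given rather than learned.

```python
import numpy as np

def absolute_motion(positions):
    """Assumed convention: Delta^t = X^{t+1} - X^t along the time axis (axis 0)."""
    return np.diff(np.asarray(positions, dtype=float), axis=0)

def residual_motion(delta, parent):
    """Subtract each element's parent motion; roots (parent == -1) keep their absolute motion."""
    delta = np.asarray(delta, dtype=float)
    residual = delta.copy()
    for child, p in enumerate(parent):
        if p >= 0:
            residual[:, child] -= delta[:, p]
    return residual

def compose_motion(residual, parent):
    """Rebuild absolute motion by summing residuals along each element's ancestor chain."""
    residual = np.asarray(residual, dtype=float)
    absolute = np.zeros_like(residual)
    for child in range(residual.shape[1]):
        node = child
        while node >= 0:  # walk up to the root; assumes the hierarchy is acyclic
            absolute[:, child] += residual[:, node]
            node = parent[node]
    return absolute

# Hypothetical 3-element chain: element 0 is the root, 1 follows 0, 2 follows 1.
parent = [-1, 0, 1]
local = np.random.default_rng(0).normal(size=(5, 3))  # per-element residual motions
observed = compose_motion(local, parent)              # absolute motions one would observe
recovered = residual_motion(observed, parent)         # matches `local` up to rounding
```

With the correct hierarchy, subtracting the parent's absolute motion recovers each element's own residual exactly; with a wrong hierarchy the residuals would still contain inherited motion, which is the signal a learned hierarchy can exploit.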

Paper Structure

This paper contains 14 sections, 9 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Learning Motion Hierarchies. Given a sequence of observed positions $\mathbf{X}^t$ over time (left), we predict absolute motions and candidate graphs based on local spatial proximity. A graph neural network processes this structure and predicts edge weights that define a probabilistic parent-child hierarchy over motion elements (bottom path). The encoder predicts the relative motion based on these weighted parent candidates. The absolute motion of each motion element is then recursively aggregated from its parent using a residual composition process (top path) and a hierarchy matrix sampled from the edge weights using Gumbel-Softmax. We learn the hierarchy by minimizing the difference between the observed and predicted absolute motions across all time steps.
  • Figure 2: Learning of Hierarchical Relations in a 1D Trajectory. We evaluate the proposed hierarchical learning method on a 1D motion trajectory in which individual nodes move in a hierarchical manner (see Ground Truth motion hierarchy in bottom left inset), with each node adding its own unknown motion. Top left to bottom right: (1) raw node positions $\mathbf{X}^t$ of the hierarchical trajectories over time, (2) absolute node velocities $\Delta^t$, (3) reconstructed hierarchy from inferred relationships with ground-truth hierarchy in the inset, and (4) relative velocities $\delta^t$ with respect to each node's parent, given the reconstructed hierarchy (3). We find that the method correctly identifies all motions (bottom left), with the two core motions passing through the orange and green nodes.
  • Figure 3: Learning of hierarchical relations in a planetary system. We evaluate on a simplified synthetic planetary dataset with rotational hierarchies. From left to right: (1) illustration of the pairwise metrics used for regularization between two timesteps; for clarity, only a subset of possible parent-child relations is shown. Solid arrows indicate potential parent-child vectors, with the color corresponding to the parent candidate. (2) Learned edge weights, where entries with a green border correspond to correct reconstructions. (3) The observed data shown with the reconstructed hierarchy; we note that the "moons" correctly inherit motion from their "planets".
  • Figure 4: Qualitative Evaluation of Gaussian Scene Deformation on the D-NeRF pumarola2021d dataset. We evaluate the method for hierarchical relationship learning on Gaussian splatting scenes with thousands of nodes. Specifically, we show scene deformation for the "Excavator", "Hook", "Jumpingjacks", and "Warrior" scenes from the D-NeRF pumarola2021d dataset. The arrows show the user-defined deformation on the faded original scene in two different scenarios. We overlay the resulting deformed scenes for the proposed method and SC-GS huang2024sc on the original scene. The proposed method produces more realistic and physically coherent deformations, preserving structural rigidity, while SC-GS introduces unnatural distortions and misaligned body geometry.
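The Figure 1 caption mentions candidate graphs built from local spatial proximity and a hierarchy matrix sampled from edge weights with Gumbel-Softmax. A minimal NumPy sketch of those two steps follows. It is an illustration under stated assumptions, not the paper's code: the k-nearest-neighbour rule, the names `knn_candidates` and `gumbel_softmax_rows`, the temperature `tau`, and the zero logits standing in for learned edge weights are all hypothetical. The sketch also does not enforce acyclicity or model root nodes.

```python
import numpy as np

def knn_candidates(positions, k):
    """Boolean mask M[i, j]: j is one of the k nearest neighbours of i (never i itself)."""
    dist = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)               # exclude self as a parent candidate
    nearest = np.argsort(dist, axis=1)[:, :k]    # indices of the k closest elements per row
    mask = np.zeros(dist.shape, dtype=bool)
    np.put_along_axis(mask, nearest, True, axis=1)
    return mask

def gumbel_softmax_rows(logits, mask, tau, rng):
    """Relaxed one-hot parent choice per row, restricted to the masked candidates."""
    u = rng.uniform(low=1e-9, high=1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))                 # Gumbel(0, 1) noise
    scores = np.where(mask, (logits + gumbel) / tau, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
positions = rng.normal(size=(6, 2))              # hypothetical 2D element positions
candidates = knn_candidates(positions, k=2)
edge_logits = np.zeros((6, 6))                   # placeholder for learned edge weights
hierarchy = gumbel_softmax_rows(edge_logits, candidates, tau=0.5, rng=rng)
```

Each row of `hierarchy` sums to one and is zero outside the candidate set. Lowering `tau` pushes each row toward a one-hot parent assignment, while a higher `tau` keeps the sample soft; in an autodiff framework the same relaxation lets gradients flow back to the edge weights.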