LaDy: Lagrangian-Dynamic Informed Network for Skeleton-based Action Segmentation via Spatial-Temporal Modulation

Haoyu Ji, Xueting Liu, Yu Gao, Wenze Huang, Zhihao Yang, Weihong Ren, Zhiyong Wang, Honghai Liu

Abstract

Skeleton-based Temporal Action Segmentation (STAS) aims to densely parse untrimmed skeletal sequences into frame-level action categories. However, existing methods, while proficient at capturing spatio-temporal kinematics, neglect the underlying physical dynamics that govern human motion. This oversight limits inter-class discriminability between actions with similar kinematics but distinct dynamic intents, and hinders precise boundary localization where dynamic force profiles shift. To address these limitations, we propose the Lagrangian-Dynamic Informed Network (LaDy), a framework integrating principles of Lagrangian dynamics into the segmentation process. Specifically, LaDy first computes generalized coordinates from joint positions and then estimates Lagrangian terms under physical constraints to explicitly synthesize the generalized forces. To further ensure physical coherence, our Energy Consistency Loss enforces the work-energy theorem, aligning kinetic energy change with the work done by the net force. The learned dynamics then drive a Spatio-Temporal Modulation module: Spatially, generalized forces are fused with spatial representations to provide more discriminative semantics. Temporally, salient dynamic signals are constructed for temporal gating, thereby significantly enhancing boundary awareness. Experiments on challenging datasets show that LaDy achieves state-of-the-art performance, validating the integration of physical dynamics for action segmentation. Code is available at https://github.com/HaoyuJi/LaDy.

Paper Structure

This paper contains 49 sections, 3 theorems, 48 equations, 11 figures, 20 tables.

Key Result

Theorem 1

The change in a system's kinetic energy, $\Delta E_K$, over a time interval $[t_1, t_2]$ is equal to the total work, $W$, done on the system by the net generalized forces $\bm{\tau}_{net}$ during that interval:

$$\Delta E_K = E_K(t_2) - E_K(t_1) = W = \int_{t_1}^{t_2} P_{net}(t)\, dt,$$

where $P_{net}(t) = \dot{\bm{q}}(t)^T \bm{\tau}_{net}(t)$ is the instantaneous power.
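
The work-energy statement in Theorem 1 can be checked numerically on a toy system. The sketch below is not the paper's Energy Consistency Loss. It assumes a constant diagonal mass matrix (so that $E_K = \tfrac{1}{2}\dot{\bm{q}}^T M \dot{\bm{q}}$ and $\bm{\tau}_{net} = M\ddot{\bm{q}}$), a hand-picked analytic trajectory, and trapezoidal integration of the power. The function name `energy_consistency_residual` and all numeric values are hypothetical.

```python
import numpy as np

def energy_consistency_residual(q_dot, tau_net, mass, dt):
    """Return |Delta E_K - W| over a trajectory sampled every dt seconds.

    q_dot, tau_net: arrays of shape (T, n); mass: constant (n, n) matrix.
    A constant mass matrix is an illustrative assumption, not the paper's model.
    """
    # Kinetic energy at each frame: E_K = 0.5 * q_dot^T M q_dot
    e_k = 0.5 * np.einsum("ti,ij,tj->t", q_dot, mass, q_dot)
    delta_e_k = e_k[-1] - e_k[0]
    # Instantaneous power: P_net = q_dot^T tau_net
    power = np.einsum("ti,ti->t", q_dot, tau_net)
    # Work as the trapezoidal integral of power over the interval
    work = np.sum(0.5 * (power[1:] + power[:-1])) * dt
    return abs(delta_e_k - work)

# Toy trajectory with two generalized coordinates (hypothetical values)
t = np.linspace(0.0, 1.0, 2001)
dt = t[1] - t[0]
mass = np.diag([2.0, 0.5])
q_dot = np.stack([np.cos(t), -2.0 * np.sin(2.0 * t)], axis=1)
q_ddot = np.stack([-np.sin(t), -4.0 * np.cos(2.0 * t)], axis=1)

# Physically consistent forces: tau_net = M q_ddot
tau_net = q_ddot @ mass.T
residual = energy_consistency_residual(q_dot, tau_net, mass, dt)

# Inconsistent forces (a constant offset) violate the theorem
residual_bad = energy_consistency_residual(q_dot, tau_net + 1.0, mass, dt)
```

With consistent forces the residual is limited only by integration error, while the offset forces leave a clearly nonzero gap. A penalty on a residual of this kind is one plausible way to regularize estimated forces toward energy consistency.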

Figures (11)

  • Figure 1: Conceptual comparison of STAS frameworks. (a) Conventional models capture only spatio-temporal kinematic patterns. (b) Our LaDy framework introduces Lagrangian dynamics to derive generalized forces that modulate the spatio-temporal features, leading to improved discriminability and boundary localization.
  • Figure 2: Overview of the LaDy framework. The Lagrangian Dynamic Model (lower-left) synthesizes generalized forces from coordinates, constrained by an Energy Consistency Loss for physical coherence. Concurrently, the Spatial Model (upper-left) extracts kinematic features (GCNs) and fuses them with these physics-aware dynamics. The multi-stage Temporal Model (right) processes the fused spatial representation, where each stage is hierarchically gated by force-driven signals to enhance boundary awareness.
  • Figure 3: Physics-constrained dynamics synthesis and energy-based supervision. Generalized states ($q, \dot{q}, \ddot{q}$) are fed into physics-constrained estimators to derive the Lagrangian terms ($M, C, G, F$). These terms are then assembled via the Lagrangian equation to synthesize the generalized forces ($\tau$). Concurrently, the Energy Consistency Loss ($\mathcal{L}_{EC}$) regularizes the forces.
  • Figure 4: Qualitative results on PKU-MMD v2 and MCFS-130. The top row is the Ground Truth, followed by the segmentation results (bars) and boundary confidence scores (curves) for LaDy, LaSA, and DeST. Different colors denote distinct action classes. Red boxes and gray dashed vertical lines highlight misclassifications and larger boundary deviations observed in other methods compared to LaDy.
  • Figure 5: Visualization of the representation space on PKU-MMD v2. Each point represents an action segment feature, colored according to its ground-truth class. Quantitative clustering metrics are reported in the top-right corner: Silhouette Coefficient (SC $\uparrow$), Calinski-Harabasz Index (CH $\uparrow$), and Davies-Bouldin Index (DB $\downarrow$). Higher ($\uparrow$) is better for SC/CH; lower ($\downarrow$) is better for DB.
  • ...and 6 more figures
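
The force synthesis described in the Figure 3 caption can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the standard manipulator form $\tau = M\ddot{q} + C\dot{q} + G + F$, treats $F$ as a viscous dissipative term, and uses a Cholesky-style factorization as one example of a physical constraint that keeps $M$ symmetric positive definite. The helper names `spd_mass_matrix` and `synthesize_forces` are hypothetical, and random arrays stand in for the outputs of learned estimators.

```python
import numpy as np

def spd_mass_matrix(raw, eps=1e-3):
    """Map an unconstrained (n, n) array to a symmetric positive-definite matrix.

    Uses M = L L^T + eps * I with L the lower triangle of `raw`.
    This parameterization is an assumption made for illustration.
    """
    lower = np.tril(raw)
    return lower @ lower.T + eps * np.eye(raw.shape[0])

def synthesize_forces(mass, coriolis, gravity, friction, q_dot, q_ddot):
    """Assemble generalized forces: tau = M q_ddot + C q_dot + G + F."""
    return mass @ q_ddot + coriolis @ q_dot + gravity + friction

# Random stand-ins for estimator outputs (hypothetical values)
rng = np.random.default_rng(0)
n = 3
mass = spd_mass_matrix(rng.normal(size=(n, n)))
coriolis = rng.normal(size=(n, n))
gravity = rng.normal(size=n)
q_dot = rng.normal(size=n)
q_ddot = rng.normal(size=n)
friction = 0.1 * q_dot  # viscous damping, an assumed form for F

tau = synthesize_forces(mass, coriolis, gravity, friction, q_dot, q_ddot)
```

Building the mass matrix from a triangular factor guarantees positive kinetic energy for any nonzero velocity, which is the kind of property a physics-constrained estimator would need to preserve.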

Theorems & Definitions (7)

  • Definition 1: The Lagrangian
  • Theorem 1: The Work-Energy Theorem
  • Proposition 1
  • proof
  • Proposition 2
  • proof
  • Definition 2: Forward Kinematics