Human Motion Unlearning
Edoardo De Matteis, Matteo Migliarini, Alessio Sampieri, Indro Spinelli, Fabio Galasso
TL;DR
The paper formalizes Human Motion Unlearning (HMU) with a focus on violence unlearning to prevent harmful 3D motion synthesis. It introduces a violence-based benchmark built from HumanML3D and Motion-X, defines forget/retain subsets, and tailors evaluation metrics for sequential data, including MM-Safe and implicit-prompt testing. Adapting training-free methods UCE and RECE to text-to-motion and proposing Latent Code Replacement (LCR), the study demonstrates that targeted latent-space interventions can suppress violent content while preserving motion realism, with LCR offering the best safety-realism trade-off. The work provides a foundational framework for safe motion generation and general unlearning in temporal generative models, with broad implications for robotics, animation, and embodied agents.
Abstract
We introduce Human Motion Unlearning and motivate it through the concrete task of preventing violent 3D motion synthesis, an important safety requirement given that popular text-to-motion datasets (HumanML3D and Motion-X) contain from 7% to 15% violent sequences spanning both atomic gestures (e.g., a single punch) and highly compositional actions (e.g., loading and swinging a leg to kick). By focusing on violence unlearning, we demonstrate how removing a challenging, multifaceted concept can serve as a proxy for the broader capability of motion "forgetting." To enable systematic evaluation of Human Motion Unlearning, we establish the first motion unlearning benchmark by automatically filtering HumanML3D and Motion-X datasets to create distinct forget sets (violent motions) and retain sets (safe motions). We introduce evaluation metrics tailored to sequential unlearning, measuring both suppression efficacy and the preservation of realism and smooth transitions. We adapt two state-of-the-art, training-free image unlearning methods (UCE and RECE) to leading text-to-motion architectures (MoMask and BAMM), and propose Latent Code Replacement (LCR), a novel, training-free approach that identifies violent codes in a discrete codebook representation and substitutes them with safe alternatives. Our experiments show that unlearning violent motions is indeed feasible and that acting on latent codes strikes the best trade-off between violence suppression and preserving overall motion quality. This work establishes a foundation for advancing safe motion synthesis across diverse applications. Website: https://www.pinlab.org/hmu.
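
To make the idea of replacing codes in a discrete codebook concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that motions are already tokenized into integer code indices, that a code is "violent" when its relative frequency in the forget set is at least `ratio` times its frequency in the retain set, and that each flagged code is swapped for its nearest unflagged code in embedding space. The function names, the `ratio` threshold, and the nearest-neighbour replacement rule are all hypothetical choices made for illustration; the abstract does not specify how LCR selects or substitutes codes.

```python
from collections import Counter
from math import dist


def find_flagged_codes(forget_seqs, retain_seqs, ratio=2.0):
    """Flag codes over-represented in the forget set (assumed criterion).

    A code is flagged when its relative frequency in the forget set is at
    least `ratio` times its relative frequency in the retain set.
    Both sets are assumed to be non-empty.
    """
    forget_counts = Counter(c for seq in forget_seqs for c in seq)
    retain_counts = Counter(c for seq in retain_seqs for c in seq)
    n_forget = sum(forget_counts.values())
    n_retain = sum(retain_counts.values())
    flagged = set()
    for code, count in forget_counts.items():
        p_forget = count / n_forget
        p_retain = retain_counts.get(code, 0) / n_retain
        if p_forget >= ratio * p_retain:
            flagged.add(code)
    return flagged


def build_replacement_map(codebook, flagged):
    """Map each flagged code to its nearest unflagged code (assumed rule)."""
    safe = [i for i in range(len(codebook)) if i not in flagged]
    return {
        code: min(safe, key=lambda s: dist(codebook[code], codebook[s]))
        for code in flagged
    }


def replace_codes(tokens, replacement):
    """Substitute flagged codes in a token sequence; leave others untouched."""
    return [replacement.get(t, t) for t in tokens]


# Toy data: a 4-entry codebook of 2-D embeddings and tokenized motions.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.9, 0.1), (5.0, 5.0)]
forget_seqs = [[1, 1, 0], [1, 3, 1]]   # stand-in for violent motions
retain_seqs = [[0, 2, 0], [2, 3, 0]]   # stand-in for safe motions

flagged = find_flagged_codes(forget_seqs, retain_seqs)
replacement = build_replacement_map(codebook, flagged)
print(replace_codes([0, 1, 3, 1], replacement))
```

Because the substitution acts only on code indices at generation time, a scheme of this kind needs no gradient updates, which is consistent with the training-free framing in the abstract. How well any particular selection or replacement rule preserves motion realism is an empirical question that this sketch does not address.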
