Human Motion Unlearning

Edoardo De Matteis, Matteo Migliarini, Alessio Sampieri, Indro Spinelli, Fabio Galasso

TL;DR

The paper formalizes Human Motion Unlearning (HMU) with a focus on violence unlearning to prevent harmful 3D motion synthesis. It introduces a violence-based benchmark built from HumanML3D and Motion-X, defines forget/retain subsets, and tailors evaluation metrics for sequential data, including MM-Safe and implicit-prompt testing. Adapting training-free methods UCE and RECE to text-to-motion and proposing Latent Code Replacement (LCR), the study demonstrates that targeted latent-space interventions can suppress violent content while preserving motion realism, with LCR offering the best safety-realism trade-off. The work provides a foundational framework for safe motion generation and general unlearning in temporal generative models, with broad implications for robotics, animation, and embodied agents.

Abstract

We introduce Human Motion Unlearning and motivate it through the concrete task of preventing violent 3D motion synthesis, an important safety requirement given that popular text-to-motion datasets (HumanML3D and Motion-X) contain from 7% to 15% violent sequences spanning both atomic gestures (e.g., a single punch) and highly compositional actions (e.g., loading and swinging a leg to kick). By focusing on violence unlearning, we demonstrate how removing a challenging, multifaceted concept can serve as a proxy for the broader capability of motion "forgetting." To enable systematic evaluation of Human Motion Unlearning, we establish the first motion unlearning benchmark by automatically filtering HumanML3D and Motion-X datasets to create distinct forget sets (violent motions) and retain sets (safe motions). We introduce evaluation metrics tailored to sequential unlearning, measuring both suppression efficacy and the preservation of realism and smooth transitions. We adapt two state-of-the-art, training-free image unlearning methods (UCE and RECE) to leading text-to-motion architectures (MoMask and BAMM), and propose Latent Code Replacement (LCR), a novel, training-free approach that identifies violent codes in a discrete codebook representation and substitutes them with safe alternatives. Our experiments show that unlearning violent motions is indeed feasible and that acting on latent codes strikes the best trade-off between violence suppression and preserving overall motion quality. This work establishes a foundation for advancing safe motion synthesis across diverse applications. Website: https://www.pinlab.org/hmu.
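
To make the idea of replacing codes in a discrete codebook more concrete, here is a minimal Python sketch. It is not the authors' implementation: the scoring rule (difference in relative code frequency between the forget and retain sets), the random choice of replacement codes, and all names (`find_forget_codes`, `replace_codes`, `top_k`) are assumptions made purely for illustration. The abstract only states that violent codes are identified and substituted with safe alternatives.

```python
import numpy as np


def find_forget_codes(forget_tokens, retain_tokens, codebook_size, top_k):
    """Rank codebook indices by how much more often they occur in the
    forget set than in the retain set (an assumed scoring rule)."""
    f = np.bincount(np.concatenate(forget_tokens), minlength=codebook_size).astype(float)
    r = np.bincount(np.concatenate(retain_tokens), minlength=codebook_size).astype(float)
    f /= f.sum()  # relative frequency in forget (violent) sequences
    r /= r.sum()  # relative frequency in retain (safe) sequences
    score = f - r
    # Indices of the top_k highest-scoring codes, best first.
    return np.argsort(score)[::-1][:top_k]


def replace_codes(codebook, forget_ids, rng):
    """Return a copy of the codebook in which each flagged entry is
    overwritten by a randomly chosen non-flagged ("safe") entry."""
    all_ids = np.arange(codebook.shape[0])
    safe_ids = np.setdiff1d(all_ids, forget_ids)
    new_codebook = codebook.copy()
    new_codebook[forget_ids] = codebook[rng.choice(safe_ids, size=len(forget_ids))]
    return new_codebook


# Toy usage: 4 codes of dimension 2; token sequences are arrays of code indices.
rng = np.random.default_rng(0)
codebook = np.arange(8, dtype=float).reshape(4, 2)
forget_tokens = [np.array([0, 0, 1]), np.array([0, 2])]
retain_tokens = [np.array([1, 2, 3]), np.array([3, 3, 2])]

forget_ids = find_forget_codes(forget_tokens, retain_tokens, codebook_size=4, top_k=1)
edited = replace_codes(codebook, forget_ids, rng)
```

Because only a few codebook rows change and no gradient step is taken, an edit of this kind is training-free and leaves the rest of the model untouched, which matches the motivation given in the abstract for acting on latent codes.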

Paper Structure

This paper contains 53 sections, 7 equations, 7 figures, 16 tables, 1 algorithm.

Figures (7)

  • Figure 1: The text-to-motion model takes an input prompt and generates the corresponding motion. With unlearning, when violent content is prompted, the model avoids generating harmful actions, producing a safe and appropriate outcome.
  • Figure 2: Analysis of violent actions in HumanML3D and Motion-X. (Top row) Pie charts represent the proportion of harmful actions within each dataset. (Bottom row) Bar plots break down the occurrence of individual violent actions.
  • Figure 3: Explicit vs. implicit prompting of violent actions.
  • Figure 4: Illustration of motion unlearning approaches: (1) Fine-tuning modifies both the text encoder and motion decoder to remove violent actions, (2) UCE and RECE, as training-free methods, operate solely on the text encoder, (3) Our proposed LCR selectively updates only the affected codebook entries, ensuring targeted unlearning with minimal impact on overall synthesis quality.
  • Figure 5: Qualitative comparison across datasets: HumanML3D and Motion-X samples demonstrating unlearning effectiveness. See videos on the project website.
  • ...and 2 more figures