LaMoGen: Laban Movement-Guided Diffusion for Text-to-Motion Generation

Heechang Kim, Gwanghyun Kim, Se Young Chun

TL;DR

LaMoGen addresses the challenge of fine-grained expressive control in text-to-motion generation by integrating Laban Movement Analysis (LMA) with diffusion-based synthesis. It introduces a zero-shot, inference-time guidance mechanism that differentiably models LMA features and updates the text embedding during DDIM sampling to align generated motions with target Laban Effort and Shape tags, while preserving the motion's identity. The method comprises differentiable Laban feature extraction, a two-step generation pipeline (baseline then Laban-guided refinement), and a relative Laban loss that steers conditioning during sampling. Quantitative and qualitative results show improved controllability and disentanglement of expressive attributes, with only modest trade-offs in text-motion alignment, demonstrating practical potential for expressive motion synthesis without additional training data.

Abstract

Diverse human motion generation is an increasingly important task with applications in computer vision, human-computer interaction, and animation. While text-to-motion synthesis using diffusion models has succeeded in generating high-quality motions, achieving fine-grained expressive motion control remains a significant challenge. This is due to the lack of motion style diversity in datasets and the difficulty of expressing quantitative characteristics in natural language. Laban movement analysis has been widely used by dance experts to describe the details of motion, including motion quality, as consistently as possible. Inspired by this, our work aims for interpretable and expressive control of human motion generation by integrating the quantification methods of Laban Effort and Shape components into text-guided motion generation models. Our proposed zero-shot, inference-time optimization method guides the motion generation model toward the desired Laban Effort and Shape components without any additional motion data by updating the text embedding of a pretrained diffusion model during the sampling step. We demonstrate that our approach yields diverse expressive motion qualities while preserving motion identity by successfully manipulating motion attributes according to target Laban tags.
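The inference-time idea described above (generate a baseline, derive a target relative to it, then update the conditioning embedding by gradient steps) can be illustrated with a deliberately small toy. Everything in this sketch is an assumption made for illustration: `toy_predict_motion` is a hypothetical stand-in for a pretrained model's clean-motion estimate, the mean squared velocity is a toy proxy rather than an actual Laban feature, the relative loss is one plausible form rather than the paper's definition, and central differences replace automatic differentiation. The toy predictor ignores the noisy sample and the timestep, so this shows only the embedding-update pattern, not DDIM sampling itself.

```python
import math

FRAMES = 40          # assumed clip length
DT = 1.0 / 20.0      # assumed frame interval (20 fps)

def toy_predict_motion(embedding):
    """Hypothetical stand-in for a pretrained model's clean-motion estimate.

    Maps a 2-D 'embedding' (amplitude, angular frequency) to a 1-D trajectory.
    A real denoiser would also take the noisy sample and the timestep.
    """
    amplitude, frequency = embedding
    return [amplitude * math.sin(frequency * i * DT) for i in range(FRAMES)]

def energy_feature(motion):
    """Mean squared velocity: a toy proxy, not an actual Laban feature."""
    velocity = [(b - a) / DT for a, b in zip(motion, motion[1:])]
    return sum(v * v for v in velocity) / len(velocity)

def relative_loss(embedding, target):
    """Squared relative error between the current feature and the target."""
    return (energy_feature(toy_predict_motion(embedding)) / target - 1.0) ** 2

def numerical_grad(f, x, eps=1e-5):
    """Central differences; a real pipeline would use automatic differentiation."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2.0 * eps))
    return grad

def guided_pass(embedding, target, steps=30, lr=0.1):
    """Apply one gradient update to the embedding per (toy) sampling step."""
    emb = list(embedding)
    for _ in range(steps):
        grad = numerical_grad(lambda e: relative_loss(e, target), emb)
        emb = [e - lr * g for e, g in zip(emb, grad)]
    return emb

# Pass 1: baseline motion from the unmodified embedding.
base_embedding = [1.0, math.pi]
base_feature = energy_feature(toy_predict_motion(base_embedding))

# Pass 2: steer toward a target defined relative to the baseline (assumed scale).
scale = 2.0
guided_embedding = guided_pass(base_embedding, scale * base_feature)
guided_feature = energy_feature(toy_predict_motion(guided_embedding))
ratio = guided_feature / base_feature
print(f"feature ratio after guidance: {ratio:.3f}")
```

Defining the target as a multiple of the baseline feature, rather than as an absolute value, keeps the target tied to the content of the prompt; in this toy, the `scale` parameter plays the role of a guidance strength.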

Paper Structure

This paper contains 39 sections, 6 equations, 9 figures, 5 tables.

Figures (9)

  • Figure 1: Existing text-to-motion generation methods [zhang2022motiondiffuse] cannot control the extent of motion through the prompt, whereas our method can. In the third frame, marked with arrows, the existing method (blue arrow) generates the same motion, while our method (red arrow) generates diverse motions with increasing energy.
  • Figure 2: The overall pipeline of our proposed method. Our framework first generates a text-conditioned baseline motion to establish content-aware target features. It then performs a second, guided generation pass, iteratively optimizing the text embedding to match the desired Laban characteristics. The differentiable motion feature extraction process is detailed at the bottom.
  • Figure 3: Qualitative comparison of our method against the prompt editing baseline. For each attribute (Shape, Weight, Time, Flow), we show the motion generated from a neutral prompt (left), the result of prompt editing (middle), and the result from our proposed method (right). Our method produces distinct and controllable expressive variations that align with the target Laban components (red arrow), whereas the baseline shows minimal or incorrect changes (blue arrow). For Shape, the "Far" prompt results in a motion where the arms are spread wider and the head is bowed more deeply. Other descriptions are listed in the main text.
  • Figure 4: Interpolation of motion expressiveness by varying the guidance scale for Shape and Weight attributes. Our method allows for smooth and continuous control over the intensity of the desired Laban component.
  • Figure 5: Kinematics of the root joint over time, with and without Gaussian smoothing applied before differentiation. The plots for acceleration and jerk (third and fourth rows) highlight the necessity of the smoothing process.
  • ...and 4 more figures
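
The point made in the Figure 5 caption, that smoothing is needed before differentiating a trajectory, can be reproduced with a minimal standard-library sketch. The frame rate, noise level, kernel width, boundary handling, and forward-difference scheme below are all assumptions chosen for illustration, and the helper names are hypothetical; the synthetic sine trajectory stands in for one coordinate of the root joint.

```python
import math
import random

def gaussian_kernel(sigma, radius=None):
    """Normalized 1-D Gaussian kernel, truncated at about 3 sigma by default."""
    if radius is None:
        radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(signal, sigma):
    """Convolve with a Gaussian, clamping indices at the boundaries."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - radius, 0), n - 1)
            acc += w * signal[idx]
        out.append(acc)
    return out

def diff(signal, dt):
    """Forward finite difference; the output is one sample shorter."""
    return [(b - a) / dt for a, b in zip(signal, signal[1:])]

def kinematics(position, dt, sigma=None):
    """Velocity, acceleration, and jerk, optionally smoothing the position first."""
    x = smooth(position, sigma) if sigma else list(position)
    velocity = diff(x, dt)
    acceleration = diff(velocity, dt)
    jerk = diff(acceleration, dt)
    return velocity, acceleration, jerk

def rms(values):
    """Root mean square of a sequence."""
    return math.sqrt(sum(v * v for v in values) / len(values))

random.seed(0)
dt = 1.0 / 20.0   # assumed frame interval (20 fps)
frames = 120
clean = [math.sin(2 * math.pi * 0.5 * i * dt) for i in range(frames)]
noisy = [c + random.gauss(0.0, 0.01) for c in clean]  # small positional jitter

_, _, jerk_raw = kinematics(noisy, dt)
_, _, jerk_smooth = kinematics(noisy, dt, sigma=2.0)
print(f"RMS jerk without smoothing: {rms(jerk_raw):.1f}")
print(f"RMS jerk with smoothing:    {rms(jerk_smooth):.1f}")
```

Each finite difference divides by the frame interval, so positional jitter is amplified at every derivative order; this is why the effect is most visible in acceleration and jerk.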