LingoMotion: An Interpretable and Unambiguous Symbolic Representation for Human Motion

Yao Zhang, Zhuchenyang Liu, Yu Xiao

Abstract

Existing human motion representations, such as MotionGPT, often operate as black-box latent vectors with limited interpretability, and they build on joint positions, which can introduce ambiguity. Inspired by the hierarchical structure of natural languages (from letters to words, phrases, and sentences), we propose LingoMotion, a motion language that provides an interpretable and unambiguous symbolic representation of both simple and complex human motion. In this paper, we introduce the concept design of LingoMotion: the definition of a motion alphabet based on joint angles; the morphology for forming words and phrases that describe simple actions, such as walking, and their attributes, such as speed and scale; and the syntax for describing more complex human activities as sequences of words and phrases. Preliminary results, including the implementation and evaluation of the motion alphabet on the large-scale motion dataset Motion-X, demonstrate high-fidelity motion representation.

Paper Structure (10 sections, 3 figures, 1 table)

This paper contains 10 sections, 3 figures, 1 table.

Figures (3)

  • Figure 1: The core analogy of our paper: Just as natural language decomposes continuous sound waves into discrete letters, words, phrases, and sentences, we propose a "motion language" that decomposes continuous joint-angle signals into discrete motion letters, words, phrases, and sentences to describe complex human activities. This provides a structured, interpretable, and compositional framework for human motion representation.
  • Figure 2: (a) Illustration of the 22 joints in the SMPL model, with joint angles calculated based on biological standards [schlegel2024using, wu2002isb]. (b) Joint angles of the right hip, which has 3 DoFs (flexion, adduction, rotation), reveal periodic hip movement during walking. (c) The 3D positions of the right knee during walking, which lose the periodic information.
  • Figure 3: Our three-stage pipeline for discovering the motion alphabet from right hip flexion measurements. (a) Segment the signal at local extrema. (b) Learn a codebook of canonical shapes via K-Means clustering on normalized segments. (c) Represent the original signal as a sequence of discrete "motion letters" (shape $s$) and their continuous attributes ($\sigma, \mu, l$). (d) Visualization of the 3D motion, rendered from the joint positions, illustrating the corresponding physical poses at the segment boundaries.
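
The three stages in the Figure 3 caption can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: each segment is resampled to a fixed number of points and standardized, $\mu$ and $\sigma$ are taken to be the segment's mean and standard deviation, $l$ is its length in frames, K-Means uses a simple farthest-point initialization, and the input is a synthetic sinusoid standing in for right hip flexion. The function names (`segment_at_extrema`, `normalize_segment`, `kmeans`, `encode`) are hypothetical.

```python
import numpy as np


def segment_at_extrema(signal):
    """Stage (a): split a 1-D signal where the slope changes sign."""
    sign = np.sign(np.diff(signal))
    extrema = np.where(sign[:-1] * sign[1:] < 0)[0] + 1
    bounds = np.concatenate(([0], extrema, [len(signal) - 1]))
    return [(int(a), int(b)) for a, b in zip(bounds[:-1], bounds[1:]) if b > a]


def normalize_segment(seg, n_points=16):
    """Resample to n_points and standardize (assumed normalization).

    Returns the normalized shape and the attributes (sigma, mu, l).
    """
    mu, sigma, l = float(seg.mean()), float(seg.std()), len(seg)
    resampled = np.interp(np.linspace(0, 1, n_points), np.linspace(0, 1, l), seg)
    shape = (resampled - mu) / (sigma if sigma > 0 else 1.0)
    return shape, (sigma, mu, l)


def kmeans(X, k, n_iter=50):
    """Stage (b): plain Lloyd's K-Means with farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[int(d.argmax())])
    centers = np.stack(centers).astype(float)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, dists.argmin(axis=1)


def encode(signal, k=2, n_points=16):
    """Stage (c): map a signal to letter indices plus (sigma, mu, l) per segment."""
    spans = segment_at_extrema(signal)
    shapes, attrs = zip(
        *(normalize_segment(signal[a:b + 1], n_points) for a, b in spans)
    )
    codebook, letters = kmeans(np.stack(shapes), k)
    return spans, codebook, letters, list(attrs)


# Synthetic stand-in for a hip-flexion angle (degrees), four gait-like cycles.
t = np.linspace(0.0, 4.0, 401)
angle = 20.0 * np.sin(2 * np.pi * t) + 10.0

spans, codebook, letters, attrs = encode(angle, k=2)
for (a, b), letter, (sigma, mu, l) in zip(spans, letters, attrs):
    print(f"frames {a:3d}-{b:3d}: letter {chr(ord('A') + int(letter))}, "
          f"sigma={sigma:.2f}, mu={mu:.2f}, l={l}")
```

On this toy signal, rising and falling segments fall into separate clusters, so the letter sequence alternates while the continuous attributes carry each segment's amplitude, offset, and duration. A real codebook would need a larger `k` and segments pooled across many recordings.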