Learning Uniformly Distributed Embedding Clusters of Stylistic Skills for Physically Simulated Characters

Nian Liu, Libin Liu, Zilong Zhang, Zi Wang, Hongzhao Xie, Tengyu Liu, Xinyi Tong, Yaodong Yang, Zhaofeng He

TL;DR

This paper proposes a skill-conditioned controller that learns diverse skills with expressive variations. Compared to existing methods, it generates high-quality, diverse motions covering the entire dataset and achieves superior controllability, motion coverage, and diversity under each skill.

Abstract

Learning natural and diverse behaviors from human motion datasets remains challenging in physics-based character control. Existing conditional adversarial models often suffer from tight and biased embedding distributions, where embeddings from the same motion are closely grouped in a small area and shorter motions occupy even less space. Our empirical observations indicate that this limits the representational capacity and diversity under each skill. An ideal latent space should be maximally packed with the embedding clusters of all motions. In this paper, we propose a skill-conditioned controller that learns diverse skills with expressive variations. Our approach leverages the Neural Collapse phenomenon, a natural outcome of the classification-based encoder, to uniformly distribute cluster centers. We additionally propose a novel Embedding Expansion technique to form stylistic embedding clusters for diverse skills that are uniformly distributed on a hypersphere, maximizing the representational area occupied by each skill and minimizing unmapped regions. This maximally packed and uniformly distributed embedding space ensures that embeddings within the same cluster generate behaviors that conform to the characteristics of the corresponding motion clips, yet exhibit noticeable variations within each cluster. Compared to existing methods, our controller not only generates high-quality, diverse motions covering the entire dataset but also achieves superior controllability, motion coverage, and diversity under each skill. Both qualitative and quantitative results confirm these traits, enabling our controller to be applied to a wide range of downstream tasks and to serve as a cornerstone for diverse applications.
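
To make the "uniformly distributed cluster centers" idea concrete: in the Neural Collapse literature, class means of a well-trained classifier tend toward a simplex equiangular tight frame (ETF), meaning K unit vectors whose pairwise cosine similarity is -1/(K-1). The following is a minimal NumPy sketch of that target geometry only. It is not the paper's encoder or training procedure, and the function name `simplex_etf` and its parameters are illustrative assumptions.

```python
import numpy as np


def simplex_etf(num_classes: int, dim: int, seed: int = 0) -> np.ndarray:
    """Return `num_classes` unit vectors in R^dim forming a simplex ETF.

    Hypothetical helper for illustration; assumes dim >= num_classes.
    """
    if num_classes < 2 or dim < num_classes:
        raise ValueError("need num_classes >= 2 and dim >= num_classes")
    rng = np.random.default_rng(seed)
    # Random matrix with orthonormal columns, shape (dim, num_classes).
    basis, _ = np.linalg.qr(rng.standard_normal((dim, num_classes)))
    # Centering projector: removes the mean direction shared by all classes.
    k = num_classes
    centering = np.eye(k) - np.ones((k, k)) / k
    # The scale factor makes every column unit length.
    frame = np.sqrt(k / (k - 1)) * basis @ centering
    return frame.T  # one row per cluster center


centers = simplex_etf(num_classes=5, dim=16)
cosines = centers @ centers.T
# Diagonal entries are 1 (unit norm); off-diagonal entries are all -1/(K-1).
print(np.round(cosines, 3))
```

Because every pair of centers has the same angle, no skill's center is favored by the geometry, which is the length-agnostic property the abstract argues for.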

Paper Structure

This paper contains 20 sections, 7 equations, 9 figures.

Figures (9)

  • Figure 1: The visualization of two latent spaces using PCA. (a) The tight and biased embedding distributions for each motion in CALM, with labels indicating the motion names and their respective lengths. The spatial occupation of each skill correlates with the length of the motion clip. (b) The learned skill embedding clusters in our model form a maximally packed, uniform, and length-agnostic distribution.
  • Figure 2: Our method uses a unit hypersphere as the embedding space to feature uniformly distributed embedding clusters for each skill. We first employ a classification-based encoder to distribute motion features uniformly on a high-dimensional sphere, then apply conditional imitation learning with the Embedding Expansion technique to form a stylistic skill embedding cluster for each skill, achieving a maximally packed and uniformly distributed space.
  • Figure 3: Our controller is capable of generating smooth, natural motions.
  • Figure 4: The reconstruction scores achieved by CALM, c-ASE, and ours for the Sword&Shield dataset and MSA dataset. The horizontal axis represents the calculated reconstruction scores, while the vertical axis indicates the number of frames that achieved these scores.
  • Figure 5: Trajectories of the characters' root positions collected from CALM, c-ASE, and our model. Plots in each sub-figure, from left to right, are our model conditioned on the mapped motion's feature mean, our model conditioned on embeddings from the embedding cluster of the associated motion, the CALM model conditioned on learned motion embeddings, and the c-ASE model conditioned on the desired skill label and a random skill embedding.
  • ...and 4 more figures
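
Figure 1 visualizes latent spaces with PCA. A minimal sketch of how such a 2D projection could be computed from a matrix of embeddings is shown below. It uses a plain SVD-based PCA in NumPy, and the function name `pca_project` and the synthetic embeddings are assumptions for illustration, not the authors' plotting code.

```python
import numpy as np


def pca_project(embeddings: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project rows of `embeddings` onto their top principal components."""
    # Center the data so the principal axes pass through the mean.
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # Rows of `vt` are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Synthetic stand-in for skill embeddings: random points on a unit hypersphere.
rng = np.random.default_rng(0)
points = rng.standard_normal((200, 16))
points /= np.linalg.norm(points, axis=1, keepdims=True)

projected = pca_project(points)
print(projected.shape)
```

The resulting two columns can be passed to any scatter-plot routine, with points colored by skill label to compare how much area each skill occupies.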