RACon: Retrieval-Augmented Simulated Character Locomotion Control

Yuxuan Mu, Shihao Zou, Kangning Yin, Zheng Tian, Li Cheng, Weinan Zhang, Jun Wang

TL;DR

RACon: Retrieval-Augmented Simulated Character Locomotion Control surpasses existing techniques in locomotion control, both qualitatively and quantitatively, and by switching the retrieval database it can adapt to distinct motion types at run time.

Abstract

In computer animation, driving a simulated character with lifelike motion is challenging. Current generative models, though able to generalize to diverse motions, often compromise the responsiveness of end-user control. To address this, we introduce RACon: Retrieval-Augmented Simulated Character Locomotion Control. Our end-to-end hierarchical reinforcement learning method utilizes a retriever and a motion controller. The retriever searches motion experts from a user-specified database in a task-oriented fashion, which boosts responsiveness to the user's control. The selected motion experts and the manipulation signal are then passed to the controller to drive the simulated character. In addition, a retrieval-augmented discriminator is designed to stabilize the training process. As demonstrated in our empirical study, our method surpasses existing techniques in locomotion control, both qualitatively and quantitatively. Moreover, by switching the retrieval database, it can adapt to distinct motion types at run time.
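
As a rough illustration of the retriever-controller hierarchy described in the abstract, the following minimal Python sketch shows one control step. Every name in it (retriever_policy, controller_policy, database.nearest_neighbor, env.step) is a hypothetical placeholder, not the authors' implementation.

    def racon_step(state, goal, database, retriever_policy, controller_policy, env):
        """One hypothetical RACon control step: retrieve, then actuate."""
        # High-level policy: query the user-specified database for the motion
        # expert that best serves the current goal (task-oriented retrieval).
        query = retriever_policy(state, goal)
        expert_clip = database.nearest_neighbor(query)
        # Low-level policy: condition on the retrieved expert and the
        # manipulation signal to drive the simulated character.
        action = controller_policy(state, expert_clip, goal)
        # The physics simulation advances; the new state feeds back to both
        # policies, which are trained end-to-end.
        next_state = env.step(action)
        return next_state, expert_clip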

Paper Structure

This paper contains 10 sections, 5 equations, 7 figures, and 2 tables.

Figures (7)

  • Figure 1: Example of a switch between two distinct motion types, cartwheel and zombie walk, in response to the user switching databases. The retrieved reference motion is shown in the bottom right. The color of the character marks the style-distinctive expert retrieval database in use at test time. Brown: Cartwheel. Yellow: Zombie. Note that zombie walk is a new motion type added by the user at test time and was not involved in training. Our results show a natural transition whenever the retrieval database switches.
  • Figure 2: The pipeline of our proposed RACon HRL system. The forward process is shown with solid arrows and the state-feedback process with dotted arrows. Our system incorporates two frameworks, $\langle \pi^{\text{retr}}, \text{Env}^{\text{retr}}\rangle$ and $\langle \pi^{\text{ctrl}}, \text{Env}^{\text{phy}}\rangle$, working in a 'manager-worker' fashion. The two frameworks share the same goal signal $g$ and feedback state $s_{t+1}$ and are trained end-to-end with policy-level rewards (i.e., $\tilde{r}^{\text{g}}_{t}$, $r^{\text{ref}}_{t}$) and system-level rewards (i.e., $r^{\text{g}}_{t}$, $r^{\text{prior}}_{t}$). The optimization details of the retriever and the controller are elaborated in the sections on task-oriented learnable retrieval and reference-guided control, respectively. In addition, the yellow dashed arrows mark the workflow of the retrieval-augmented motion discriminator, described in the motion-prior section, which provides a robust motion prior reward for both policies.
  • Figure 3: The RL framework of Task-Oriented Learnable Retrieval. Our retrieval environment can hold a set of databases $\{\mathcal{M}^{\text{retr}}_k\}$ of different motion types for the user to choose from at run time. Given the action $a_t^{\text{retr}}$ as a query $Q$, the environment searches for the most similar key $K$ to index the most suitable value $V$, i.e., a motion clip $\tilde{s}_{t+1}$, from the database. When $\boldsymbol{\text{flag}^{\text{retr}}}$ is activated, the motion clip is stitched to the character by transforming it to align its initial frame's root coordinate with the current character, building $s^\text{retr}_{t+1}$. When $\boldsymbol{\text{flag}^{\text{retr}}}$ is deactivated, the character continues stepping along the previous retrieval state $s^\text{retr}_{t \to t+1}$ (a hedged code sketch of this retrieval-and-stitch step is given after this list).
  • Figure 4: The architecture of TOLR. The non-differentiable feature extractor computes features from motion clips or the current state. The pipeline at the bottom is the actual retrieval policy, built from linear layers.
  • Figure 5: Example results of locomotion control with Cartwheel and Common Locomotion skills, shown at 5 fps. The gamepad control signal is marked with a blue stick. The color of the simulated character marks the retrieval database $\{\mathcal{M}^\text{retr}_k\}$ chosen by the user at run time. Light Grey: Common Locomotion. Brown: Cartwheel.
  • ...and 2 more figures
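
To make the retrieval-and-stitch step of Figure 3 concrete, here is a hedged NumPy sketch of key-based nearest-neighbor lookup followed by root-frame alignment. The per-frame layout (root x, y, heading in the first three columns), the flag semantics, and all names are illustrative assumptions, not the paper's code.

    import numpy as np

    def retrieve_and_stitch(query, keys, values, root_xy, heading,
                            flag_retr, prev_clip, prev_idx):
        # keys: (N, D) matrix K, one feature row per clip in the database.
        # values: list of clips V; each clip is (T, F) with per-frame root
        # x, y, heading assumed in columns 0..2.
        if not flag_retr:
            # Flag deactivated: keep stepping along the previous retrieval state.
            return prev_clip, prev_idx + 1
        # Query Q: find the most similar key K and index its value V.
        nearest = int(np.argmin(np.linalg.norm(keys - query, axis=1)))
        clip = values[nearest].copy()
        # Stitch: align the clip's initial frame root coordinate with the
        # character's current root position and heading.
        dtheta = heading - clip[0, 2]
        c, s = np.cos(dtheta), np.sin(dtheta)
        rot = np.array([[c, -s], [s, c]])
        clip[:, :2] = (clip[:, :2] - clip[0, :2]) @ rot.T + root_xy
        clip[:, 2] += dtheta
        return clip, 0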