Versatile Physics-based Character Control with Hybrid Latent Representation

Jinseok Bae, Jungdam Won, Donggeun Lim, Inwoo Hwang, Young Min Kim

TL;DR

The paper addresses versatile physics-based character control under temporally or spatially sparse goals by introducing a hybrid latent representation that combines a discrete motion prior with a continuous residual. It employs Residual Vector Quantization (RVQ) to maximize code usage and sampling efficiency, while high-level policies adjust only small residuals, which improves stability and training efficiency. The authors demonstrate three downstream tasks (motion in-betweening, head-mounted device tracking, and point-goal navigation) and report better motion fidelity, naturalness, and robustness than continuous or discrete baselines, supported by ablations on RVQ and codebook counts. The approach offers a scalable framework for reusable motion priors in complex control settings and points to future extensions to multi-subgoal scenarios and model-based planning strategies.

Abstract

We present a versatile latent representation that enables a physically simulated character to efficiently utilize motion priors. To build a powerful motion embedding that is shared across multiple tasks, the physics controller should employ a rich latent space that is easily explored and capable of generating high-quality motion. We propose integrating continuous and discrete latent representations to build a versatile motion prior that can be adapted to a wide range of challenging control tasks. Specifically, we build a discrete latent model to capture a distinctive posterior distribution without collapse, and simultaneously augment the sampled vector with continuous residuals to generate high-quality, smooth motion without jittering. We further incorporate Residual Vector Quantization, which not only maximizes the capacity of the discrete motion prior, but also efficiently abstracts the action space during the task learning phase. We demonstrate that our agent can produce diverse yet smooth motions simply by traversing the learned motion prior through unconditional motion generation. Furthermore, our model robustly satisfies sparse goal conditions with highly expressive natural motions, including head-mounted device tracking and motion in-betweening at irregular intervals, which could not be achieved with existing latent representations.
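
To make the idea of a hybrid latent more concrete, the following is a minimal NumPy sketch of residual vector quantization followed by a small continuous offset. It is an illustration under stated assumptions, not the authors' implementation: the function names `rvq_encode` and `hybrid_latent`, the random codebooks, the `tanh` bounding, and the `scale` value are all hypothetical choices made for this example.

```python
import numpy as np


def rvq_encode(z, codebooks):
    """Quantize z with a stack of codebooks; each stage codes the previous residual."""
    residual = z.copy()
    quantized = np.zeros_like(z)
    indices = []
    for codebook in codebooks:  # each codebook has shape (num_codes, dim)
        dists = np.sum((codebook - residual) ** 2, axis=1)
        idx = int(np.argmin(dists))  # nearest code to the current residual
        indices.append(idx)
        quantized += codebook[idx]
        residual -= codebook[idx]
    return indices, quantized


def hybrid_latent(indices, codebooks, continuous_residual, scale=0.1):
    """Sum the selected codes, then add a bounded continuous offset.

    The tanh bound and the scale are assumptions made for this sketch.
    """
    discrete = sum(cb[i] for cb, i in zip(codebooks, indices))
    return discrete + scale * np.tanh(continuous_residual)


# Hypothetical sizes and random codebooks, for illustration only.
rng = np.random.default_rng(0)
dim, num_codes, num_books = 8, 16, 3
codebooks = [
    rng.normal(scale=1.0 / (k + 1), size=(num_codes, dim))
    for k in range(num_books)
]

z = rng.normal(size=dim)
indices, z_q = rvq_encode(z, codebooks)
z_h = hybrid_latent(indices, codebooks, rng.normal(size=dim))
```

In this sketch the discrete part is fully described by one index per codebook, while the continuous part can shift the result by at most `scale` per dimension, which mirrors the description of policies that adjust only small residuals.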

Paper Structure

This paper contains 15 sections, 8 equations, 7 figures, 6 tables.

Figures (7)

  • Figure 6: Results of the motion in-betweening example. Our simulated character (blue) generates plausible trajectories given zero-velocity keyframes (red). Arrows and numbers indicate the temporal flow of each episode.
  • Figure 7: Qualitative comparison between the continuous and hybrid models on the head-mounted device tracking example. Our agent faithfully adheres to the learned motion prior, while the continuous model does not.
  • Figure 8: Qualitative comparison between the discrete$^+$ and hybrid models on the head-mounted device tracking example. While the discrete$^+$ model sometimes violates natural gaits, our model consistently exhibits realistic locomotion.
  • Figure 9: Qualitative comparison between the baselines (continuous, discrete) and the hybrid model on the point-goal navigation example. The continuous model prioritizes speed over motion quality to maximize the task reward, which is higher when the goal is reached quickly. The discrete model, on the other hand, produces plausible motion while reaching the target location but struggles to maintain natural idle motion once it arrives.
  • Figure 10: Zero-shot adaptation to unexpected perturbations in a point-goal navigation task. Every 0.3 seconds, obstacles (red spheres) are thrown at the agent from random directions and at random velocities. Red boxes highlight the key moment of the collision that triggers the largest perturbation in an episode. Our agent consistently maintains natural movement despite significant disruptions.
  • ...and 2 more figures