Utilizing Motion Matching with Deep Reinforcement Learning for Target Location Tasks

Jeongmin Lee, Taesoo Kwon, Hyunju Shin, Yoonsang Lee

TL;DR

The paper addresses long-horizon target-location control for virtual characters by coupling motion matching with deep reinforcement learning (DRL), so that the policy directly generates motion-matching queries. Each RL step is one motion-matching-and-playback cycle with state $s_t = \{c_t, g_t\}$ and action $a_t = \{t_t\}$: the motion-matching features $f_i = \{c_i, t_i\} \in \mathbb{R}^{27}$ drive the next frame selection, and the target location $g_t$ guides progress, which enables efficient learning without full-body motion synthesis. For environments with moving obstacles, the paper introduces a reward with a novel hit term, $r_t = \exp(-\mathrm{dist}(s_t)) + \exp(-\mathrm{hits}(a_t))$, and a 10-stage obstacle curriculum (plus optional obstacle-sensing inputs), which together promote safer trajectories within about $1$ second of lookahead. Experiments show that policies can reach target locations with limited training time (e.g., as little as $0.2$ seconds per step), while convergence in complex scenes can take millions of steps ($\sim14\mathrm{M}$). These results highlight the approach’s practicality for rapid animation development and interactive applications. The paper also acknowledges memory and exploration-speed constraints, which future work may address with autoencoder-based feature learning and related techniques.
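The query step described above can be illustrated with a minimal sketch. It assumes the query is the concatenation of the character's current features $c_t$ and the policy's action $t_t$, and that the next frame is the database entry with the smallest squared Euclidean distance to that query. The function name `motion_matching_query`, the brute-force search, and the 4-dimensional toy database are hypothetical choices made for illustration; the paper's features are 27-dimensional, and its actual search procedure is not given here.

```python
from typing import Sequence


def motion_matching_query(
    database: Sequence[Sequence[float]],
    current_features: Sequence[float],
    trajectory_action: Sequence[float],
) -> int:
    """Return the index of the database frame nearest to the query {c_t, t_t}.

    Assumption: features are compared with plain squared Euclidean distance.
    """
    # The query combines what the character is doing now (c_t) with the
    # trajectory part proposed by the policy (t_t).
    query = list(current_features) + list(trajectory_action)
    best_index, best_cost = -1, float("inf")
    for i, feature in enumerate(database):
        if len(feature) != len(query):
            raise ValueError("feature and query dimensions differ")
        cost = sum((a - b) ** 2 for a, b in zip(feature, query))
        if cost < best_cost:
            best_index, best_cost = i, cost
    return best_index


# Toy database: each row is [c (2 values), t (2 values)]; values are made up.
toy_database = [
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [1.0, 0.0, -1.0, 0.0],
]
c_t = [0.0, 0.0]  # hypothetical current features
a_t = [0.1, 0.9]  # hypothetical policy output (trajectory features)
next_frame = motion_matching_query(toy_database, c_t, a_t)
```

In this toy setup the policy only has to output the low-dimensional trajectory part of the query; the selected frame then supplies the full-body pose, which is why no full-body motion synthesis has to be learned.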

Abstract

We present an approach using deep reinforcement learning (DRL) to directly generate motion matching queries for long-term tasks, particularly targeting the reaching of specific locations. By integrating motion matching and DRL, our method demonstrates the rapid learning of policies for target location tasks within minutes on a standard desktop, employing a simple reward design. Additionally, we propose a unique hit reward and obstacle curriculum scheme to enhance policy learning in environments with moving obstacles.

Paper Structure

This paper contains 7 sections, 6 equations, and 6 figures.

Figures (6)

  • Figure 1: Examples of the hit reward. Left: The action (red arrows) yields a hit reward of $\exp(0)$ because no future position lies inside the obstacle. Right: The action yields a hit reward of $\exp(-1)$ because one future position lies inside the obstacle.
  • Figure 2: Example of the obstacle map.
  • Figure 3: Our policy networks for Plain Environment (left) and Moving Obstacles Environment (right).
  • Figure 4: Learning curves for the plain environment. The blue, red, green, and purple vertical dashed lines in (a) and (b) mark the policy's performance at approximately 20k, 100k, 533k, and 1M steps, i.e., after 30, 150, 800, and 1500 seconds of training, respectively. The character's trajectory for each policy is shown in (c) in the matching color. (d) and (e) depict the movement styles of policies trained for 100k and 1M steps, respectively.
  • Figure 5: Learning curves for the moving obstacles environment in the early stages (before 3M steps). The dashed lines mark the points at which the curriculum advances to the next stage.
  • ...and 1 more figure
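The hit reward illustrated in Figure 1 can be sketched as follows: count how many of the action's future positions fall inside an obstacle, then return $\exp(-\mathrm{hits})$. The circular obstacle, the three future positions per action, and the names `count_hits` and `hit_reward` are assumptions made for this illustration; the paper's obstacles move, so this sketch evaluates a single static snapshot.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def count_hits(
    future_positions: Sequence[Point],
    obstacle_center: Point,
    obstacle_radius: float,
) -> int:
    """Count the future positions that lie inside a circular obstacle."""
    cx, cy = obstacle_center
    return sum(
        1
        for x, y in future_positions
        if math.hypot(x - cx, y - cy) <= obstacle_radius
    )


def hit_reward(
    future_positions: Sequence[Point],
    obstacle_center: Point,
    obstacle_radius: float,
) -> float:
    """Hit reward exp(-hits): 1.0 with no hits, decaying with each hit."""
    hits = count_hits(future_positions, obstacle_center, obstacle_radius)
    return math.exp(-hits)


# Hypothetical obstacle and trajectories mirroring the two panels of Figure 1.
center, radius = (2.0, 0.0), 0.5
clear_path = [(0.5, 0.5), (1.0, 1.0), (1.5, 1.5)]     # no position inside
blocked_path = [(0.5, 0.0), (1.0, 0.0), (1.8, 0.0)]   # last position inside

reward_clear = hit_reward(clear_path, center, radius)      # exp(0)
reward_blocked = hit_reward(blocked_path, center, radius)  # exp(-1)
```

Because the reward is bounded in $(0, 1]$ and shrinks smoothly with each additional hit, an action whose predicted trajectory clips an obstacle is penalized less than one that passes through it entirely.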