
PACER+: On-Demand Pedestrian Animation Controller in Driving Scenarios

Jingbo Wang, Zhengyi Luo, Ye Yuan, Yixuan Li, Bo Dai

TL;DR

PACER+ addresses the need for diverse, controllable pedestrian animation in driving simulations by unifying physics-based action control with on-demand motion content. It advances prior trajectory-following methods by training a single policy to both follow a trajectory and imitate selective motion content via a per-joint spatial-temporal mask, guided by an Adversarial Motion Prior (AMP). The framework supports content from generative models, motion capture, and videos, enabling zero-shot recreation of real-world pedestrian motions with missing parts automatically infilled. It demonstrates improved motion quality and diversity on synthetic terrains and real driving scenarios, offering a practical, on-demand tool for enriching AV simulators with realistic, scenario-specific pedestrian behaviors.
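The per-joint spatial-temporal mask can be pictured as a frames-by-joints grid of 0/1 flags. Below is a minimal illustration, not the authors' implementation. The joint names, the `make_mask` helper, and the frame-window interface are all hypothetical. They only show how a request like "track the upper body only, during this time window" (as in the language-driven examples) could be encoded.

```python
from typing import Iterable, List, Sequence

# Hypothetical joint list for illustration only; the humanoid's actual
# joint set and ordering in PACER+ may differ.
JOINTS = [
    "pelvis", "l_hip", "r_hip", "spine", "l_knee", "r_knee",
    "chest", "neck", "head", "l_shoulder", "r_shoulder",
    "l_elbow", "r_elbow", "l_wrist", "r_wrist",
]
UPPER_BODY = {
    "spine", "chest", "neck", "head", "l_shoulder", "r_shoulder",
    "l_elbow", "r_elbow", "l_wrist", "r_wrist",
}


def make_mask(
    num_frames: int,
    joints: Sequence[str],
    active_joints: Iterable[str],
    start: int,
    end: int,
) -> List[List[int]]:
    """Build a num_frames x len(joints) 0/1 mask (hypothetical helper).

    mask[t][j] == 1: joint j should track the reference motion at frame t.
    mask[t][j] == 0: joint j is not constrained by a reference at frame t.
    Frames in [start, end) are active; `end` is exclusive.
    """
    if not 0 <= start <= end <= num_frames:
        raise ValueError("require 0 <= start <= end <= num_frames")
    active = set(active_joints)
    unknown = active - set(joints)
    if unknown:
        raise ValueError(f"unknown joints: {sorted(unknown)}")
    return [
        [1 if start <= t < end and name in active else 0 for name in joints]
        for t in range(num_frames)
    ]


# Example: track only the upper body during frames 30-59 of a 90-frame clip.
mask = make_mask(90, JOINTS, UPPER_BODY, start=30, end=60)
```

Outside the active window and on unmasked joints, the policy is free to do whatever trajectory following and the motion prior favor. This is what lets generated upper-body content be layered onto ordinary walking.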

Abstract

We address the challenge of content diversity and controllability in pedestrian simulation for driving scenarios. Recent pedestrian animation frameworks have a significant limitation: they focus primarily on either following a trajectory [46] or reproducing the content of a reference video [57], and consequently overlook the potential diversity of human motion within such scenarios. This limitation restricts the ability to generate pedestrian behaviors with a wider range of variation and realistic motion. It therefore limits their usefulness in providing rich motion content to other components of the driving simulation system, e.g., a sudden change in motion to which the autonomous vehicle should respond. Our approach aims to overcome this limitation by showcasing diverse human motions obtained from various sources, such as generative models, in addition to following the given trajectory. The fundamental contribution of our framework lies in combining the motion tracking task with trajectory following, which enables a single policy to track specific motion parts (e.g., the upper body) while simultaneously following the given trajectory. In this way, we significantly enhance both the diversity of simulated human motion within the given scenario and the controllability of the content, including language-based control. Our framework facilitates the generation of a wide range of human motions, contributing to greater realism and adaptability in pedestrian simulations for driving scenarios. More information is available on our project page https://wangjingbo1219.github.io/papers/CVPR2024_PACER_PLUS/PACERPLUSPage.html .
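To make the "single policy, two tasks" idea concrete, here is a minimal sketch of one way a per-step reward could combine three terms: trajectory following, masked motion tracking, and an adversarial style term. The function names, exponential kernels, scales `k`, and weights are illustrative assumptions, not values from the paper. The style term uses the least-squares discriminator reward from the original AMP formulation. Whether PACER+ uses this exact form is not stated here.

```python
import math
from typing import Sequence, Tuple

Vec2 = Tuple[float, float]
Vec3 = Tuple[float, float, float]


def trajectory_reward(root_xy: Vec2, target_xy: Vec2, k: float = 2.0) -> float:
    """Reward in (0, 1] for keeping the root near the current trajectory point."""
    dx = root_xy[0] - target_xy[0]
    dy = root_xy[1] - target_xy[1]
    return math.exp(-k * (dx * dx + dy * dy))


def masked_tracking_reward(
    sim_pos: Sequence[Vec3],
    ref_pos: Sequence[Vec3],
    mask_row: Sequence[int],
    k: float = 5.0,
) -> float:
    """Tracking reward over masked joints only; 0.0 when no joint is masked."""
    active = [j for j, m in enumerate(mask_row) if m]
    if not active:
        return 0.0
    # Mean squared 3D position error over the joints selected by the mask.
    err = sum(
        sum((a - b) ** 2 for a, b in zip(sim_pos[j], ref_pos[j])) for j in active
    ) / len(active)
    return math.exp(-k * err)


def amp_style_reward(d: float) -> float:
    """Least-squares discriminator reward as in the original AMP formulation."""
    return max(0.0, 1.0 - 0.25 * (d - 1.0) ** 2)


def total_reward(
    root_xy: Vec2,
    target_xy: Vec2,
    sim_pos: Sequence[Vec3],
    ref_pos: Sequence[Vec3],
    mask_row: Sequence[int],
    disc_score: float,
    w_traj: float = 0.5,
    w_track: float = 0.5,
    w_style: float = 0.5,
) -> float:
    """Weighted sum of task and style rewards (weights are illustrative)."""
    return (
        w_traj * trajectory_reward(root_xy, target_xy)
        + w_track * masked_tracking_reward(sim_pos, ref_pos, mask_row)
        + w_style * amp_style_reward(disc_score)
    )
```

With an all-zero mask row, the tracking term vanishes and the objective reduces to trajectory following plus the style prior. That is one way a single policy could cover both plain walking and on-demand content.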

Paper Structure (39 sections, 7 figures, 5 tables)

Figures (7)

  • Figure 1: We showcase the effectiveness of our proposed framework in synthetic and real-world driving scenarios. Our framework excels at generating physically realistic animations that adhere to provided trajectories while offering extensive control over upper-body and full-body movements. Additionally, our framework demonstrates the ability to recreate pedestrian animations with occlusions from real-world videos in a zero-shot manner. These capabilities make our framework a robust and versatile approach for on-demand pedestrian animation in driving scenarios.
  • Figure 2: Framework of PACER+. Our framework follows goal-conditioned reinforcement learning with an Adversarial Motion Prior (AMP). To enable fine-grained control of specific body parts, we add a spatial-temporal mask to the motion-tracking task. This mask indicates the presence of a reference motion that the policy should track. Through this tracking task, our framework can demonstrate diverse pedestrian behaviors at specific time steps and locations in a zero-shot manner.
  • Figure 3: Our framework presents an on-demand control system tailored for real-world videos. Starting from the pre-processing of Wang et al. (ICCV 2023), our policy network tracks high-confidence motions and fills in missing parts without additional fine-tuning. Moreover, our framework can introduce customized animations into real-world scenarios with flexible control options.
  • Figure 4: Results on manually designed synthetic terrains. Our framework synthesizes animations by combining a given trajectory with motion content generated by language-based motion generation models (Tevet et al., 2023; Chen et al., 2023).
  • Figure 5: Zero-shot animation recreation of real-world pedestrians. Our framework is capable of simulating pedestrian animation following the motion content of real-world videos.
  • ...and 2 more figures
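For the real-world video setting in Figure 3, one plausible way to derive the mask is from per-joint confidence scores produced by an upstream pose estimator. Confident joints are tracked, and occluded or low-confidence joints are left for the policy to infill. The sketch below assumes confidences in [0, 1]. The `confidence_mask` helper, the 0.5 threshold, and the minimum-run filter are all hypothetical choices, not the paper's pre-processing. The filter drops brief confident blips that would otherwise switch the tracking target on and off.

```python
from typing import List, Sequence


def confidence_mask(
    conf: Sequence[Sequence[float]],
    threshold: float = 0.5,
    min_run: int = 3,
) -> List[List[int]]:
    """Turn a T x J confidence grid into a T x J 0/1 tracking mask.

    A joint is tracked (1) only inside runs of at least `min_run`
    consecutive frames whose confidence is >= `threshold`; every other
    entry is 0, i.e. left for the policy to infill.
    """
    num_frames = len(conf)
    num_joints = len(conf[0]) if num_frames else 0
    mask = [[0] * num_joints for _ in range(num_frames)]
    for j in range(num_joints):
        t = 0
        while t < num_frames:
            if conf[t][j] >= threshold:
                start = t
                # Extend the run while the joint stays confident.
                while t < num_frames and conf[t][j] >= threshold:
                    t += 1
                if t - start >= min_run:
                    for s in range(start, t):
                        mask[s][j] = 1
            else:
                t += 1
    return mask


# Example: joint 0 is confident for three frames; joint 1 only flickers.
conf = [[0.9, 0.1], [0.9, 0.8], [0.9, 0.2], [0.2, 0.9]]
mask = confidence_mask(conf, threshold=0.5, min_run=2)
```

Raising `min_run` trades coverage for stability. More of the motion is handed to the policy's infilling, but the tracked segments are less likely to come from spurious detections.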