Importance Sampling-Guided Meta-Training for Intelligent Agents in Highly Interactive Environments

Mansur Arief, Mike Timmerman, Jiachen Li, David Isele, Mykel J Kochenderfer

TL;DR

This study introduces a novel training framework that integrates guided meta RL with importance sampling (IS) to optimize training distributions iteratively for navigating highly interactive driving scenarios, such as T-intersections or roundabouts.

Abstract

Training intelligent agents to navigate highly interactive environments presents significant challenges. The guided meta reinforcement learning (RL) approach, which first trains a guiding policy and then uses it to train the ego agent, has proven effective in improving generalizability across scenarios with various levels of interaction. However, the state-of-the-art method tends to be overly sensitive to extreme cases, which impairs the agent's performance in the more common scenarios. This study introduces a novel training framework that integrates guided meta RL with importance sampling (IS) to iteratively optimize training distributions for navigating highly interactive driving scenarios, such as T-intersections or roundabouts. Unlike traditional methods, which may underrepresent critical interactions or overemphasize extreme cases during training, our approach shifts the training distribution towards more challenging driving behaviors using IS proposal distributions and applies the importance ratio to de-bias the result. By estimating a naturalistic distribution from real-world datasets and employing a mixture model for iterative training refinements, the framework maintains a balanced focus across common and extreme driving scenarios. Experiments conducted with synthetic and naturalistic datasets demonstrate both accelerated training and performance improvements on highly interactive driving tasks.
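The de-biasing step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a discretized aggressiveness parameter and made-up numbers: the naturalistic distribution `P_NAT`, the proposal `Q_PROP`, and the per-level failure probabilities `FAIL_PROB` are hypothetical and are not taken from the paper. Scenarios are drawn from the proposal, which oversamples aggressive behaviors, and each outcome is reweighted by the importance ratio so that the estimate still refers to the naturalistic distribution.

```python
import numpy as np

# Hypothetical values for illustration only (not from the paper).
P_NAT = np.array([0.40, 0.30, 0.20, 0.08, 0.02])      # naturalistic distribution over 5 aggressiveness levels
Q_PROP = np.array([0.10, 0.10, 0.20, 0.30, 0.30])     # IS proposal, shifted towards aggressive levels
FAIL_PROB = np.array([0.00, 0.00, 0.05, 0.30, 0.60])  # assumed ego failure probability per level


def is_failure_rate(p_nat, q_prop, fail_prob, n_samples, rng):
    """Estimate the failure rate under p_nat using samples drawn from q_prop."""
    p_nat = np.asarray(p_nat, dtype=float)
    q_prop = np.asarray(q_prop, dtype=float)
    fail_prob = np.asarray(fail_prob, dtype=float)

    # Sample aggressiveness levels from the proposal, not from the naturalistic distribution.
    idx = rng.choice(len(q_prop), size=n_samples, p=q_prop)
    # Simulate a binary failure outcome for each sampled scenario.
    failed = rng.random(n_samples) < fail_prob[idx]
    # The importance ratio p/q removes the bias introduced by sampling from q.
    weights = p_nat[idx] / q_prop[idx]
    return float(np.mean(weights * failed))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    estimate = is_failure_rate(P_NAT, Q_PROP, FAIL_PROB, 200_000, rng)
    exact = float(np.dot(P_NAT, FAIL_PROB))
    print(f"IS estimate: {estimate:.4f}, exact value under P_NAT: {exact:.4f}")
```

In this toy setting the exact failure rate under `P_NAT` is available in closed form, so the weighted estimate can be checked against it directly; in a driving simulator only the sampled outcomes would be available.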

Paper Structure

This paper contains 26 sections, 6 equations, 6 figures, 3 tables, and 1 algorithm.

Figures (6)

  • Figure 1: The IS-guided meta RL training framework operates as follows. First, we train a meta RL policy that captures the diverse behaviors of social agents, characterized by varying levels of aggressiveness ($\beta$). Next, the ego agent is trained using samples of these behaviors from a distribution $p_{\text{training}}$. Its performance is then evaluated using a cross-entropy IS proposal $p_{\text{evaluation}}$, which emphasizes unresolved failure modes. Finally, we create a mixture model from both $p_{\text{training}}$ and $p_{\text{evaluation}}$ for use in the next iteration of training.
  • Figure 2: Naturalistic and CEIS distributions for the T-intersection experiments on the inD dataset.
  • Figure 3: Naturalistic and CEIS distributions for the roundabout experiments on the rounD dataset.
  • Figure 4: The actual and projected performance for CEIS over longer training iterations.
  • Figure 5: Success and failure examples of the ego vehicle policy in the T-intersection and roundabout scenarios.
  • ...and 1 more figure
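The iteration described in the Figure 1 caption (evaluate with a cross-entropy IS proposal that emphasizes failures, then mix it with the training distribution) can be sketched roughly as follows. This is a simplified illustration, not the paper's implementation: it assumes a categorical distribution over discretized aggressiveness levels, and the function names `ce_update` and `mix`, the `smoothing` term, and the mixture weight `alpha` are all hypothetical choices.

```python
import numpy as np


def ce_update(p_nat, q_cur, level_idx, failed, smoothing=1e-3):
    """One cross-entropy update of a categorical proposal towards observed failures.

    level_idx: aggressiveness levels sampled from q_cur (integer indices).
    failed:    boolean array, True where the ego agent failed in that scenario.
    """
    # Importance ratio of each sample with respect to the naturalistic distribution.
    weights = p_nat[level_idx] / q_cur[level_idx]
    # Accumulate the weighted failure mass per aggressiveness level.
    mass = np.bincount(level_idx, weights=weights * failed, minlength=len(p_nat))
    if mass.sum() == 0.0:
        # No failures observed: keep the current proposal unchanged.
        return q_cur.copy()
    q_new = mass / mass.sum()
    # Assumed smoothing so every level with naturalistic mass keeps nonzero probability.
    return (1.0 - smoothing) * q_new + smoothing * p_nat


def mix(p_train, q_eval, alpha=0.5):
    """Mixture of the training and evaluation distributions for the next iteration."""
    mixed = alpha * p_train + (1.0 - alpha) * q_eval
    return mixed / mixed.sum()


if __name__ == "__main__":
    p_nat = np.array([0.5, 0.3, 0.2])     # hypothetical naturalistic distribution
    q_cur = np.array([0.25, 0.25, 0.5])   # hypothetical current proposal
    level_idx = np.array([0, 1, 2, 2])    # sampled levels from q_cur
    failed = np.array([False, False, True, True])

    q_eval = ce_update(p_nat, q_cur, level_idx, failed)
    p_train_next = mix(p_nat, q_eval, alpha=0.5)
    print("updated proposal:", q_eval)
    print("next training distribution:", p_train_next)
```

Mixing rather than replacing the training distribution keeps common scenarios represented while still raising the probability of the levels where failures remain, which matches the balance between common and extreme cases that the abstract describes.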