
MEGA-DAgger: Imitation Learning with Multiple Imperfect Experts

Xiatao Sun, Shuo Yang, Mingyan Zhou, Kunpeng Liu, Rahul Mangharam

TL;DR

This work addresses imitation learning when multiple imperfect experts are available, a scenario common in safety-critical autonomous systems. It introduces MEGA-DAgger, a three-component extension of HG-DAgger that (i) uses a Control Barrier Function-based data filter to prune unsafe demonstrations, (ii) selects a dominant expert per iteration, and (iii) resolves label conflicts across experts through cosine-similarity-based matching and a combined score $\omega_t$ that balances safety and progress. Empirical results in autonomous racing on the F1TENTH platform show that MEGA-DAgger achieves a better-than-expert policy, outperforming both individual experts and HG-DAgger in overtakes and collision avoidance, and remains effective in real-world hardware experiments. The approach offers a practical, data-efficient path to leveraging diverse experts in multi-agent, safety-critical domains and can be adapted to general autonomous systems with task-specific safety and progress metrics.

Abstract

Imitation learning has been widely applied to various autonomous systems thanks to recent developments in interactive algorithms that address the covariate shift and compounding errors induced by traditional approaches such as behavior cloning. However, existing interactive imitation learning methods assume access to one perfect expert, whereas in practice it is more likely that multiple imperfect experts are available. In this paper, we propose MEGA-DAgger, a new DAgger variant suited to interactive learning with multiple imperfect experts. First, unsafe demonstrations are filtered out while aggregating the training data, so that imperfect demonstrations have little influence on the training of the novice policy. Next, experts are evaluated and compared on scenario-specific metrics to resolve conflicting labels among experts. Through experiments in autonomous racing scenarios, we demonstrate that a policy learned using MEGA-DAgger can outperform both the experts and policies learned using state-of-the-art interactive imitation learning algorithms such as Human-Gated DAgger. The supplementary video can be found at https://youtu.be/wPCht31MHrw.
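
The filter-then-resolve aggregation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Sample` fields, the `is_safe` predicate (a stand-in for the paper's Control Barrier Function-based filter), the per-sample `score` (a stand-in for the combined safety/progress score), and the similarity threshold of 0.99 are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Sample:
    obs: Sequence[float]  # observation, e.g. a LiDAR scan (assumed format)
    action: float         # expert label, e.g. a steering angle
    score: float          # hypothetical combined safety/progress score


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def aggregate(
    dataset: List[Sample],
    new_samples: List[Sample],
    is_safe: Callable[[Sample], bool],
    sim_threshold: float = 0.99,  # assumed value, not taken from the paper
) -> List[Sample]:
    """Add new expert samples to the dataset, filtering and resolving conflicts."""
    for sample in new_samples:
        # Data filter: drop demonstrations judged unsafe.
        if not is_safe(sample):
            continue
        # Conflict detection: look for an existing sample with a
        # near-identical observation.
        match = None
        for i, old in enumerate(dataset):
            if cosine_similarity(old.obs, sample.obs) >= sim_threshold:
                match = i
                break
        if match is None:
            dataset.append(sample)   # no conflict: just aggregate
        elif sample.score > dataset[match].score:
            dataset[match] = sample  # conflict: keep the higher-scoring label
    return dataset
```

The linear scan over the dataset keeps the sketch short; a real system would likely batch the similarity computation, and the rule for scoring and replacing labels would follow the task-specific safety and progress metrics.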

Paper Structure

This paper contains 13 sections, 6 equations, 11 figures, 3 tables, 1 algorithm.

Figures (11)

  • Figure 1: Illustration of learning from multiple imperfect experts. For example, two rollouts $a$ and $b$ (yellow and blue trajectories in the left figure) are from two different experts, respectively. Each of them has undesired unsafe behavior (red box), and ideally we can learn complementary good behavior from both of them (green trajectory $c$ in the right figure).
  • Figure 2: Control loop for MEGA-DAgger. In each iteration, one expert is chosen as the dominant expert. The Data Filter removes unsafe demonstrations, and Conflict Resolution eliminates conflicting actions among experts.
  • Figure 3: Illustration of conflicted labels from different experts. Blue dots represent hit points of LiDAR scans. Red and green arrows represent unit vectors of steering angles from labels. Pink rectangles represent ego and opponent vehicles. Grey lines represent the boundaries of the race tracks.
  • Figure 4: The effect of the data filter on the overtake rate (top) and the collision rate (bottom). Results are shown for different undesired behavior probabilities $P(U)$.
  • Figure 5: Comparison of MEGA-DAgger, HG-DAgger with data filter, HG-DAgger, DAgger, and Behavior Cloning on two different maps. The left and right columns show the results on Map 1 and Map 2, respectively. Policy networks are saved every 100 training rollouts for evaluation. Each plot is an average of 5 experiments, and the shaded region represents 95% confidence intervals.
  • ...and 6 more figures