Data-Efficient Multitask DAgger

Haotian Fu, Ran Gong, Xiaohan Zhang, Maria Vittoria Minniti, Jigarkumar Patel, Karl Schmeckpeper

TL;DR

Data-efficient multitask DAgger addresses the data hunger of multitask robotics by distilling a single vision-based policy from multiple state-based experts. It introduces a performance-aware scheduler that allocates demonstrations across tasks using Kalman-filtered success probabilities and learning-progress signals (Task Need and Performance Gain). The method achieves higher final task success with far fewer expert demonstrations on MetaWorld and IsaacLab, and demonstrates zero-shot sim-to-real transfer to real robots. This approach offers a scalable path toward generalist policies that can adapt across diverse manipulation tasks with limited data.

Abstract

Generalist robot policies that can perform many tasks typically require extensive expert data or simulations for training. In this work, we propose a novel Data-Efficient multitask DAgger framework that distills a single multitask policy from multiple task-specific expert policies. Our approach significantly increases the overall task success rate by actively focusing on tasks where the multitask policy underperforms. The core of our method is a performance-aware scheduling strategy that tracks how much each task's learning process benefits from the amount of data, using a Kalman filter-based estimator to robustly decide how to allocate additional demonstrations across tasks. We validate our approach on MetaWorld, as well as a suite of diverse drawer-opening tasks in IsaacLab. The resulting policy attains high performance across all tasks while using substantially fewer expert demonstrations, and the visual policy learned with our method in simulation shows better performance than naive DAgger and Behavior Cloning when transferring zero-shot to a real robot without using real data.
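The abstract mentions a Kalman filter-based estimator but does not give its form here. As a minimal sketch of the idea, the snippet below tracks one task's success probability with a scalar Kalman filter, treating each batch of evaluation rollouts as a noisy measurement. The class name `SuccessRateFilter`, the prior values, the `process_var` drift term, and the binomial noise model are all assumptions made for illustration, not the paper's formulation.

```python
from dataclasses import dataclass


@dataclass
class SuccessRateFilter:
    """Scalar Kalman filter over one task's success probability (illustrative)."""

    mean: float = 0.5          # assumed prior estimate of the success probability
    var: float = 0.25          # assumed prior uncertainty of that estimate
    process_var: float = 0.01  # assumed drift of the true rate between iterations

    def update(self, successes: int, rollouts: int) -> float:
        # Predict: the policy was retrained since the last evaluation,
        # so the uncertainty of the old estimate grows.
        self.var += self.process_var

        # Measure: empirical success rate with binomial sampling noise.
        observed = successes / rollouts
        p = min(max(self.mean, 0.05), 0.95)  # keep the noise variance above zero
        meas_var = p * (1.0 - p) / rollouts

        # Correct: blend prediction and measurement using the Kalman gain.
        gain = self.var / (self.var + meas_var)
        self.mean += gain * (observed - self.mean)
        self.var *= 1.0 - gain
        return self.mean


# Hypothetical usage: 8 successes out of 10 evaluation rollouts.
task_filter = SuccessRateFilter()
estimate = task_filter.update(successes=8, rollouts=10)
```

With few rollouts the measurement noise is large, so the filtered estimate moves only part of the way toward the raw success rate. This is one way a scheduler could avoid overreacting to a single noisy evaluation.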

Paper Structure

This paper contains 27 sections, 7 equations, 10 figures, 2 tables.

Figures (10)

  • Figure 1: Overview of our data-efficient multitask DAgger framework. Multiple task-specific expert policies (blue) provide demonstrations to train a single multitask policy. In each DAgger iteration, the multitask policy is deployed on all tasks and a performance-aware scheduler (orange) uses the Task Need (success rate) and Performance Gain (loss improvement) metrics to prioritize which tasks should receive new expert demonstrations. This focused data collection yields a high-performing generalist policy with far fewer demonstrations. The multitask policy trained in simulation can be directly transferred to real-world tasks.
  • Figure 2: Overview of the proposed data-efficient multitask DAgger algorithm.
  • Figure 3: Examples of Meta-World tasks and IsaacLab Drawer tasks used in our experiments.
  • Figure 4: Average success rate vs. the average number of expert demonstrations collected per task. Left: MetaWorld state-based multitask policy comparison. Middle: MetaWorld pixel-based multitask policy comparison. Right: IsaacLab Drawer point-cloud-based multitask policy comparison. Our performance-aware scheduling mechanism generally achieves higher sample efficiency and final performance than Uniform DAgger and standard BC with varying data budgets.
  • Figure 5: Comparison of number of trajectories needed to reach high success rates in MetaWorld and IsaacLab Drawer tasks.
  • ...and 5 more figures
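
The Figure 1 caption describes a scheduler that uses Task Need and Performance Gain to decide which tasks receive new expert demonstrations. The sketch below shows one plausible way to turn two such per-task signals into integer demonstration counts. The function name `allocate_demos`, the linear blend weight `mix`, and the proportional split are hypothetical choices for illustration. The listing here does not specify how the two metrics are actually combined.

```python
def allocate_demos(task_need, performance_gain, budget, mix=0.5):
    """Split `budget` demonstrations across tasks by a blended priority.

    task_need:        per-task need scores (assumed here: 1 - estimated success rate)
    performance_gain: per-task recent loss improvement (assumed non-negative after clipping)
    mix:              hypothetical weight on task need versus performance gain
    """

    def normalize(values):
        total = sum(values)
        n = len(values)
        # Fall back to a uniform split when every score is zero.
        return [v / total for v in values] if total > 0 else [1.0 / n] * n

    need = normalize([max(v, 0.0) for v in task_need])
    gain = normalize([max(v, 0.0) for v in performance_gain])
    priority = [mix * a + (1.0 - mix) * b for a, b in zip(need, gain)]

    # Largest-remainder rounding so the counts sum exactly to the budget.
    raw = [p * budget for p in priority]
    counts = [int(r) for r in raw]
    by_remainder = sorted(
        range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True
    )
    for i in by_remainder[: budget - sum(counts)]:
        counts[i] += 1
    return counts


# Hypothetical usage: the first task is struggling and still improving.
counts = allocate_demos(task_need=[0.8, 0.2], performance_gain=[0.3, 0.1], budget=10)
```

A uniform schedule would give each of the two tasks five demonstrations. Under these assumed scores, the blended priority sends most of the budget to the task with the higher need and gain.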