Towards Balanced Behavior Cloning from Imbalanced Datasets
Sagar Parekh, Heramb Nemlekar, Dylan P. Losey
TL;DR
The paper addresses how imbalanced human demonstrations bias imitation learning policies toward frequently seen subtasks. It formalizes demonstrations as a mixture of sub-policies, analyzes why equal weighting in behavior cloning favors dominant behaviors, and proposes several data-balancing strategies, including a novel meta-gradient rebalancing method. The authors show theoretically that dataset proportions bias learning and empirically that balancing offline data improves downstream imitation tasks, while noting the limitations of each approach. They introduce a principled procedure to learn target losses per sub-policy, which enables balanced learning without extra data collection and has practical implications for multi-task robot learning. Overall, the work provides a framework and toolbox for balancing heterogeneous imitation datasets, improving generalization across behaviors while outlining avenues for future task-aware offline balancing.
Abstract
Robots should be able to learn complex behaviors from human demonstrations. In practice, these human-provided datasets are inevitably imbalanced: the human demonstrates some subtasks more frequently than others. State-of-the-art methods default to treating each element of the human's dataset as equally important. For instance, if the majority of the human's data focuses on reaching a goal and only a few state-action pairs move to avoid an obstacle, the learning algorithm will place greater emphasis on goal reaching. More generally, misalignment between the relative amounts of data and the importance of that data causes fundamental problems for imitation learning approaches. In this paper we analyze and develop learning methods that automatically account for mixed datasets. We formally prove that imbalanced data leads to imbalanced policies when each state-action pair is weighted equally; these policies emulate the most represented behaviors, and not the human's complex, multi-task demonstrations. We next explore algorithms that rebalance offline datasets (i.e., reweight the importance of different state-action pairs) without human oversight. Reweighting the dataset can enhance the overall policy performance. However, there is no free lunch: each method for autonomous rebalancing brings its own pros and cons. We formulate these advantages and disadvantages, helping other researchers identify when each type of approach is most appropriate. We conclude by introducing a novel meta-gradient rebalancing algorithm that addresses the primary limitations of existing approaches. Our experiments show that dataset rebalancing leads to better downstream learning, improving the performance of general imitation learning algorithms without requiring additional data collection. See our project website: https://collab.me.vt.edu/data_curation/.
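
The effect of equal weighting described in the abstract can be illustrated with a toy example. The sketch below is not the paper's meta-gradient algorithm. It assumes a one-dimensional linear policy, two hypothetical sub-behaviors that demand opposite actions, and known sub-behavior labels (which the paper's setting does not assume); the function names and the 950/50 split are illustrative choices. It compares behavior cloning with uniform weights against simple inverse-frequency weights.

```python
import numpy as np

def fit_weighted_linear_policy(states, actions, weights):
    """Closed-form minimizer of sum_i w_i * (theta * s_i - a_i)^2 over a scalar gain theta."""
    return np.sum(weights * states * actions) / np.sum(weights * states ** 2)

def per_group_loss(theta, states, actions, groups):
    """Mean squared action error of the policy a = theta * s on each sub-behavior."""
    losses = {}
    for g in np.unique(groups):
        mask = groups == g
        losses[int(g)] = float(np.mean((theta * states[mask] - actions[mask]) ** 2))
    return losses

# Toy imbalanced dataset (assumed): 950 majority pairs and 50 minority pairs.
n_major, n_minor = 950, 50
states = np.concatenate([np.linspace(-1.0, 1.0, n_major),
                         np.linspace(-1.0, 1.0, n_minor)])
groups = np.concatenate([np.zeros(n_major, dtype=int),
                         np.ones(n_minor, dtype=int)])
# Sub-behavior 0 uses gain +1 and sub-behavior 1 uses gain -1,
# so a single scalar gain cannot fit both perfectly.
actions = np.where(groups == 0, states, -states)

# Standard behavior cloning: every state-action pair counts equally.
uniform_weights = np.ones_like(states)
# Inverse-frequency weights: each sub-behavior contributes equally in total.
counts = np.bincount(groups)
balanced_weights = 1.0 / counts[groups]

theta_uniform = fit_weighted_linear_policy(states, actions, uniform_weights)
theta_balanced = fit_weighted_linear_policy(states, actions, balanced_weights)

loss_uniform = per_group_loss(theta_uniform, states, actions, groups)
loss_balanced = per_group_loss(theta_balanced, states, actions, groups)

print("uniform gain:", theta_uniform, "per-group loss:", loss_uniform)
print("balanced gain:", theta_balanced, "per-group loss:", loss_balanced)
```

With uniform weights the learned gain sits close to the majority behavior, so the minority behavior incurs a much larger error. The inverse-frequency weights pull the gain toward a compromise between the two behaviors. That lowers the minority error at the cost of a higher majority error, a small instance of the trade-offs the abstract refers to.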
