Towards Effective Utilization of Mixed-Quality Demonstrations in Robotic Manipulation via Segment-Level Selection and Optimization
Jingjing Chen, Hongjie Fang, Hao-Shu Fang, Cewu Lu
TL;DR
This work introduces Select Segments to Imitate (S2I), a plug-and-play framework that makes more effective use of mixed-quality demonstrations for robotic manipulation by operating at the segment level. S2I segments each demonstration, uses contrastive learning to identify high-quality segments, and applies trajectory optimization plus action relabeling to refine low-quality segments, then merges the results into a single dataset for training downstream policies. Across six tasks in simulation and real-world settings, S2I yields consistent improvements over baselines, with the strongest gains when segment-level selection is combined with trajectory optimization. The results show that suboptimal data, once segmented and refined in a principled way, can substantially boost policy learning without additional data collection. Future work includes handling more complex rotations, scaling to larger datasets, and extending to language-guided demonstrations.
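To make the "trajectory optimization plus action relabeling" step more concrete, here is a toy Python sketch. It is not S2I's optimizer. It assumes position-only segments (no rotations, obstacles, or contact), holds each segment's first and last states fixed, and minimizes the discrete path energy sum_t ||x[t+1] - x[t]||^2, whose minimizer under those constraints is a set of evenly spaced points on the straight line between the endpoints. The function names and the convention of labeling each action with the next state are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def path_energy(states: Sequence[Point]) -> float:
    """Discrete path energy: sum over t of ||x[t+1] - x[t]||^2."""
    return sum(
        sum((b - a) ** 2 for a, b in zip(states[t], states[t + 1]))
        for t in range(len(states) - 1)
    )


def refine_segment(segment: Sequence[Point]) -> List[Point]:
    """Toy stand-in for trajectory optimization (hypothetical, positions only).

    With the first and last states held fixed, path_energy is minimized by
    evenly spaced points on the straight line between them.
    """
    n = len(segment)
    if n < 3:
        # No interior points, so there is nothing to optimize.
        return [tuple(p) for p in segment]
    start, end = segment[0], segment[-1]
    refined = []
    for t in range(n):
        alpha = t / (n - 1)  # fraction of the way from start to end
        refined.append(tuple(s + alpha * (e - s) for s, e in zip(start, end)))
    return refined


def relabel_actions(states: Sequence[Point]) -> List[Point]:
    """Relabel the action at step t as the next state (hypothetical convention)."""
    return [tuple(states[t + 1]) for t in range(len(states) - 1)]


# Usage: a jittery 2-D segment from a suboptimal demonstration.
noisy = [(0.0, 0.0), (0.3, 0.8), (0.9, 0.1), (2.0, 0.0)]
smooth = refine_segment(noisy)
actions = relabel_actions(smooth)
print(path_energy(noisy), path_energy(smooth))  # refined energy is lower
```

A practical optimizer would likely add terms that keep the result close to the demonstrated motion and respect contact and collision constraints; the straight line is only the simplest closed-form case.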
Abstract
Data is crucial for robotic manipulation, as it underpins the development of robotic systems for complex tasks. While high-quality, diverse datasets enhance the performance and adaptability of robotic manipulation policies, collecting extensive expert-level data is resource-intensive. Consequently, many current datasets suffer from quality inconsistencies due to operator variability, highlighting the need for methods that use mixed-quality data effectively. To address this, we propose "Select Segments to Imitate" (S2I), a framework that selects and optimizes mixed-quality demonstration data at the segment level while remaining plug-and-play compatible with existing robotic manipulation policies. The framework has three components: demonstration segmentation, which divides the original data into meaningful segments; segment selection, which uses contrastive learning to identify high-quality segments; and trajectory optimization, which refines suboptimal segments for better policy learning. We evaluate S2I through comprehensive experiments in simulation and real-world environments across six tasks, demonstrating that with only 3 expert demonstrations for reference, S2I improves the performance of various downstream policies trained on mixed-quality demonstrations. Project website: https://tonyfang.net/s2i/.
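The segmentation and selection components can be illustrated with a small Python sketch under clearly stated assumptions. Segments are cut wherever a hypothetical binary gripper signal changes state, and each segment is scored by its highest cosine similarity to embeddings of segments from the reference expert demonstrations, keeping those at or above a threshold. The embeddings are taken as given, standing in for the output of a contrastively trained encoder; the cut rule, the max-similarity score, and the 0.8 threshold are illustrative choices, not S2I's exact procedure.

```python
import math
from typing import List, Sequence


def segment_by_gripper(gripper: Sequence[int]) -> List[range]:
    """Split timesteps into segments at gripper open/close transitions.

    Hypothetical segmentation rule, used only for illustration.
    """
    if not gripper:
        return []
    segments, start = [], 0
    for t in range(1, len(gripper)):
        if gripper[t] != gripper[t - 1]:  # a state change closes the current segment
            segments.append(range(start, t))
            start = t
    segments.append(range(start, len(gripper)))
    return segments


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0 or nb == 0 else dot / (na * nb)


def select_segments(
    segment_embs: Sequence[Sequence[float]],
    expert_embs: Sequence[Sequence[float]],
    threshold: float = 0.8,
) -> List[int]:
    """Indices of segments whose best match to any expert embedding is >= threshold.

    expert_embs must be non-empty (max() over an empty sequence raises ValueError).
    """
    return [
        i
        for i, emb in enumerate(segment_embs)
        if max(cosine(emb, ref) for ref in expert_embs) >= threshold
    ]


# Usage: three gripper phases; the embeddings are made-up 2-D vectors.
print(segment_by_gripper([0, 0, 0, 1, 1, 0, 0]))  # [range(0, 3), range(3, 5), range(5, 7)]
print(select_segments([[1, 0], [0, 1], [0.9, 0.1]], [[1, 0]]))  # [0, 2]
```

Scoring against the best-matching reference rather than the mean keeps any segment that closely resembles one expert phase, which seems well suited to having only a handful of reference demonstrations.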
