Towards Effective Utilization of Mixed-Quality Demonstrations in Robotic Manipulation via Segment-Level Selection and Optimization

Jingjing Chen, Hongjie Fang, Hao-Shu Fang, Cewu Lu

TL;DR

This work introduces Select Segments to Imitate (S2I), a plug-and-play framework that maximizes the value of mixed-quality demonstrations for robotic manipulation by operating at the segment level. It segments demonstrations, uses contrastive learning to identify high-quality segments, and applies trajectory optimization plus action relabeling to refine low-quality segments, forming a merged dataset suitable for downstream policies. Across six tasks in simulation and real-world settings, S2I yields consistent improvements over baselines, with the strongest gains when segment-level selection is combined with trajectory optimization. The approach demonstrates that leveraging suboptimal data through principled segmentation and refinement can substantially boost policy learning without additional data collection. Future work includes handling more complex rotations, scaling to larger datasets, and extending to language-guided demonstrations.

Abstract

Data is crucial for robotic manipulation, as it underpins the development of robotic systems for complex tasks. While high-quality, diverse datasets enhance the performance and adaptability of robotic manipulation policies, collecting extensive expert-level data is resource-intensive. Consequently, many current datasets suffer from quality inconsistencies due to operator variability, highlighting the need for methods to utilize mixed-quality data effectively. To mitigate these issues, we propose "Select Segments to Imitate" (S2I), a framework that selects and optimizes mixed-quality demonstration data at the segment level, while ensuring plug-and-play compatibility with existing robotic manipulation policies. The framework has three components: demonstration segmentation, which divides the original data into meaningful segments; segment selection, which uses contrastive learning to find high-quality segments; and trajectory optimization, which refines suboptimal segments for better policy learning. We evaluate S2I through comprehensive experiments in simulation and real-world environments across six tasks, demonstrating that with only 3 expert demonstrations for reference, S2I can improve the performance of various downstream policies when trained with mixed-quality demonstrations. Project website: https://tonyfang.net/s2i/.

Paper Structure (40 sections, 2 equations, 9 figures, 11 tables, 1 algorithm)

This paper contains 40 sections, 2 equations, 9 figures, 11 tables, 1 algorithm.

Figures (9)

  • Figure 1: Methods to Deal with Mixed-Quality Demonstrations. Compared with previous methods, our S2I framework selects and optimizes demonstrations at the segment level, preserving semantic consistency within segments while maximizing the retention of high-quality segments and improving lower-quality ones, leading to effective utilization of the mixed-quality demonstrations. With only 3 expert demonstrations for reference, the plug-and-play S2I framework can be used as a data preprocessing step to enhance the performance of various downstream robot manipulation policies when handling mixed-quality demonstrations.
  • Figure 2: Overview of the Select Segments to Imitate (S2I) Framework. (a) The demonstration segmentation stage divides demonstrations into semantically meaningful segments; (b) the segment selection stage applies contrastive learning to train the representation model on the expert demonstration segments, and then uses distance-weighted voting to determine the quality of the mixed-quality demonstration segments; (c) the trajectory optimization stage optimizes the robot trajectory within the low-quality segments and performs action relabeling for efficient utilization of the whole demonstration dataset (for details, please refer to Fig. 4). Finally, the high-quality segments and the optimized low-quality segments form the final optimized dataset $\tilde{\mathcal{D}}$, which can be used directly for downstream policy learning.
  • Figure 3: Segmentation Results of a RoboMimic-Can [robomimic] Demonstration. UVD might produce meaningless and inconsistent segments like $\tau_\text{UVD}^{(3)}$ and $\tau_\text{UVD}^{(4)}$ compared to heuristic keyframe discovery.
  • Figure 4: Trajectory Optimization and Action Relabeling. Orange points denote the original trajectory $\tau$, green points denote the optimized trajectory $\tau'$, and grey points are discarded. We perform action relabeling on all points in the original trajectory $\tau$ to form the optimized "trajectory" $\tilde{\tau}$. The example here assumes absolute actions, but S2I supports both absolute and relative actions.
  • Figure 5: Tasks. We select 3 tasks from RoboMimic [robomimic] (Lift, Can and Square) in the simulation environment, and design 3 tasks (Tissue, Cup and Pen) on the real-world robot platform for evaluation. Detailed descriptions of all tasks: (1) Lift: pick up the block; (2) Can: move the can into the target area; (3) Square: pick up a square nut and place it on the target rod; (4) Tissue: take a tissue out and place it in the container; (5) Cup: collect all the cups (at most 2) into the large metal cup; (6) Pen: collect all the pens (at most 3) into the bowl.
  • ...and 4 more figures
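
The captions of Figures 2 and 4 mention two steps that a small example may make more concrete: distance-weighted voting over segment embeddings, and action relabeling after trajectory optimization. The sketch below is illustrative only and is not the authors' implementation. It assumes a labelled reference set of embeddings, an inverse-distance weight, a k-nearest-neighbour vote, and a relabeling rule in which every original step takes the next kept waypoint as its absolute action target. The function names, the parameter `k`, and that relabeling rule are all hypothetical choices made for illustration.

```python
import numpy as np


def distance_weighted_vote(query, ref_embeddings, ref_labels, k=5, eps=1e-8):
    """Score one segment embedding with an inverse-distance-weighted k-NN vote.

    ref_labels holds 1 for reference segments treated as high quality and 0
    otherwise (an assumed labelling). Returns the weighted share of
    "high quality" votes, a value in [0, 1].
    """
    ref_embeddings = np.asarray(ref_embeddings, dtype=float)
    ref_labels = np.asarray(ref_labels, dtype=float)
    dists = np.linalg.norm(ref_embeddings - np.asarray(query, dtype=float), axis=1)
    k = min(k, len(dists))
    nearest = np.argsort(dists)[:k]
    weights = 1.0 / (dists[nearest] + eps)  # closer neighbours count more
    return float(np.sum(weights * ref_labels[nearest]) / np.sum(weights))


def relabel_absolute_actions(positions, kept_indices):
    """Relabel every original step with the next kept waypoint as its target.

    positions: (T, D) array of positions along the original trajectory.
    kept_indices: indices of the points retained by the optimizer (assumed
    to be given; the optimizer itself is not sketched here).
    """
    positions = np.asarray(positions, dtype=float)
    kept = np.asarray(sorted(kept_indices))
    actions = np.empty_like(positions)
    for t in range(len(positions)):
        # Index of the first kept waypoint strictly after step t.
        j = np.searchsorted(kept, t, side="right")
        # Past the last kept waypoint, keep targeting that final waypoint.
        target = kept[min(j, len(kept) - 1)]
        actions[t] = positions[target]
    return actions


# Toy data: two clusters of reference embeddings with assumed quality labels.
refs = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
labels = [1, 1, 0, 0]
score_near_good = distance_weighted_vote([0.0, 0.1], refs, labels, k=3)
score_near_bad = distance_weighted_vote([5.0, 5.5], refs, labels, k=3)

# Toy trajectory with a detour at step 1 that the optimizer is assumed to drop.
traj = [[0.0, 0.0], [1.0, 5.0], [2.0, 0.0], [3.0, 0.0]]
relabeled = relabel_absolute_actions(traj, kept_indices=[0, 2, 3])
```

In this toy setup a segment would be treated as high quality when its score exceeds a threshold such as 0.5, which is again an assumption. Note that the detour point at step 1 is not dropped from the data: it is relabeled to head toward the next kept waypoint, which is one reading of how "action relabeling on all points in the original trajectory" keeps every observation usable.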