Self-Augmented Robot Trajectory: Efficient Imitation Learning via Safe Self-augmentation with Demonstrator-annotated Precision

Hanbit Oh, Masaki Murooka, Tomohiro Motoda, Ryoichi Nakajo, Yukiyasu Domae

TL;DR

The paper addresses data efficiency and safety in robot imitation learning for clearance-limited manipulation by proposing Self-Augmented Robot Trajectory (SART), a framework that learns from a single human demonstration augmented with autonomously generated, collision-free trajectories inside user-annotated precision spheres. SART has two stages: a one-shot teaching phase, in which the demonstrator annotates spheres around key waypoints, and a self-augmentation phase, in which the robot samples poses on the sphere surfaces and reconnects to the original demonstration to produce diverse training data. In simulation experiments (peg-in-hole, door opening, lid opening, toolbox picking) and real-world experiments (bottle placing, lid closing), SART achieves substantially higher success rates than single-demo replay, behavioral cloning, and contact-free MILES, while reducing human data collection effort. The approach offers a safer, more data-efficient route to imitation learning for manipulation, with potential for integration with larger visuomotor models and for future extension to contact-aware augmentation strategies.

Abstract

Imitation learning is a promising paradigm for training robot agents; however, standard approaches typically require substantial data acquisition, via numerous demonstrations or random exploration, to ensure reliable performance. Although exploration reduces human effort, it lacks safety guarantees and often results in frequent collisions, particularly in clearance-limited tasks (e.g., peg-in-hole), thereby necessitating manual environment resets and imposing additional human burden. This study proposes Self-Augmented Robot Trajectory (SART), a framework that enables policy learning from a single human demonstration while safely expanding the dataset through autonomous augmentation. SART consists of two stages: (1) one-time human teaching, in which a single demonstration is provided and precision boundaries, represented as spheres around key waypoints, are annotated, followed by one environment reset; and (2) robot self-augmentation, in which the robot generates diverse, collision-free trajectories within these boundaries and reconnects to the original demonstration. This design improves data collection efficiency by minimizing human effort while ensuring safety. Extensive evaluations on simulated and real-world manipulation tasks show that SART achieves substantially higher success rates than policies trained solely on human-collected demonstrations. Video results are available at https://sites.google.com/view/sart-il .
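
The self-augmentation stage can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: poses are reduced to 3D positions, start points are drawn uniformly on the sphere surface, and the robot reconnects to the demonstration by a straight line to the annotated waypoint. It performs no collision checking; it simply assumes the annotated sphere lies entirely in free space. The function names (`sample_on_sphere`, `interpolate`, `augment`) and all numeric values are hypothetical.

```python
import math
import random


def sample_on_sphere(center, radius, rng):
    """Draw a point uniformly on a sphere surface by normalizing a Gaussian vector."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-12:  # reject the (practically impossible) zero vector
            break
    return tuple(c + radius * x / norm for c, x in zip(center, v))


def interpolate(start, end, steps):
    """Return `steps` points on the straight line from start to end (start excluded, end included)."""
    return [
        tuple(s + (e - s) * k / steps for s, e in zip(start, end))
        for k in range(1, steps + 1)
    ]


def augment(demo, key_index, radius, rng, steps=5):
    """Build one augmented trajectory.

    Assumption: the sphere of the given radius around demo[key_index] is
    collision-free, as guaranteed by the demonstrator's annotation.
    """
    waypoint = demo[key_index]
    start = sample_on_sphere(waypoint, radius, rng)
    # Move from the sampled start back to the waypoint, then replay the rest of the demo.
    return [start] + interpolate(start, waypoint, steps) + list(demo[key_index + 1:])


# Hypothetical single demonstration: a straight descent toward a hole at the origin.
rng = random.Random(0)
demo = [(0.0, 0.0, 0.3), (0.0, 0.0, 0.2), (0.0, 0.0, 0.1), (0.0, 0.0, 0.0)]
trajectories = [augment(demo, key_index=1, radius=0.05, rng=rng) for _ in range(30)]
```

Each augmented trajectory starts at a different point on the sphere but ends with the same demonstrated approach, so a policy trained on the set sees varied start states together with the corrective motion back to the demonstration.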

Paper Structure

This paper contains 22 sections, 7 equations, 10 figures, 2 tables, 1 algorithm.

Figures (10)

  • Figure 1: Illustrative example comparing standard imitation learning (left), imitation learning with random self-augmentation (center), and the proposed self-augmented robot trajectory (SART) method (right). Standard imitation learning suffers from covariate shift due to limited variation in demonstrations, often leading to failures in grasping tasks; moreover, these failures typically require frequent environment resets. Random self-augmentation, such as random exploration via reinforcement learning, can expand data coverage, but collisions during exploration lead to unsafe behavior and force repeated environment resets. In contrast, SART generates diverse, collision-free trajectories from a single human demonstration within annotated precision boundaries, eliminating the need for resets while reducing covariate shift and enabling successful policy execution.
  • Figure 2: Overview of SART: A human provides a single demonstration and explicitly annotates precision boundaries around key waypoints. The robot then autonomously augments its dataset by exploring within these boundaries, generating diverse, collision-free trajectories used to learn a robust policy.
  • Figure 3: Evaluation tasks. The four panels on the left show the simulated environments, where a robot performs the peg-in-hole, door opening, lid opening, and toolbox picking tasks. The two panels on the right show real-world environments, where a robot performs the bottle placement and lid closure tasks. These tasks represent diverse clearance-limited manipulation scenarios of varying complexity.
  • Figure 4: Comparison of training data ($N=30$) for the peg-in-hole task across methods. BC: requires multiple human demonstrations (purple). For clarity, the goal object (hole) is shown in a fixed position, although the demonstrations were collected with varying goal positions. Contact-free MILES: augments trajectories from a single demonstration but produces backward motions with limited coverage (orange). SART (proposed): begins with a single human demonstration (purple) and annotated precision boundaries (red spheres), then autonomously generates diverse, collision-free augmented trajectories (orange). To keep the end-effector trajectories visible, the peg is omitted from the visualization.
  • Figure 5: Rollouts of policies trained on the $N=30$ training data for each method. Red: SART (proposed), Blue: Single-demo replay, Yellow: BC, Green: Contact-free MILES. Among these trajectories, only SART completes the peg-in-hole task. To keep the end-effector trajectories visible, the peg is omitted from the visualization.
  • ...and 5 more figures