Sketch-to-Skill: Bootstrapping Robot Learning with Human Drawn Trajectory Sketches

Peihong Yu, Amisha Bhaskar, Anukriti Singh, Zahiruddin Mahammad, Pratap Tokekar

TL;DR

This work tackles data-efficient robotic manipulation by bootstrapping reinforcement learning with human-drawn trajectory sketches. It introduces Sketch-to-Skill, a three-stage pipeline that (1) converts dual-view 2D sketches into 3D trajectories with a Sketch-to-3D Trajectory Generator, (2) executes those trajectories open-loop to collect demonstrations, and (3) trains a policy with behavior cloning followed by TD3-based RL augmented with discriminator-guided exploration. The approach performs close to teleoperation-based baselines and outperforms pure RL. The authors also validate transfer to real hardware and show robustness to sketch imperfections. By letting non-expert users guide learning through sketches, the method makes robot learning more accessible and broadens its potential applications.

Abstract

Training robotic manipulation policies traditionally requires numerous demonstrations and/or environmental rollouts. While recent Imitation Learning (IL) and Reinforcement Learning (RL) methods have reduced the number of required demonstrations, they still rely on expert knowledge to collect high-quality data, limiting scalability and accessibility. We propose Sketch-to-Skill, a novel framework that leverages human-drawn 2D sketch trajectories to bootstrap and guide RL for robotic manipulation. Our approach extends beyond previous sketch-based methods, which focused primarily on imitation learning or policy conditioning and were limited to specific trained tasks. Sketch-to-Skill employs a Sketch-to-3D Trajectory Generator that translates 2D sketches into 3D trajectories, which are then used to autonomously collect initial demonstrations. We utilize these sketch-generated demonstrations in two ways: to pre-train an initial policy through behavior cloning and to refine this policy through RL with guided exploration. Experimental results demonstrate that, using only sketch inputs, Sketch-to-Skill achieves ~96% of the performance of the baseline model that leverages teleoperated demonstration data, while exceeding the performance of a pure reinforcement learning policy by ~170%. This makes robotic manipulation learning more accessible and potentially broadens its applications across various domains.
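
To make "RL with guided exploration" more concrete, here is a minimal sketch of one common way a discriminator can shape the reward: an additive bonus that grows when a transition looks like the sketch-generated demonstrations. The additive form, the `weight` parameter, and the function names are assumptions made for illustration only. The text above does not specify how the discriminator output enters the reward.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def shaped_reward(env_reward: float, disc_logit: float, weight: float = 0.1) -> float:
    """Hypothetical shaping rule: environment reward plus a similarity bonus.

    disc_logit is the raw output of a (hypothetical) discriminator that scores
    how closely a transition resembles sketch-generated demonstration data.
    weight is an assumed hyperparameter, not a value taken from the paper.
    """
    return env_reward + weight * sigmoid(disc_logit)
```

Under this assumed rule, a transition that the discriminator rates as demonstration-like receives a larger total reward than one it rates as unlike the demonstrations, even when the environment reward is identical.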

Paper Structure

This paper contains 27 sections, 5 equations, 15 figures, 3 tables, 1 algorithm.

Figures (15)

  • Figure 1: Learning a new skill in the Sketch-To-Skill framework. Step 1: Capture the task scenario from two views and collect human-drawn sketches. Step 2: Convert 2D sketches to 3D trajectories using a pretrained generator. Step 3: Execute generated trajectories to collect experience data. Step 4: Learn a manipulation policy with reinforcement learning, bootstrapped from behavior cloning and guided by the collected experience data.
  • Figure 2: The Sketch-to-3D Trajectory Generator takes dual-view 2D sketches as inputs and predicts B-spline parameters to generate the final 3D trajectory output.
  • Figure 3: Overview of Sketch-To-Skill integrating sketch-generated demonstrations with reinforcement learning. Sketch-generated experiences train an IL policy, which bootstraps the RL process. A discriminator guides exploration by rewarding similarity to sketch-generated trajectories. The final action, combining IL and RL policy outputs, further enhances the exploration guidance. The asterisk after "Replay Buffer" indicates that the buffer is initialized with the open-loop servoing demonstrations.
  • Figure 4: Multi-stage trajectory generation and execution. On the left, we show hand-drawn sketches on scenario RGB images and the extracted sketches on a blank background, (a) generated trajectory from the Sketch-to-3D Trajectory Generator, and (b) executed trajectory via open-loop servoing. In (c), we visualize a teleoperated demo for the same task for reference.
  • Figure 5: Evaluation scores (success rate) for the robomimic PickPlaceCan environment.
  • ...and 10 more figures
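
The Figure 2 caption says the generator predicts B-spline parameters that define the final 3D trajectory. As a rough illustration of that last step only, the sketch below evaluates a clamped uniform cubic B-spline from a handful of 3D control points using the Cox-de Boor recursion. It assumes the predicted parameters are 3D control points and that the knot vector is clamped and uniform. Neither detail is confirmed by the captions, and all function names and control-point values are hypothetical.

```python
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def clamped_knots(n_ctrl: int, degree: int) -> List[float]:
    """Clamped uniform knot vector on [0, 1] with n_ctrl + degree + 1 knots."""
    n_inner = n_ctrl - degree - 1
    inner = [(i + 1) / (n_inner + 1) for i in range(n_inner)]
    return [0.0] * (degree + 1) + inner + [1.0] * (degree + 1)


def basis(i: int, k: int, t: float, knots: Sequence[float]) -> float:
    """Cox-de Boor recursion for the i-th B-spline basis function of degree k."""
    if k == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left_den = knots[i + k] - knots[i]
    right_den = knots[i + k + 1] - knots[i + 1]
    left = 0.0 if left_den == 0 else (t - knots[i]) / left_den * basis(i, k - 1, t, knots)
    right = 0.0 if right_den == 0 else (knots[i + k + 1] - t) / right_den * basis(i + 1, k - 1, t, knots)
    return left + right


def evaluate(ctrl: Sequence[Point3], t: float, degree: int = 3) -> Point3:
    """Point on the spline at parameter t in [0, 1]."""
    if t >= 1.0:
        # The half-open basis intervals exclude t = 1, so return the endpoint directly.
        return tuple(ctrl[-1])
    knots = clamped_knots(len(ctrl), degree)
    weights = [basis(i, degree, t, knots) for i in range(len(ctrl))]
    return tuple(sum(w * p[d] for w, p in zip(weights, ctrl)) for d in range(3))


def sample_trajectory(ctrl: Sequence[Point3], n_points: int, degree: int = 3) -> List[Point3]:
    """Sample n_points (at least 2) evenly spaced in t along the spline."""
    return [evaluate(ctrl, j / (n_points - 1), degree) for j in range(n_points)]


# Hypothetical control points, standing in for a generator's predicted parameters.
control_points = [
    (0.0, 0.0, 0.5),
    (0.1, 0.2, 0.5),
    (0.3, 0.2, 0.5),
    (0.4, 0.0, 0.5),
    (0.6, 0.1, 0.5),
]
trajectory = sample_trajectory(control_points, 11)
```

Because the knot vector is clamped, the sampled trajectory starts at the first control point and ends at the last one, which is convenient when a path must begin at the gripper and end at the target. A small set of control points also gives a compact, smooth output representation compared with predicting every waypoint directly.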