Leveraging Demonstrator-perceived Precision for Safe Interactive Imitation Learning of Clearance-limited Tasks

Hanbit Oh; Takamitsu Matsubara

TL;DR

This work tackles safe, model-free interactive imitation learning for clearance-limited tasks by introducing Demonstrator-perceived Precision-aware IIL (DPIIL). DPIIL estimates environmental precision from the speed-accuracy trade-off in human demonstrations and couples it with an ensemble policy to compute collision risk, triggering expert interventions when needed without requiring explicit environment models. The approach yields a precision estimator and a risk metric that jointly govern interventions, leading to improved training safety and strong robot-autonomous performance in both simulations (aperture-passing, ring-threading) and real UR5e experiments. Overall, DPIIL advances safe, data-efficient learning for assembly-like tasks where narrow clearances and potential collisions have previously limited model-free IIL applicability.

Abstract

Interactive imitation learning (IIL) is an efficient, model-free method through which a robot learns a task by repeatedly alternating between executing its learning policy and collecting data by querying human demonstrations. However, deploying immature policies for clearance-limited tasks, such as industrial insertion, poses significant collision risks. For such tasks, a robot should detect collision risks and request intervention by ceding control to a human when a collision is imminent. Detecting these risks normally requires an accurate model of the environment, a need that significantly limits the scope of IIL applications. In contrast, humans implicitly demonstrate environmental precision by adjusting their behavior to avoid collisions when performing tasks. Inspired by this behavior, this paper presents a novel interactive learning method, Demonstrator-perceived Precision-aware Interactive Imitation Learning (DPIIL), that uses demonstrator-perceived precision as a criterion for human intervention. DPIIL captures precision by observing the speed-accuracy trade-off exhibited in human demonstrations and cedes control to a human to avoid collisions in states where high precision is estimated. DPIIL improves the safety of interactive policy learning and preserves efficiency without requiring explicit precision information about the environment. We assessed DPIIL's effectiveness through simulations and real-robot experiments in which a UR5e 6-DOF robotic arm was trained to perform assembly tasks. DPIIL significantly improved training safety, and its best performance compared favorably with other learning methods.
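The intervention logic described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the paper's formulation: the risk score (ensemble disagreement multiplied by a precision value), the closed-form `precision_from_speed` proxy standing in for the learned precision estimator, the two thresholds, and all function and class names are hypothetical.

```python
from statistics import pvariance
from typing import Callable, List, Sequence

Vector = Sequence[float]
Policy = Callable[[Vector], List[float]]


def ensemble_uncertainty(policies: Sequence[Policy], state: Vector) -> float:
    """Mean per-dimension variance of the actions proposed by the ensemble."""
    actions = [policy(state) for policy in policies]
    n_dims = len(actions[0])
    return sum(pvariance([a[d] for a in actions]) for d in range(n_dims)) / n_dims


def precision_from_speed(speed: float, eps: float = 1e-6) -> float:
    """Hypothetical proxy (assumption): slower demonstrated motion implies
    that the demonstrator perceived a higher required precision."""
    return 1.0 / (speed * speed + eps)


def collision_risk(uncertainty: float, precision: float) -> float:
    """Hypothetical risk score (assumption): policy uncertainty weighted by
    the precision the state is estimated to require."""
    return uncertainty * precision


class InterventionGate:
    """Cede control to the human above `enter`; resume autonomy below `exit_`."""

    def __init__(self, enter: float, exit_: float) -> None:
        if exit_ > enter:
            raise ValueError("exit_ threshold must not exceed enter threshold")
        self.enter = enter
        self.exit_ = exit_
        self.expert_mode = False

    def update(self, risk: float) -> bool:
        """Return True while the human should be in control."""
        if self.expert_mode:
            if risk < self.exit_:
                self.expert_mode = False
        elif risk > self.enter:
            self.expert_mode = True
        return self.expert_mode


# Toy ensemble (assumption): three linear policies that disagree on one gain.
policies = [lambda s, k=k: [k * s[0], -s[1]] for k in (0.9, 1.0, 1.1)]
gate = InterventionGate(enter=0.1, exit_=0.01)

modes = []
for state, speed in [([1.0, 0.0], 1.0), ([1.0, 0.0], 0.05), ([0.0, 0.0], 0.05)]:
    risk = collision_risk(
        ensemble_uncertainty(policies, state), precision_from_speed(speed)
    )
    modes.append(gate.update(risk))
```

In this toy run the same ensemble disagreement is tolerated in a fast, low-precision state but triggers expert mode in a slow, high-precision one. The two thresholds mirror the behavior in the Figure 2 caption, where the human keeps control until the risk is sufficiently lowered.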

Paper Structure (25 sections, 6 equations, 5 figures, 1 table, 1 algorithm)

Introduction
Related Work
Interactive Imitation Learning
Risk-aware Interactive Imitation Learning
Speed-accuracy Trade-off in Clearance-limited Tasks
Problem Statement
Demonstrator-perceived Precision-aware IIL
Demonstrator-perceived Precision Estimation
Collision Risk Estimation
Interaction Design
DPIIL Overview
Simulation
Aperture-passing Simulation
Task Setting
Learning Setting
...and 10 more sections

Figures (5)

Figure 1: Overview of Demonstrator-perceived Precision-aware Interactive Imitation Learning (DPIIL). In clearance-limited tasks, demonstrator-perceived precision exists only in the mind of the human. By capturing this precision level from demonstration data and incorporating it into IIL, a robot can cede control to a human (expert mode, bottom) in high-precision areas while executing its policy (auto mode, top) in low-precision areas, thus enhancing safety.
Figure 2: Overview of DPIIL: (top): While a robot is executing a task with its policy, if state $\mathbf{s}_t$ is too risky, a human controls the robot until the risk is sufficiently lowered. (bottom): The policy and precision estimator are iteratively learned from training data collected through interactions. Collision risk is computed from the uncertainty of the learned policy and the estimated precision.
Figure 3: Aperture-passing simulation: (a): Uncertainty and precision across the state space are obtained using a policy and a precision estimator learned from the initial demonstration dataset. Both measurements are normalized to clarify variations across states. Based on these indicators, interactive trajectories of IIL algorithms (DAgger, EnsembleDAgger, DPIIL (Ours)) are compared. (b): Comparison of the 2D vector fields of the policies learned by BC and DPIIL (Ours) and their execution trajectories. (c): Average interactive (top) and robot-autonomous (bottom) performance, evaluated by repeating each experiment ten times with random seeds. (top): Interactive performance is shown as a box plot of the average success probability during the training phases across all trials of each IIL approach. Significant differences by t-test are observed between the proposed method and a baseline ($*: p < 5 \times 10^{-2}$, $***: p < 5 \times 10^{-4}$). (bottom): Robot-autonomous performance versus the number of expert actions used for training, measured by conducting 100 test episodes of each learned policy. The t-test showed no significant difference between our method and the other risk-aware IIL methods (EnsembleDAgger and ThriftyDAgger), but a significant difference ($p < 5 \times 10^{-2}$) with HG-DAgger. (d): Interactive and robot-autonomous performance are measured with $\chi$ fixed at values in $[10^{-5}, 10^{-3}]$ for each experiment; the squared correlation coefficient $r^2$ (Hamby, 1994) between hyperparameter $\chi$ and each performance measure is used as a sensitivity indicator.
Figure 4: Ring-threading simulation: (a): The algorithmic expert's demonstration includes two high-precision phases, as the robot reaches to grasp a ring and as it inserts the ring onto a peg. Precision and uncertainty were obtained by analyzing an initial demonstration using a precision estimator and a policy learned from the initial demonstration dataset. Based on this expert, interactive trajectories of IIL algorithms (DAgger, EnsembleDAgger, DPIIL (Ours)) were compared. (b): Average interactive (top) and robot-autonomous (bottom) performance, evaluated by repeating each experiment ten times with random seeds. Other details are identical to the previous analysis (Fig. 3).
Figure 5: Real-robot experiments: Experiments were conducted on assembly tasks for a 6-DOF robotic arm (UR5e) with human experts: (a) reaching a shaft while avoiding obstacles and (b) threading a ring onto a peg. Precision and uncertainty were obtained by analyzing the initial demonstration with a precision estimator and a policy learned from the initial dataset. Both measurements were normalized to visualize variations across states. Interactive demonstrations of EnsembleDAgger and DPIIL (Ours) show the trajectories during the interactive phase. (c): Illustration of the user interface, which uses the buttons on a joystick (Xbox). In expert mode, pressing the "A" button synchronizes the position of the robot's end effector with that of the human-held ring. While the "B" button is pressed, the robot follows the movement of the ring. If the "B" button is released, the robot stops moving, and synchronization must be redone by pressing the "A" button again. In auto mode, the robot is moved by the learned policy while the "Y" button is pressed. Note that the "Y" button is included only to ensure safety in verification evaluations; it is not a requirement of our method (DPIIL).