Feasibility-aware Imitation Learning from Observation with Multimodal Feedback

Kei Takahashi; Hikaru Sasaki; Takamitsu Matsubara

TL;DR

The paper presents FABCO, a feasibility-aware imitation learning framework that learns from demonstrations recorded with a hand-mounted interface and uses forward and inverse dynamics models to estimate whether the demonstrated motions are feasible for the robot. It gives the demonstrator multimodal (visual and haptic) feedback to guide them toward robot-feasible trajectories, and it trains robust policies using feasibility-aware weighting together with action chunking and temporal ensembling. Across peg-insertion and circle-tracing tasks with 15 participants, FABCO improves imitation performance by more than 3.2 times over the baseline without feasibility feedback, and visuo-haptic feedback yields the best overall policy performance. The work highlights the importance of combining feasibility signals at demonstration time and at policy-learning time, analyzes workload and preferences across feedback modalities, and identifies future work on more efficient dynamics learning and on extending the approach to gripper motions. In practice, the framework lets novices teach robots feasible behaviors through an intuitive interface and informative feedback, improving stability and safety in real-world manipulation tasks.

Abstract

Imitation learning frameworks that learn robot control policies from demonstrators' motions via hand-mounted demonstration interfaces have attracted increasing attention. However, due to differences in physical characteristics between demonstrators and robots, this approach faces two limitations: i) the demonstration data do not include robot actions, and ii) the demonstrated motions may be infeasible for robots. These limitations make policy learning difficult. To address them, we propose Feasibility-Aware Behavior Cloning from Observation (FABCO). FABCO integrates behavior cloning from observation, which complements robot actions using robot dynamics models, with feasibility estimation. In feasibility estimation, the demonstrated motions are evaluated using a robot-dynamics model, learned from the robot's execution data, to assess reproducibility under the robot's dynamics. The estimated feasibility is used for multimodal feedback and feasibility-aware policy learning to improve the demonstrator's motions and learn robust policies. Multimodal feedback provides feasibility through the demonstrator's visual and haptic senses to promote feasible demonstrated motions. Feasibility-aware policy learning reduces the influence of demonstrated motions that are infeasible for robots, enabling the learning of policies that robots can execute stably. We conducted experiments with 15 participants on two tasks and confirmed that FABCO improves imitation learning performance by more than 3.2 times compared to the case without feasibility feedback.
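The two uses of feasibility described in the abstract can be illustrated with a toy example. The sketch below is not the paper's formulation: it assumes a one-dimensional robot with a per-step displacement limit, scores feasibility by how well a forward model reproduces each demonstrated transition from the action an inverse model estimates, and maps that error to a weight with a Gaussian kernel. All names (`V_MAX`, `idm`, `fdm`, `feasibility`, `sigma`, `weighted_bc_loss`) are hypothetical, and the hand-written models stand in for dynamics models that would be learned from robot execution data.

```python
import math

V_MAX = 0.1  # hypothetical per-step displacement limit of the toy robot


def clip(x, lo, hi):
    return max(lo, min(hi, x))


def idm(s, s_next):
    """Toy inverse dynamics: the robot action that best explains s -> s_next."""
    return clip(s_next - s, -V_MAX, V_MAX)


def fdm(s, a):
    """Toy forward dynamics: the state the robot reaches from s under action a."""
    return s + clip(a, -V_MAX, V_MAX)


def feasibility(s, s_next, sigma=0.1):
    """Assumed score in (0, 1]: near 1 when the robot can reproduce the transition."""
    a_hat = idm(s, s_next)      # estimate the missing robot action
    s_pred = fdm(s, a_hat)      # replay it through the forward model
    err = s_next - s_pred       # gap between demonstrated and reachable state
    return math.exp(-((err / sigma) ** 2))


def weighted_bc_loss(policy, states, actions, weights):
    """Squared-error behavior cloning loss, with each sample scaled by its weight."""
    total = sum(w * (policy(s) - a) ** 2
                for s, a, w in zip(states, actions, weights))
    return total / sum(weights)


# Demonstrated positions; the jump from 0.10 to 0.60 exceeds V_MAX.
demo = [0.0, 0.05, 0.10, 0.60, 0.65]
pairs = list(zip(demo[:-1], demo[1:]))
states = [s for s, _ in pairs]
actions = [idm(s, s_next) for s, s_next in pairs]
weights = [feasibility(s, s_next) for s, s_next in pairs]


def constant_policy(s):
    return 0.05  # hypothetical policy that always outputs the same step


loss_weighted = weighted_bc_loss(constant_policy, states, actions, weights)
loss_uniform = weighted_bc_loss(constant_policy, states, actions, [1.0] * len(states))
```

In this toy setting the infeasible jump receives a weight close to zero, so it contributes almost nothing to the weighted loss, while the same per-transition score could also drive the visual or haptic cue shown to the demonstrator.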

Paper Structure (37 sections, 10 equations, 18 figures, 3 tables, 1 algorithm)

Introduction
Related Work
Demonstration methods in imitation learning
Applying feasibility to imitation learning
Preliminaries
Behavior Cloning from Observation
Problem setting
IDM learning and action estimation
Policy learning
Action Chunking and Temporal Ensembling
Feasibility-Aware Behavior Cloning from Observation
Problem Statement
Demonstration interface
Learning of forward and inverse dynamics models
Feasibility estimation
...and 22 more sections

Figures (18)

Figure 1: Comparison between conventional and proposed feasibility-aware imitation learning frameworks. (a) In demonstrations performed with a hand-mounted interface, the robot's feasibility constraints are not reflected in the demonstrations, so learned policies may produce unpredictable or undesirable motions. (b) The proposed framework incorporates feasibility in both demonstration and learning, promoting feasible demonstrations and improving the learned policy's performance toward the desired motions.
Figure 2: Overview of FABCO framework. This framework incorporates feasibility into both demonstration and policy learning in imitation learning using a hand-mounted demonstration interface. (a) The demonstrator receives feasibility feedback on demonstrated motions, which is computed using FDM and IDM, and refines the demonstration based on this feedback. (b) Policy learning uses the feasibility of the demonstration data, permitting the learned policy to generate feasible robot motions.
Figure 3: Hand-mounted demonstration interface. (a) Overview of interface. An encoder is attached to measure finger position. Motion-capture markers are attached to track the interface pose. (b) Internal structure of handle. It is equipped with two vibration motors.
Figure 4: Experimental setup for the human-subject experiment. (a) Experimental environment: the demonstrator and the robot share the workspace on the desk. Motion is captured by a motion-capture camera. The demonstrator receives feasibility feedback through the monitor and the interface. (b) Visual feasibility feedback system: this system provides two types of visualization, pose-level feasibility and overall feasibility.
Figure 5: Task environments. (a) The peg-insertion task requires moving a grasped peg from start to goal while avoiding obstacles. (b) The circle-tracing task requires tracing a circular path on a table. In each task, the green region denotes the workspace in which the robot must complete motions.
...and 13 more figures
