Feasibility-aware Imitation Learning from Observation with Multimodal Feedback
Kei Takahashi, Hikaru Sasaki, Takamitsu Matsubara
TL;DR
The paper presents FABCO, a feasibility-aware imitation learning framework for demonstrations collected with a hand-mounted interface. FABCO estimates whether the robot can reproduce a demonstrated motion using forward and inverse dynamics models. It conveys this estimate to the demonstrator as multimodal (visual and haptic) feedback, guiding them toward robot-feasible trajectories, and it trains robust policies using feasibility-aware weighting and action chunking with temporal ensembling. Across peg-insertion and circle-tracing tasks with 15 participants, FABCO improves imitation performance by more than 3.2 times over a baseline without feasibility feedback (No FB), and visuo-haptic feedback yields the best overall policy performance. The work highlights the importance of using feasibility signals both during demonstration and during policy learning, analyzes workload and preferences across feedback modalities, and identifies more efficient dynamics learning and extension to gripper motions as future work. In practice, the approach lets novices teach robots feasible behaviors efficiently through an intuitive interface and informative feedback, improving stability and safety in real-world manipulation tasks.
Abstract
Imitation learning frameworks that learn robot control policies from demonstrators' motions via hand-mounted demonstration interfaces have attracted increasing attention. However, due to differences in physical characteristics between demonstrators and robots, this approach faces two limitations: i) the demonstration data do not include robot actions, and ii) the demonstrated motions may be infeasible for robots. These limitations make policy learning difficult. To address them, we propose Feasibility-Aware Behavior Cloning from Observation (FABCO). FABCO integrates behavior cloning from observation, which fills in the missing robot actions using robot dynamics models, with feasibility estimation. In feasibility estimation, the demonstrated motions are evaluated with a robot dynamics model, learned from the robot's execution data, to assess whether they can be reproduced under the robot's dynamics. The estimated feasibility is used in two ways: multimodal feedback, which improves the demonstrator's motions, and feasibility-aware policy learning, which yields robust policies. Multimodal feedback conveys the estimated feasibility to the demonstrator through visual and haptic cues to encourage feasible demonstrated motions. Feasibility-aware policy learning reduces the influence of demonstrated motions that are infeasible for robots, enabling the learning of policies that robots can execute stably. We conducted experiments with 15 participants on two tasks and confirmed that FABCO improves imitation learning performance by more than 3.2 times compared to the case without feasibility feedback.
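The two uses of the dynamics models described above, inferring the missing robot actions and scoring feasibility, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the paper's implementation: the hand-written one-dimensional `forward_dynamics` and `inverse_dynamics` functions stand in for models that would be learned from robot execution data, the constants `A_MAX` and `SIGMA` are hypothetical, and the Gaussian mapping from reconstruction error to a feasibility score is only one plausible choice.

```python
import numpy as np

A_MAX = 0.1   # hypothetical per-step action limit of the toy robot
SIGMA = 0.1   # hypothetical scale mapping reconstruction error to a score


def forward_dynamics(state, action):
    """Toy stand-in for a learned forward model: next state from state and action."""
    return state + np.clip(action, -A_MAX, A_MAX)


def inverse_dynamics(state, next_state):
    """Toy stand-in for a learned inverse model: an action the robot can issue."""
    return np.clip(next_state - state, -A_MAX, A_MAX)


def estimate_feasibility(states):
    """Infer actions for a demonstrated state sequence of shape (T, D) and
    score each of the T-1 transitions in (0, 1]."""
    s, s_next = states[:-1], states[1:]
    actions = inverse_dynamics(s, s_next)       # fill in the missing actions
    predicted = forward_dynamics(s, actions)    # what the robot would reach
    error = np.linalg.norm(predicted - s_next, axis=-1)
    # Assumed mapping: zero error gives 1, large error decays toward 0.
    return actions, np.exp(-error**2 / (2 * SIGMA**2))


def weighted_bc_loss(predicted_actions, target_actions, weights):
    """Squared-error behavior cloning loss, weighted per sample by feasibility."""
    per_sample = np.sum((predicted_actions - target_actions) ** 2, axis=-1)
    return float(np.sum(weights * per_sample) / np.sum(weights))


# A demonstration whose third transition jumps farther than the robot can move.
demo = np.array([[0.0], [0.05], [0.10], [0.60], [0.65]])
actions, feas = estimate_feasibility(demo)

# A policy that is wrong only on the infeasible transition is penalized far
# less under feasibility weighting than under uniform weighting.
policy_out = actions.copy()
policy_out[2] += 1.0
loss_weighted = weighted_bc_loss(policy_out, actions, feas)
loss_uniform = weighted_bc_loss(policy_out, actions, np.ones_like(feas))
```

In this toy setting the oversized jump receives a feasibility score near zero while the other transitions score near one, so the weighted loss largely ignores the infeasible sample. The same per-transition score could in principle drive the demonstrator-facing feedback, although how FABCO renders it visually and haptically is not specified here.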
