Feasibility-aware Imitation Learning from Observations through a Hand-mounted Demonstration Interface
Kei Takahashi, Hikaru Sasaki, Takamitsu Matsubara
TL;DR
This work introduces FABCO, a feasibility-aware framework for imitation learning from observation (ILfO) that uses a hand-mounted demonstration interface and the robot's pre-trained forward and inverse dynamics models to assess and visualize the feasibility of each demonstration. Real-time feasibility feedback guides demonstrators toward motions the robot can execute, and the policy is then trained with feasibility-based weighting of the demonstration data to improve data efficiency and robustness. In a pipette insertion task with four participants, visual feasibility feedback and feasibility-weighted learning substantially improved task success rates, while NASA-TLX scores indicated a manageable increase in workload. The approach offers a practical way to reduce covariate shift in ILfO by coupling demonstrator guidance with feasibility-driven policy optimization.
Abstract
Imitation learning through a demonstration interface promises to learn robot automation policies from intuitive human demonstrations. However, because human and robot movement characteristics differ, a human expert might unintentionally demonstrate an action that the robot cannot execute. We propose feasibility-aware behavior cloning from observation (FABCO). In the FABCO framework, the feasibility of each demonstration is assessed using the robot's pre-trained forward and inverse dynamics models. This feasibility information is provided as visual feedback to the demonstrators, encouraging them to refine their demonstrations. During policy learning, the estimated feasibility serves as a weight for the demonstration data, improving both the data efficiency and the robustness of the learned policy. We experimentally validated FABCO's effectiveness by applying it to a pipette insertion task involving a pipette and a vial. In experiments with four participants, we assessed the impact of the feasibility feedback and of the weighted policy learning in FABCO. Additionally, we used the NASA Task Load Index (NASA-TLX) to evaluate the workload induced by demonstrations with visual feedback.
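To make the two mechanisms in the abstract concrete, the following is a minimal sketch of how feasibility might be scored from dynamics models and then used as a weight in behavior cloning. It is not the authors' implementation: the toy dynamics, the actuator limit A_MAX, the Gaussian scoring with bandwidth sigma, and all function names are assumptions introduced here for illustration, and the abstract does not specify the actual feasibility measure.

```python
import numpy as np

A_MAX = 1.0  # hypothetical per-step actuator limit of a toy robot (assumption)


def inverse_dynamics(s, s_next):
    """Toy stand-in for a pre-trained inverse dynamics model.

    Returns the action the robot would use to try to move from s to s_next,
    clipped to what the toy robot can actually execute.
    """
    return np.clip(s_next - s, -A_MAX, A_MAX)


def forward_dynamics(s, a):
    """Toy stand-in for a pre-trained forward dynamics model."""
    return s + a


def feasibility(s, s_next, sigma=0.5):
    """Score each demonstrated transition in (0, 1].

    Assumed scoring rule: infer the robot action, roll it forward, and map
    the squared reproduction error through a Gaussian kernel.
    """
    a = inverse_dynamics(s, s_next)
    s_pred = forward_dynamics(s, a)
    err = np.sum((s_pred - s_next) ** 2, axis=-1)
    return np.exp(-err / (2.0 * sigma**2))


def weighted_bc_loss(pred_actions, target_actions, weights):
    """Feasibility-weighted mean squared behavior cloning loss."""
    per_sample = np.sum((pred_actions - target_actions) ** 2, axis=-1)
    return np.sum(weights * per_sample) / np.sum(weights)


# A 1-D demonstration: the second step (0.5 -> 3.0) exceeds the actuator limit.
demo = np.array([[0.0], [0.5], [3.0]])
s, s_next = demo[:-1], demo[1:]

w = feasibility(s, s_next)             # per-transition feasibility weights
targets = inverse_dynamics(s, s_next)  # action labels recovered from observations
loss = weighted_bc_loss(np.zeros_like(targets), targets, w)
```

In this toy setting, the transition the robot cannot reproduce receives a weight close to zero, so it contributes little to the loss. The same per-transition score is the kind of quantity that could be shown to a demonstrator as visual feedback during data collection.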
