BEAC: Imitating Complex Exploration and Task-oriented Behaviors for Invisible Object Nonprehensile Manipulation
Hirotaka Tahara, Takamitsu Matsubara
TL;DR
BEAC tackles imitation learning for nonprehensile manipulation of invisible objects under partial observability. It introduces the Belief Exploration-Action Cloning framework, which switches between a pre-designed exploration policy and a task-oriented action policy, guided by belief states inferred from past history. The belief-state estimator is trained with future and past regularization that maximizes the mutual information terms $I(\mathbf{b}_t; \mathbf{s}_{t+k}|\mathbf{a}_{t:t+k-1})$ and $I(\mathbf{b}_t; \mathbf{o}_{t-k}|\mathbf{a}_{t-k:t-1})$ via variational decoders $G_{\eta^L}$ and $G_{\nu^L}$, yielding robust latent representations for mode and action prediction. In both simulation and real-robot buried-rock experiments, BEAC achieves higher mode and action prediction accuracy and higher task success than the baselines, and a user study indicates reduced cognitive load for the demonstrator. The work is a step toward practical imitation learning for exploration-heavy manipulation tasks with hidden states.
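To make the role of the variational decoders concrete, the standard variational (Barber-Agakov) lower bound is sketched here for the future-regularization term. The conditioning of $G_{\eta^L}$ on $(\mathbf{b}_t, \mathbf{a}_{t:t+k-1})$ is an assumption made for illustration and may differ from the paper's exact parameterization.

$$
\begin{aligned}
I(\mathbf{b}_t; \mathbf{s}_{t+k} \mid \mathbf{a}_{t:t+k-1})
&= H(\mathbf{s}_{t+k} \mid \mathbf{a}_{t:t+k-1}) - H(\mathbf{s}_{t+k} \mid \mathbf{b}_t, \mathbf{a}_{t:t+k-1}) \\
&\ge H(\mathbf{s}_{t+k} \mid \mathbf{a}_{t:t+k-1}) + \mathbb{E}\left[\log G_{\eta^L}(\mathbf{s}_{t+k} \mid \mathbf{b}_t, \mathbf{a}_{t:t+k-1})\right]
\end{aligned}
$$

The inequality holds because the KL divergence between the true posterior and the decoder is non-negative. The first entropy term does not depend on the belief estimator, so maximizing the bound amounts to maximizing the decoder's log-likelihood of the future state. The past term follows the same pattern, with $\mathbf{o}_{t-k}$, $\mathbf{a}_{t-k:t-1}$, and $G_{\nu^L}$ in place of the future quantities.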
Abstract
Applying imitation learning (IL) to nonprehensile manipulation of invisible objects under partial observation, such as excavating buried rocks, is challenging. The demonstrator must make complex decisions, choosing between exploratory actions to find the object and task-oriented actions to complete the task, all while estimating the object's hidden state. This can lead to inconsistent demonstrations and high cognitive load. Work in human cognitive science suggests that encouraging the demonstrator to use simple, pre-designed exploration rules may alleviate both problems. When learning from demonstrations that use such exploration rules, it is therefore important to imitate accurately not only the demonstrator's task-oriented behavior but also their mode-switching behavior (between exploratory and task-oriented behavior) under partial observation. Based on these considerations, this paper proposes a novel imitation learning framework called Belief Exploration-Action Cloning (BEAC), which has a policy structure that switches between a pre-designed exploration policy and a task-oriented action policy trained on belief states estimated from past history. In simulation and real-robot experiments, we confirmed that the proposed method achieved the best task performance and higher mode and action prediction accuracies, while a user study indicated that it reduced the cognitive load of demonstration.
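The switching structure described above can be illustrated with a minimal sketch. Everything in it is hypothetical: the class name `SwitchingPolicy`, the windowed-mean "belief", the 0.5 threshold, and the toy exploration and task rules are stand-ins chosen for readability, not the learned networks or the exploration rules used in the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


@dataclass
class SwitchingPolicy:
    """Hypothetical BEAC-style controller: history -> belief -> mode -> action."""
    estimate_belief: Callable[[Sequence[Vector]], Vector]  # observation history -> belief
    predict_mode: Callable[[Vector], float]                # belief -> P(task-oriented mode)
    explore: Callable[[int], Vector]                       # pre-designed rule: step index -> action
    act: Callable[[Vector], Vector]                        # task-oriented policy: belief -> action
    threshold: float = 0.5                                 # assumed switching threshold
    history: List[Vector] = field(default_factory=list)

    def step(self, observation: Vector) -> Tuple[str, Vector]:
        # Append the new partial observation, then re-estimate the belief.
        self.history.append(list(observation))
        belief = self.estimate_belief(self.history)
        # Switch modes based on the belief, not on the raw observation.
        if self.predict_mode(belief) >= self.threshold:
            return "task", self.act(belief)
        return "explore", self.explore(len(self.history) - 1)


# Toy stand-ins (assumptions for illustration only).
def windowed_mean_belief(history: Sequence[Vector], window: int = 2) -> Vector:
    # "Belief" = mean of the last `window` scalar contact readings.
    recent = [obs[0] for obs in history[-window:]]
    return [sum(recent) / len(recent)]


def sweep(step_index: int) -> Vector:
    # Pre-designed exploration: alternate sweeping right (+1) and left (-1).
    return [1.0 if step_index % 2 == 0 else -1.0]


policy = SwitchingPolicy(
    estimate_belief=windowed_mean_belief,
    predict_mode=lambda belief: min(max(belief[0], 0.0), 1.0),
    explore=sweep,
    act=lambda belief: [0.5 * belief[0]],  # push proportional to belief
)

# Contact reading is 0 until the buried object is touched at the third step.
modes, actions = [], []
for reading in ([0.0], [0.0], [1.0], [1.0]):
    mode, action = policy.step(reading)
    modes.append(mode)
    actions.append(action)

print(modes)  # ['explore', 'explore', 'task', 'task']
```

In this sketch the controller keeps sweeping while the belief stays below the threshold and hands over to the task-oriented rule once contact raises it. In an actual BEAC-style system, the belief estimator, mode predictor, and action policy would be learned from demonstrations instead of hand-written.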
