Learning Visuotactile Estimation and Control for Non-prehensile Manipulation under Occlusions
Juan Del Aguila Ferrandis, João Moura, Sethu Vijayakumar
TL;DR
This work addresses robust non-prehensile manipulation under visual occlusions by formulating it as a decision-making problem with occluded observations. It proposes a two-stage approach: (i) a visuotactile state estimator within a Bayesian deep learning framework that predicts object pose and total uncertainty (combining aleatoric and epistemic components) using MC dropout, and (ii) an uncertainty-aware RL controller that consumes the estimator’s mean and uncertainty in the loop. The main contributions are the data-efficient use of privileged simulation policies to train the estimator, the explicit modeling of uncertainty to improve control, and successful sim-to-real transfer using only onboard vision in occlusion-rich environments. Practically, the method enables robust, occlusion-tolerant non-prehensile manipulation on real robots without relying on external perception setups, demonstrated via planar pushing tasks and hardware experiments.
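The uncertainty decomposition mentioned above can be illustrated with a minimal sketch. It assumes the estimator outputs a mean and a log-variance per pose dimension and that dropout stays active at inference time. The network `TinyDropoutNet`, its layer sizes, the dropout rate, and the number of samples are hypothetical placeholders rather than the authors' architecture. The sketch follows the common MC dropout recipe: epistemic uncertainty is the variance of the sampled means, aleatoric uncertainty is the average predicted variance, and the total is their sum.

```python
import numpy as np


def mc_dropout_predict(forward, x, n_samples=50):
    """Run n_samples stochastic passes; forward(x) returns (mean, log_var)."""
    means, variances = [], []
    for _ in range(n_samples):
        mu, log_var = forward(x)          # dropout mask is resampled on each call
        means.append(mu)
        variances.append(np.exp(log_var))
    means = np.stack(means)
    variances = np.stack(variances)
    pred_mean = means.mean(axis=0)        # predictive mean of the pose
    epistemic = means.var(axis=0)         # spread of the means across dropout masks
    aleatoric = variances.mean(axis=0)    # average predicted observation noise
    return pred_mean, aleatoric, epistemic, aleatoric + epistemic


class TinyDropoutNet:
    """Hypothetical stand-in for the estimator: one hidden layer with dropout."""

    def __init__(self, in_dim, hidden, out_dim, p_drop=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        self.w1 = self.rng.normal(0.0, 0.5, (in_dim, hidden))
        self.w_mu = self.rng.normal(0.0, 0.5, (hidden, out_dim))
        self.w_lv = self.rng.normal(0.0, 0.1, (hidden, out_dim))
        self.p_drop = p_drop

    def __call__(self, x):
        h = np.tanh(x @ self.w1)
        mask = self.rng.random(h.shape) >= self.p_drop
        h = h * mask / (1.0 - self.p_drop)   # inverted dropout, kept on at test time
        return h @ self.w_mu, h @ self.w_lv


# Assumed 8-dimensional visuotactile feature vector and a planar pose (x, y, theta).
net = TinyDropoutNet(in_dim=8, hidden=32, out_dim=3)
features = np.ones(8)
mean, aleatoric, epistemic, total = mc_dropout_predict(net, features)
```

For simplicity the sketch averages the orientation like any other output. A real estimator would need to handle angle wrap-around.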
Abstract
Manipulation without grasping, known as non-prehensile manipulation, is essential for dexterous robots in contact-rich environments, but presents many challenges relating to underactuation, hybrid dynamics, and frictional uncertainty. Additionally, object occlusions become a critical problem when contact is uncertain and the motion of the object evolves independently from the robot, an issue that previous literature fails to address. We present a method for learning visuotactile state estimators and uncertainty-aware control policies for non-prehensile manipulation under occlusions, by leveraging diverse interaction data from privileged policies trained in simulation. We formulate the estimator within a Bayesian deep learning framework to model its uncertainty, and then train uncertainty-aware control policies by incorporating the pre-learned estimator into the reinforcement learning (RL) loop. Both steps lead to significantly improved estimator and policy performance. As a result, unlike prior non-prehensile research that relies on complex external perception set-ups, our method successfully handles occlusions after sim-to-real transfer to robotic hardware with a simple onboard camera. See our video: https://youtu.be/hW-C8i_HWgs.
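One way to picture the estimator sitting inside the control loop is the sketch below. It assumes the policy observation is the concatenation of proprioception, the estimated pose mean, and the estimated pose standard deviation. The abstract does not specify the observation layout, so `build_policy_observation`, `rollout`, and the dummy estimator, policy, and environment are all hypothetical stand-ins.

```python
import numpy as np


def build_policy_observation(proprio, pose_mean, pose_var):
    """Concatenate robot state with the estimator's mean and standard deviation."""
    pose_std = np.sqrt(np.maximum(pose_var, 0.0))
    return np.concatenate([proprio, pose_mean, pose_std])


def rollout(env_step, estimator, policy, proprio, sensors, n_steps):
    """Closed loop: estimate the pose, act on mean and uncertainty, step the env."""
    observations = []
    for _ in range(n_steps):
        pose_mean, pose_var = estimator(sensors)   # pre-learned estimator
        obs = build_policy_observation(proprio, pose_mean, pose_var)
        action = policy(obs)                       # policy sees the uncertainty
        proprio, sensors = env_step(action)
        observations.append(obs)
    return np.stack(observations)


# Hypothetical stand-ins so the sketch runs end to end.
def dummy_estimator(sensors):
    return sensors[:3], np.full(3, 0.04)           # pose mean, pose variance


def dummy_policy(obs):
    return -0.1 * obs[:2]                          # 2-D pusher velocity command


def dummy_env_step(action):
    return np.concatenate([action, np.zeros(2)]), np.ones(5)


trajectory = rollout(dummy_env_step, dummy_estimator, dummy_policy,
                     proprio=np.zeros(4), sensors=np.ones(5), n_steps=3)
```

During RL training the same loop would be run in simulation, so the policy can learn how to act when the reported uncertainty grows, for example while the object is occluded.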
