Learning Visuotactile Estimation and Control for Non-prehensile Manipulation under Occlusions

Juan Del Aguila Ferrandis, João Moura, Sethu Vijayakumar

TL;DR

This work addresses robust non-prehensile manipulation under visual occlusions by formulating it as a decision-making problem with occluded observations. It proposes a two-stage approach: (i) a visuotactile state estimator within a Bayesian deep learning framework that predicts object pose and total uncertainty (combining aleatoric and epistemic components) using MC dropout, and (ii) an uncertainty-aware RL controller that consumes the estimator’s mean and uncertainty in the loop. The main contributions are the data-efficient use of privileged simulation policies to train the estimator, the explicit modeling of uncertainty to improve control, and successful sim-to-real transfer using only onboard vision in occlusion-rich environments. Practically, the method enables robust, occlusion-tolerant non-prehensile manipulation on real robots without relying on external perception setups, demonstrated via planar pushing tasks and hardware experiments.
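As a rough illustration of what "total uncertainty" can mean here, the sketch below combines several stochastic (dropout-enabled) forward passes into one mean and one total variance for a single scalar pose component. It assumes the commonly used MC dropout decomposition, where the aleatoric part is the average of the predicted variances and the epistemic part is the variance of the predicted means. This summary does not state the paper's exact formulation, so the decomposition, the function names `combine_mc_dropout` and `toy_stochastic_pass`, and all numeric values are assumptions made for illustration.

```python
import random
from statistics import fmean, pvariance


def combine_mc_dropout(means, variances):
    """Combine T stochastic forward passes into a mean and a total variance.

    means[t] and variances[t] are the predicted mean and the predicted
    (aleatoric) variance of one scalar pose component on pass t.
    """
    if len(means) != len(variances) or not means:
        raise ValueError("need equally many means and variances, at least one")
    pred_mean = fmean(means)
    aleatoric = fmean(variances)   # average of the predicted noise variances
    epistemic = pvariance(means)   # spread of the predicted means across passes
    return pred_mean, aleatoric + epistemic


def toy_stochastic_pass(rng):
    """Hypothetical stand-in for one dropout-enabled network pass."""
    # Assumed values: mean near 0.30 m with small jitter, fixed variance 0.01.
    return 0.30 + rng.gauss(0.0, 0.02), 0.01


rng = random.Random(0)
passes = [toy_stochastic_pass(rng) for _ in range(50)]
mean, total_var = combine_mc_dropout(
    [m for m, _ in passes], [v for _, v in passes]
)
print(f"x estimate: {mean:.3f} m, total std: {total_var ** 0.5:.3f} m")
```

In a setup like the one described, a controller could receive both `mean` and the total standard deviation as inputs, so that it can act more cautiously when the estimate is unreliable, for example during an occlusion.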

Abstract

Manipulation without grasping, known as non-prehensile manipulation, is essential for dexterous robots in contact-rich environments, but presents many challenges relating to underactuation, hybrid dynamics, and frictional uncertainty. Additionally, object occlusions become a critical problem when contact is uncertain and the motion of the object evolves independently from the robot, a problem that previous literature fails to address. We present a method for learning visuotactile state estimators and uncertainty-aware control policies for non-prehensile manipulation under occlusions, by leveraging diverse interaction data from privileged policies trained in simulation. We formulate the estimator within a Bayesian deep learning framework to model its uncertainty, and then train uncertainty-aware control policies by incorporating the pre-learned estimator into the reinforcement learning (RL) loop; both steps lead to significantly improved estimator and policy performance. Therefore, unlike prior non-prehensile research that relies on complex external perception set-ups, our method successfully handles occlusions after sim-to-real transfer to robotic hardware with a simple onboard camera. See our video: https://youtu.be/hW-C8i_HWgs.

Paper Structure

This paper contains 19 sections, 4 equations, 9 figures, 5 tables.

Figures (9)

  • Figure 1: Snapshots from an exemplar robot motion pushing a box to the target with occlusions. Columns correspond to different time-frames, with (b) and (d) corresponding to two snapshots where the robot occludes the camera's view of the box, so the box pose cannot be read. The first row shows the robot setup, the second row shows the robot camera view for the tag/object detection, and the final row shows the rviz view, with yellow and blue corresponding to the observed (from the camera) and estimated box poses, respectively. The green area marks the target.
  • Figure 2: Performance with different training configurations of the control policy. For the RL learning curves, we report mean and standard deviation across three training seeds.
  • Figure 3: Performance of different control policies with increasing occlusion durations.
  • Figure 4: Trajectories generated under full occlusion.
  • Figure 5: Planar pushing simulation environment in Isaac Sim. The pusher is shown in red, the manipulated object in dark blue, and the target pose in light blue.
  • ...and 4 more figures