Interactive Perception for Deformable Object Manipulation

Zehang Weng, Peng Zhou, Hang Yin, Alexander Kravberg, Anastasiia Varava, David Navarro-Alarcon, Danica Kragic

TL;DR

The paper presents a method for constructing and computing a subspace, called Dynamic Active Vision Space (DAVS), that exploits the regularity in motion exploration. Its results confirm the necessity of an active camera and coordinative motion in interactive perception for deformable objects.

Abstract

Interactive perception enables robots to manipulate the environment and objects to bring them into states that benefit the perception process. Deformable objects pose challenges to this due to significant manipulation difficulty and occlusion in vision-based perception. In this work, we address this problem with a setup involving both an active camera and an object manipulator. Our approach is based on a sequential decision-making framework and explicitly considers the motion regularity and structure in coupling the camera and manipulator. We contribute a method for constructing and computing a subspace, called Dynamic Active Vision Space (DAVS), for effectively utilizing the regularity in motion exploration. The effectiveness of the framework and approach is validated in both a simulation and a real dual-arm robot setup. Our results confirm the necessity of an active camera and coordinative motion in interactive perception for deformable objects.

Paper Structure (17 sections, 6 equations, 6 figures, 2 tables, 2 algorithms)

Figures (6)

  • Figure 1: An example of Interactive Perception. The perceiver (camera) is moved to a new viewpoint while the actor (end-effector) opens the bag for better perception of in-bag object.
  • Figure 2: [Left] Illustration of the proposed framework: a subspace of camera actions is constructed and represented as a manifold with boundary, accounting for the coupling with end-effector motion via the structure of interest (SOI) on deformable objects. [Right] Illustration of the state-action transition of the proposed framework.
  • Figure 3: Illustration of the process of dynamic active vision space (DAVS) generation. [Left] We extract the SOI points at each time step $t$ and convert them to projected 3D SOI (blue) points through ray tracing, on the original camera action manifold. [Right top] The manifold with boundary based on projected SOI and current camera position by Algorithm \ref{alg_davs}. [Right bottom] Parameterized exploration space.
  • Figure 4: Left: quantitative evaluation on CubeBagClean. The first row represents the episode length ($\downarrow$) for solving 4 tasks across different combinations of random (Rand) and fixed (Fix) camera (Cam) and end-effector (EE) birth modes. The second row shows the total discounted rewards ($\uparrow$). Right: quantitative evaluation on CubeBagObst subtasks. The first two figures represent the episode length ($\downarrow$) and reward ($\uparrow$) for the scenario with fixed camera (Fix Cam) and random end-effector (Rand EE) birth modes. The third and fourth figures are episode length ($\downarrow$) and reward ($\uparrow$) with random camera (Rand Cam) and random end-effector (Rand EE) birth modes.
  • Figure 5: Visualization of the learned hand-eye policies. In each row except for the second row, we show the images of intermediate frames in an example episode. We also draw arrows $\uparrow$ beneath the images to illustrate how the camera moves. In the first and third rows, sampled from the CubeBagClean case, we compare the performance of IP methods with and without DAVS. With DAVS, the camera is able to move right and upwards to find the cube in the bag, while the one without DAVS searches more randomly. In the bottom two rows, from the CubeBagObst case, we compare an IP method with DAVS against a static vision method. In the IP setting, the camera bypasses the obstacle and finds a feasible solution, while the static vision method does not. This reveals the necessity of allowing both active vision and an active end-effector. The second row is the 3D manifold visualization corresponding to the first row (BlueDot: Camera; Green: 3D SOI; Red: Projected SOI; BlueCurve: Boundary of DAVS).
  • ...and 1 more figures
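
The Figure 3 caption mentions converting SOI points to projected points on the camera action manifold through ray tracing. The sketch below illustrates one possible form of such a projection and is not the authors' implementation. It assumes, purely for illustration, that the camera action manifold is a sphere of radius `radius` around a point `center`, and that each ray starts at `center` and passes through an SOI point. The function name `project_to_sphere` and all numeric values are hypothetical.

```python
import math


def project_to_sphere(points, center, radius):
    """Project 3D points onto a sphere by casting a ray from `center`
    through each point and returning where the ray meets the sphere.

    Assumption (not from the paper): the camera action manifold is a
    sphere, and rays originate at its center.
    """
    projected = []
    for p in points:
        # Direction of the ray from the sphere center to the point.
        d = [pi - ci for pi, ci in zip(p, center)]
        norm = math.sqrt(sum(x * x for x in d))
        if norm == 0.0:
            raise ValueError("a point coincides with the ray origin")
        # Rescale the direction so the result lies at distance `radius`.
        projected.append([ci + radius * x / norm for ci, x in zip(center, d)])
    return projected


# Hypothetical SOI points, e.g. samples along the opening of a bag.
soi_points = [[0.1, 0.0, 0.2], [0.0, 0.3, 0.4]]
projected = project_to_sphere(soi_points, center=[0.0, 0.0, 0.0], radius=1.0)
```

Every projected point lies at distance `radius` from `center`, so the projected SOI can be expressed in the same coordinates as candidate camera positions on the assumed sphere.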