Learning to Manipulate Deformable Objects without Demonstrations

Yilin Wu, Wilson Yan, Thanard Kurutach, Lerrel Pinto, Pieter Abbeel

TL;DR

Deformable object manipulation is challenging because deformable objects lack a canonical state representation and have complex, non-linear dynamics. The authors propose model-free visual reinforcement learning with a structured, iterative pick-and-place action space and a training scheme that decouples placing from picking: only a placing policy is learned, conditioned on random pick points, and the picking policy is obtained by selecting the pick point with Maximal Value under Placing (MVP). They demonstrate order-of-magnitude faster learning in simulation across cloth and rope tasks and transfer to a real PR2 robot through domain randomization, outperforming standard RL baselines on average coverage. The approach offers a scalable path to deformable object manipulation from vision without human demonstrations, and could be extended to broader manipulation settings and to demonstration-guided hybrids.

Abstract

In this paper we tackle the problem of deformable object manipulation through model-free visual reinforcement learning (RL). To circumvent the sample inefficiency of RL, we propose two key ideas that accelerate learning. First, we propose an iterative pick-place action space that encodes the conditional relationship between picking and placing on deformable objects. The explicit structural encoding enables faster learning under complex object dynamics. Second, instead of jointly learning both the pick and the place locations, we only explicitly learn the placing policy conditioned on random pick points. Then, by selecting the pick point that has Maximal Value under Placing (MVP), we obtain our picking policy. This provides us with an informed picking policy during testing, while using only random pick points during training. Experimentally, this learning framework obtains an order of magnitude faster learning compared to independent action spaces on our suite of deformable object manipulation tasks with visual RGB observations. Finally, using domain randomization, we transfer our policies to a real PR2 robot for challenging cloth and rope coverage tasks, and demonstrate significant improvements over standard RL techniques on average coverage.
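The pick-selection rule described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_place_policy` and `toy_q_value` are hypothetical hand-written stand-ins for the learned placing policy and Q-function, and the candidate pick points are assumed to lie on a uniform grid (in practice they would be restricted to points on the object). During training a pick point is drawn uniformly at random; at test time MVP scores every candidate by the value of the place action the placing policy proposes for it and takes the argmax. The per-pick values, min-max normalized to [0, 1], correspond to the kind of heatmap described in Figure 1.

```python
import numpy as np

def random_pick(candidate_picks, rng):
    """Training-time picking: a uniformly random candidate pick point."""
    return candidate_picks[rng.integers(len(candidate_picks))]

def mvp_pick(obs, candidate_picks, place_policy, q_value):
    """Test-time picking via Maximal Value under Placing (MVP).

    For each candidate pick, ask the placing policy where to place, score the
    (pick, place) pair with Q, and return the highest-value pick, its place
    action, and the per-pick values.
    """
    values = np.array([q_value(obs, p, place_policy(obs, p)) for p in candidate_picks])
    best = int(np.argmax(values))
    pick = candidate_picks[best]
    return pick, place_policy(obs, pick), values

def normalize_heatmap(values):
    """Min-max normalize per-pick values to [0, 1] for visualization."""
    lo, hi = values.min(), values.max()
    if hi == lo:
        return np.zeros_like(values, dtype=float)
    return (values - lo) / (hi - lo)

# --- Hypothetical toy stand-ins (NOT learned models) ---
# obs is a 2D goal point; the placing policy moves the picked point halfway to
# the goal, and Q is the negative distance of the placed point from the goal.
def toy_place_policy(obs, pick):
    return pick + 0.5 * (obs - pick)

def toy_q_value(obs, pick, place):
    return -np.linalg.norm(place - obs)

# Assumed 5x5 grid of candidate pick points in [0, 1]^2 (ordered by y, then x).
xs = np.linspace(0.0, 1.0, 5)
picks = np.array([[x, y] for y in xs for x in xs])
goal = np.array([0.3, 0.8])

rng = np.random.default_rng(0)
train_pick = random_pick(picks, rng)  # what a training episode would use
pick, place, values = mvp_pick(goal, picks, toy_place_policy, toy_q_value)
heat = normalize_heatmap(values).reshape(5, 5)  # rows: y, columns: x
```

With the toy goal (0.3, 0.8), MVP selects the grid point (0.25, 0.75), the candidate closest to the goal, and the placing policy proposes (0.275, 0.775). Note that training never needs a learned picking policy: the placing policy only has to be good at the random picks it was given, and MVP reuses it to rank picks afterwards.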

Paper Structure

This paper contains 21 sections, 3 equations, 8 figures, 1 table.

Figures (8)

  • Figure 1: We look at the problem of deformable object manipulation, where the robot needs to manipulate a deformable object, such as the blue cloth, into a desired goal location (green in (a)). Our method learns an explicit placing policy (arrows in (b) and (c)), along with an implicit picking policy. This method is evaluated on cloth (b) and rope (c) tasks using our PR2 robot. The heatmaps show the distribution of Q-values over pick locations, normalized to the range 0 (blue) to 1 (red).
  • Figure 2: In direct policy learning (a), the policy directly outputs both the pick and the place location. In conditional policy learning, by contrast, the composite action space is broken down into separate picking and placing policies, where the placing policy takes the output of the picking policy as input.
  • Figure 3: Learning comparisons between baselines and our method on the three deformable object manipulation environments with state-based training in simulation. The dotted black line shows the performance of MVP evaluated on the final 'learned placing with uniform pick' policy. Each experiment was run on 4 random seeds.
  • Figure 4: Learning comparisons between baselines and our method on two deformable object manipulation environments with image-based training in simulation. The cloth-simplified environment is not included here, since image-based transfer to a real robot would require corner detection. The dotted black line shows the performance of MVP evaluated on the final 'learned placing with uniform pick' policy. Each experiment was run on 3 random seeds.
  • Figure 5: We demonstrate deformable object manipulation in the simulated environments using our learned MVP policy. In the top half, the policy successfully straightens a rope horizontally and centers it. In the bottom half, our method successfully spreads out a cloth from multiple starting states. Consecutive images are about 5 actions apart for the rope experiments and 10 actions apart for the cloth experiments.
  • ...and 3 more figures
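The difference between the two action parameterizations in Figure 2 can be sketched as follows. This is a minimal illustration of the input/output wiring only, assuming tiny random tanh-linear maps in place of the paper's learned networks; `linear_policy`, `OBS_DIM`, `direct_action`, and `conditional_action` are hypothetical names. In the direct parameterization a single head emits a 4D action (pick xy, place xy); in the conditional one the placing head receives the pick point as an extra input, so its output changes with the chosen pick.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM = 8  # hypothetical observation feature size (e.g. an encoded RGB image)

def linear_policy(in_dim, out_dim, rng):
    """Hypothetical stand-in for a learned network: tanh of a random linear map,
    so outputs lie in (-1, 1) like normalized image coordinates."""
    W = rng.normal(scale=0.5, size=(out_dim, in_dim))
    b = np.zeros(out_dim)
    return lambda x: np.tanh(W @ x + b)

# (a) Direct policy: one head emits pick and place together.
direct_head = linear_policy(OBS_DIM, 4, rng)

def direct_action(obs):
    a = direct_head(obs)
    return a[:2], a[2:]  # (pick xy, place xy)

# Conditional policy: the placing head takes the pick point as an extra input.
pick_head = linear_policy(OBS_DIM, 2, rng)
place_head = linear_policy(OBS_DIM + 2, 2, rng)

def conditional_action(obs, pick=None):
    if pick is None:
        pick = pick_head(obs)  # otherwise use the pick supplied by the caller
    place = place_head(np.concatenate([obs, pick]))
    return pick, place

obs = rng.normal(size=OBS_DIM)
d_pick, d_place = direct_action(obs)
c_pick, c_place = conditional_action(obs)
# The same observation with two different picks yields two different places.
_, place_a = conditional_action(obs, pick=np.array([0.5, -0.5]))
_, place_b = conditional_action(obs, pick=np.array([-0.5, 0.5]))
```

Because `conditional_action` accepts an explicit `pick`, the placing head can be queried at arbitrary pick points, for example uniformly random ones during training or every candidate when ranking picks by their value under placing.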