Working Backwards: Learning to Place by Picking

Oliver Limoyo; Abhisek Konar; Trevor Ablett; Jonathan Kelly; Francois R. Hogan; Gregory Dudek


TL;DR

Robotic placement in contact-constrained environments remains data-hungry; this work introduces Placing via Picking (PvP), a self-supervised data-collection pipeline that generates placement demonstrations by time-reversing retrieval trajectories. PvP relies on compliant control for grasping (CCG) and tactile regrasping (TR) to enable uninterrupted data collection, and it uses a language-driven grasp planner to produce 6-DOF grasps. Expert actions consist of a pose change $\Delta\mathbf{T} \in \mathrm{SE}(3)$ and a binary gripper command $a_{gripper} \in \{0,1\}$. The collected demonstrations are used to train a vision-based policy via behavioral cloning, with either a unimodal Gaussian or a Gaussian mixture action distribution, operating on 128×128 RGB images stacked over time. In real-world home tasks such as dishrack loading and table setting, PvP-trained policies outperform kinesthetic-teaching baselines in both success rate and data efficiency, demonstrating scalable, autonomous placement data collection with minimal human supervision. The results highlight the potential of leveraging contact constraints, tactile sensing, and time-reversal symmetry to improve robotic placing data collection and learning.
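The time-reversal idea can be illustrated with a minimal sketch. It assumes the retrieval trajectory is stored as a list of 4×4 homogeneous end-effector poses, that each action $\Delta\mathbf{T}$ is the pose change expressed in the current end-effector frame, and that $a_{gripper} = 1$ means "keep closed" while 0 means "open". These conventions, and the names `relative_transform`, `reverse_to_placing_demo`, and `translation`, are hypothetical and not taken from the paper.

```python
import numpy as np


def relative_transform(T_a: np.ndarray, T_b: np.ndarray) -> np.ndarray:
    """Return dT such that T_b = T_a @ dT, i.e. the motion expressed in frame a."""
    return np.linalg.inv(T_a) @ T_b


def reverse_to_placing_demo(retrieval_poses):
    """Convert a retrieval trajectory [T_0 (grasp pose), ..., T_N (lifted pose)]
    into placing actions that start at T_N and end back at T_0.

    Assumed (hypothetical) gripper convention: 1 = keep closed, 0 = open.
    """
    placing_poses = retrieval_poses[::-1]  # time reversal: last pose comes first
    actions = []
    for T_t, T_next in zip(placing_poses[:-1], placing_poses[1:]):
        actions.append((relative_transform(T_t, T_next), 1))
    # Assumption: once back at the original grasp pose, release with a zero-motion step.
    actions.append((np.eye(4), 0))
    return actions


def translation(x: float, y: float, z: float) -> np.ndarray:
    """Homogeneous transform with identity rotation (helper for the example)."""
    T = np.eye(4)
    T[:3, 3] = [x, y, z]
    return T


# Example: a straight 20 cm lift recorded as three poses.
retrieval = [translation(0.0, 0.0, 0.1 * k) for k in range(3)]
demo = reverse_to_placing_demo(retrieval)
```

Composing the resulting actions starting from the lifted pose returns the end effector to the recorded grasp pose, which is where the object originally rested; this is what makes the reversed sequence usable as a placing demonstration.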

Abstract

We present placing via picking (PvP), a method to autonomously collect real-world demonstrations for a family of placing tasks in which objects must be manipulated to specific, contact-constrained locations. With PvP, we approach the collection of robotic object placement demonstrations by reversing the grasping process and exploiting the inherent symmetry of the pick-and-place problem. Specifically, we obtain placing demonstrations from a set of grasp sequences of objects initially located at their target placement locations. Our system can collect hundreds of demonstrations in contact-constrained environments without human intervention using two modules: compliant control for grasping and tactile regrasping. Using the autonomously collected demonstrations, we train a policy directly from visual observations through behavioural cloning. By doing so, the policy can generalize to object placement scenarios outside of the training environment without privileged information (e.g., placing a plate picked up from a table). We validate our approach in home robot scenarios that include dishwasher loading and table setting. Our approach yields robotic placing policies that outperform policies trained with kinesthetic teaching, both in terms of success rate and data efficiency, while requiring no human supervision.
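The TL;DR mentions either a unimodal Gaussian or a Gaussian mixture action distribution. A small NumPy sketch of the corresponding behavioural-cloning loss is the negative log-likelihood of an expert action under a mixture of diagonal Gaussians, where a single component recovers the unimodal case. The frame-stacking helper mirrors the idea of RGB inputs stacked over time. The stack depth, the vector parameterization of $\Delta\mathbf{T}$ (for example translation plus axis-angle, giving 6 dimensions), and all function names are assumptions; the binary gripper command is left out of this sketch.

```python
import numpy as np


def stack_frames(frames, k):
    """Concatenate the last k RGB frames (each H x W x 3) along the channel axis,
    repeating the oldest available frame when fewer than k exist."""
    recent = list(frames)[-k:]
    while len(recent) < k:
        recent.insert(0, recent[0])
    return np.concatenate(recent, axis=-1)


def logsumexp(x):
    """Numerically stable log(sum(exp(x))) over a 1-D array."""
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))


def gmm_nll(action, weight_logits, means, log_stds):
    """Negative log-likelihood of one action under a diagonal-Gaussian mixture.

    action: (D,); weight_logits: (K,); means, log_stds: (K, D).
    With K = 1 this is the unimodal Gaussian loss.
    """
    log_w = weight_logits - logsumexp(weight_logits)  # normalized log mixture weights
    var = np.exp(2.0 * log_stds)
    # Per-component diagonal Gaussian log-density, summed over action dimensions.
    log_comp = -0.5 * np.sum(
        (action - means) ** 2 / var + 2.0 * log_stds + np.log(2.0 * np.pi), axis=-1
    )
    return -logsumexp(log_w + log_comp)


# Example: three 128x128 RGB frames stacked into a 9-channel observation.
obs = stack_frames([np.zeros((128, 128, 3)) for _ in range(3)], k=3)
```

In training, a network mapping the stacked observation to `weight_logits`, `means`, and `log_stds` would be fit by minimizing this loss averaged over demonstration steps; the mixture lets the policy represent several distinct placements for the same observation, which a single Gaussian would average together.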


Paper Structure (19 sections, 8 equations, 7 figures, 1 table)


Introduction
Related Work
Automatic Data Collection
Working Backwards
Robotic Manipulator Placing
Placing via Picking
Self-Supervised Data Collection
Grasp Planning
Grasping
Retrieving
Placing by Reversing
Noise-Augmented Data Collection
Policy Learning
Experiments
PvP Data Collection Robustness
...and 4 more sections

Figures (7)

Figure 1: The four steps involved in PvP, our autonomous demonstration data collection process for placing. We (a) generate grasps with an off-the-shelf grasp planner sundermeyer2021contact; (b) compliantly grasp the object to apply minimal forces to the environment while ensuring a stable grasp via tactile sensing; (c) retrieve the object with rotational compliance while storing the trajectory; and (d) generate placement demonstration data by rolling out the reversed grasp trajectories while storing the observations and actions.
Figure 2: The steps involved in our language-driven grasp planning pipeline. We use (b) Grounding DINO liu2023grounding to detect object bounding boxes from text descriptions and (c) Segment Anything kirillov2023segany on the cropped images to produce object-specific masks. We then use (d) Contact-GraspNet sundermeyer2021contact to generate grasps only on the segmented objects (i.e., the masked areas). Top: using "green plate" and "blue plate" as the targets for data collection. Bottom: using "bowl" and "mug" as the targets for data collection.
Figure 3: Tactile images from before and after a tactile regrasp (TR). Left: the contact surface area of the plate fills half of the tactile image, indicating a shallow grasp. Right: after a regrasp, the contact surface area has increased, indicating a stable grasp.
Figure 4: Visualization of contact surface region in the tactile image of grasps of various objects: (a) a red metal plate, (b) a red metal mug, (c) a red metal bowl and (d) a green wheat straw plate.
Figure 5: Sequence of images from rollouts of placing policies trained using data collected with PvP. The policies are able to place objects of varying properties in the scene using images from the wrist camera directly.
...and 2 more figures
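Step (b) of Figure 1 relies on compliance to keep the forces applied to the environment small. This summary does not spell out the CCG controller, so the sketch swaps in a generic translational Cartesian impedance law, $F = K(x_{des} - x) - D\dot{x}$, to show why a low stiffness $K$ bounds the force produced when the gripper is commanded slightly past a contact surface. The gains and the function name `impedance_force` are illustrative assumptions, not values from the paper.

```python
import numpy as np


def impedance_force(x, x_dot, x_des, stiffness, damping):
    """Generic translational Cartesian impedance law: F = K (x_des - x) - D x_dot.

    Illustrative only; this is not the paper's CCG controller.
    """
    K = np.diag(stiffness)
    D = np.diag(damping)
    return K @ (np.asarray(x_des) - np.asarray(x)) - D @ np.asarray(x_dot)


# Target set 2 cm below a surface the fingers are resting on (at rest, so x_dot = 0).
x_contact = np.zeros(3)
x_target = np.array([0.0, 0.0, -0.02])
soft = impedance_force(x_contact, np.zeros(3), x_target, [200.0] * 3, [20.0] * 3)
stiff = impedance_force(x_contact, np.zeros(3), x_target, [2000.0] * 3, [60.0] * 3)
```

With the same 2 cm position error, the soft setting pushes into the surface with 4 N and the stiff one with 40 N; a low rotational stiffness would play the analogous role during the rotationally compliant retrieval in step (c).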
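Figure 2 restricts grasp generation to the segmented objects. Since Contact-GraspNet operates on point clouds, one way to sketch this restriction is to back-project only the depth pixels inside the segmentation mask using a pinhole camera model. The detector and segmenter calls are omitted (their APIs are not reproduced here), and the intrinsics `fx`, `fy`, `cx`, `cy` and the function name `masked_point_cloud` are assumptions.

```python
import numpy as np


def masked_point_cloud(depth, mask, fx, fy, cx, cy):
    """Back-project depth pixels inside a boolean mask to 3-D camera-frame points
    with a pinhole model. Pixels with zero (invalid) depth are skipped.

    Returns an (N, 3) array of (x, y, z) points.
    """
    valid = np.asarray(mask, dtype=bool) & (depth > 0)
    v, u = np.nonzero(valid)  # row (v) and column (u) indices of masked pixels
    z = depth[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)


# Example: a flat 4x4 depth image at 1 m with a single masked pixel (row 2, column 3).
depth = np.ones((4, 4))
mask = np.zeros((4, 4), dtype=bool)
mask[2, 3] = True
points = masked_point_cloud(depth, mask, fx=1.0, fy=1.0, cx=2.0, cy=2.0)
```

Feeding only these points to the grasp generator keeps grasp candidates on the named object rather than on the rack or table around it.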
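Figure 3 characterizes a shallow grasp by a contact region that fills only about half of the tactile image. A minimal sketch of such a check, assuming a no-contact reference image is available, thresholds the per-pixel difference and compares the resulting contact fraction against a minimum. Both thresholds (`diff_threshold`, `min_fraction`) and the function names are hypothetical, and the paper's actual regrasp criterion may differ.

```python
import numpy as np


def contact_fraction(tactile, reference, diff_threshold=0.1):
    """Fraction of pixels whose intensity departs from a no-contact reference image.

    Images are assumed to be float arrays in [0, 1], either H x W or H x W x 3.
    """
    diff = np.abs(np.asarray(tactile, dtype=float) - np.asarray(reference, dtype=float))
    if diff.ndim == 3:
        diff = diff.mean(axis=-1)  # average over color channels
    return float(np.mean(diff > diff_threshold))


def needs_regrasp(tactile, reference, min_fraction=0.6, diff_threshold=0.1):
    """Hypothetical rule: regrasp when the contact region is too small (shallow grasp)."""
    return contact_fraction(tactile, reference, diff_threshold) < min_fraction


reference = np.zeros((10, 10, 3))
shallow = reference.copy()
shallow[:, :5] = 1.0  # contact fills half of the image, as in Figure 3 (left)
deep = np.ones((10, 10, 3))  # contact covers the whole sensor
```

A regrasp loop would then reopen the gripper, move deeper along the approach direction, and repeat the check until `needs_regrasp` returns `False`.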
