A Distributional Treatment of Real2Sim2Real for Object-Centric Agent Adaptation in Vision-Driven Deformable Linear Object Manipulation

Georgios Kamaras, Subramanian Ramamoorthy

TL;DR

An integrated (end-to-end) framework for the Real2Sim2Real problem of manipulating deformable linear objects (DLOs) based on visual perception. It uses likelihood-free inference (LFI) to compute posterior distributions over the physical parameters needed to approximately simulate the behaviour of each specific DLO.

Abstract

We present an integrated (end-to-end) framework for the Real2Sim2Real problem of manipulating deformable linear objects (DLOs) based on visual perception. Working with a parameterised set of DLOs, we use likelihood-free inference (LFI) to compute posterior distributions over the physical parameters with which we can approximately simulate the behaviour of each specific DLO. We use these posteriors for domain randomisation while training, in simulation, object-specific visuomotor policies (i.e. assuming only visual and proprioceptive sensing) for a DLO reaching task, using model-free reinforcement learning. We demonstrate the utility of this approach by deploying sim-trained DLO manipulation policies in the real world in a zero-shot manner, i.e. without any further fine-tuning. In this context, we evaluate the capacity of a prominent LFI method to perform fine classification over the parametric set of DLOs, using only visual and proprioceptive data obtained in a dynamic manipulation trajectory. We then study the implications of the resulting domain distributions for sim-based policy learning and real-world performance.

Paper Structure

This paper contains 22 sections, 3 equations, 7 figures, 1 table, 1 algorithm.

Figures (7)

  • Figure 1: Overview of our Real2Sim2Real framework (top). We perform LFI for the posterior distribution $\hat{p}$ over system parameters (Real2Sim). We use $\hat{p}$ to perform domain randomisation while training a PPO agent to perform a DLO reaching task. We deploy and evaluate our sim-trained policy in the real world (Sim2Real). We experiment with a DLO reaching task, which is an abstraction of more intricate physical interactions, such as whipping the top cube of a stack (left). We demonstrate strong object-centric agent adaptation relative to the underlying domain distribution $\hat{p}$. For example (bottom), an effective task policy should not drag the DLO on the table (a), but reach for the green target (cross) with a larger part of its body (b).
  • Figure 2: Overview of our policy rollout and trajectory perception method. In each policy step we extract keypoints from segmentation images. Once all rollouts are completed, for each keypoint trajectory we compute the cos-only kernel mean embeddings of each step's keypoints using the RKHS-net layer (BayesSim-RKHS). The colour coding of the margins indicates the association between sample images and algorithm parts. Illustration inspired by antonova2022bayesian.
  • Figure 3: Inferred mixture-of-Gaussians (MoG) posterior heatmaps and the induced domain samples when each MoG is used for domain randomisation (DR). MoG component means are displayed as blue crosses and the colorbar quantifies likelihood.
  • Figure 4: Learning curves of PPO agent training when performing DR using different domain distributions.
  • Figure 5: End-effector (EEF) trajectories during the deployment of $6$ policies trained using different domain distributions (col. 1-6) in the real world (rows 1-4; $4$ real DLOs) and in simulation (row 5; median DLO). We repeat each deployment $4$ times and average the measured accumulation of commanded EEF translations along the $x$ and $z$ axes (commanded actions, $\langle dx, dz \rangle$), with shaded regions reporting standard deviation. Colorbars illustrate the pixel distance $d_t$ of the DLO keypoints to the target, reversing reward function for $d_{\text{thresh}} = 1.5$. The rightmost column shows the averaged rewards per timestep for the respective row's trajectories.
  • ...and 2 more figures
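
To make the domain randomisation step of Figures 1 and 3 concrete, the following is a minimal sketch of drawing simulation parameters from a mixture-of-Gaussians (MoG) posterior at the start of each training episode. It is not the authors' implementation: the function name `sample_mog`, the two-dimensional parameter space normalised to $[0, 1]$, and all mixture weights, means and covariances are assumptions chosen for illustration only.

```python
import numpy as np


def sample_mog(weights, means, covs, n_samples, rng, low=None, high=None):
    """Draw samples from a mixture of Gaussians, optionally clipped to bounds.

    Hypothetical helper for illustration; not taken from the paper.
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # make sure the weights sum to one
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)

    # Step 1: choose a mixture component for every sample.
    comps = rng.choice(len(weights), size=n_samples, p=weights)

    # Step 2: sample from the Gaussian of each chosen component.
    samples = np.empty((n_samples, means.shape[1]))
    for k in np.unique(comps):
        idx = np.flatnonzero(comps == k)
        samples[idx] = rng.multivariate_normal(means[k], covs[k], size=len(idx))

    # Step 3: keep samples inside the simulator's valid parameter range.
    if low is not None or high is not None:
        samples = np.clip(samples, low, high)
    return samples


# Assumed posterior over two normalised physical parameters (illustrative values).
rng = np.random.default_rng(0)
posterior_weights = [0.7, 0.3]
posterior_means = [[0.3, 0.4], [0.7, 0.6]]
posterior_covs = [np.eye(2) * 0.01, np.eye(2) * 0.01]

# One parameter vector per training episode.
thetas = sample_mog(posterior_weights, posterior_means, posterior_covs,
                    n_samples=500, rng=rng, low=0.0, high=1.0)
print(thetas.shape)
```

In a training loop, each row of `thetas` would configure the simulated DLO before a rollout, so a narrow posterior concentrates training on objects close to the inferred one, while a broad posterior or a uniform prior spreads training over the whole parameter set.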