Deformable Cluster Manipulation via Whole-Arm Policy Learning

Jayadeep Jacob, Wenzheng Zhang, Houston Warren, Paulo Borges, Tirthankar Bandyopadhyay, Fabio Ramos

TL;DR

This work tackles deformable cluster manipulation under occlusion by learning a model-free, multi-modal policy that operates with full-arm contact. It fuses segmented 3D point clouds with proprioceptive touch indicators through a distributional state representation in a reproducing kernel Hilbert space (RKHS) built from kernel mean embeddings, and it uses a context-agnostic occlusion reward to drive de-occlusion. Trained in massively parallel Isaac Gym simulations with domain randomization, the policy transfers zero-shot to real Kinova hardware equipped with a single RGB-D camera, aided by a real-world vision pipeline built on Grounding DINO, SAM-HQ, and Cutie. Across ablations and real-world tests, the approach outperforms hand-crafted IK baselines and graph-based baselines, highlighting the value of global distributional features and proprioceptive contact cues for learned, whole-arm manipulation of deformable clusters. Limitations remain in perception quality and sim-to-real gaps, with future work aimed at enhanced perception, system identification, tactile sensing, and broader deployment to other deformable clearance tasks.
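To make the distributional state representation concrete, the sketch below embeds a point cloud as an empirical kernel mean, i.e. the average of a feature map over all points. It is an illustration under stated assumptions, not the authors' implementation: the Gaussian kernel, the random Fourier feature approximation, and the names `make_rff`, `kernel_mean_embedding`, `n_features`, and `lengthscale` are all hypothetical choices made for this example.

```python
import numpy as np

def make_rff(dim, n_features, lengthscale, seed=0):
    """Sample random Fourier features approximating a Gaussian kernel.

    Assumption: the kernel and feature map here are illustrative; the
    paper's actual choices are not stated in this summary.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / lengthscale, size=(dim, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return W, b

def kernel_mean_embedding(points, W, b):
    """Approximate the empirical kernel mean of a point cloud of shape (n, dim).

    Each point is mapped to a finite feature vector and the features are
    averaged, so the output has a fixed size regardless of n and does not
    depend on the order of the points.
    """
    n_features = W.shape[1]
    feats = np.sqrt(2.0 / n_features) * np.cos(points @ W + b)
    return feats.mean(axis=0)

# Hypothetical data: two segmented clouds with different point counts.
rng = np.random.default_rng(1)
W, b = make_rff(dim=3, n_features=256, lengthscale=0.5)
branches = rng.normal(size=(500, 3))
clearance_region = rng.normal(size=(120, 3))

branch_embedding = kernel_mean_embedding(branches, W, b)
region_embedding = kernel_mean_embedding(clearance_region, W, b)
shuffled_embedding = kernel_mean_embedding(rng.permutation(branches), W, b)
```

Both embeddings have the same length (256 here) even though the clouds contain different numbers of points, and shuffling the points leaves the embedding unchanged up to floating-point error. Those two properties are what make a mean embedding convenient as a fixed-size policy input for segmented point clouds of varying size.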

Abstract

Manipulating clusters of deformable objects presents a substantial challenge with widespread applicability, but requires contact-rich whole-arm interactions. A potential solution must address the limited capacity for realistic model synthesis, high uncertainty in perception, and the lack of efficient spatial abstractions, among others. We propose a novel framework for learning model-free policies integrating two modalities: 3D point clouds and proprioceptive touch indicators, emphasising manipulation with full body contact awareness, going beyond traditional end-effector modes. Our reinforcement learning framework leverages a distributional state representation, aided by kernel mean embeddings, to achieve improved training efficiency and real-time inference. Furthermore, we propose a novel context-agnostic occlusion heuristic to clear deformables from a target region for exposure tasks. We deploy the framework in a power line clearance scenario and observe that the agent generates creative strategies leveraging multiple arm links for de-occlusion. Finally, we perform zero-shot sim-to-real policy transfer, allowing the arm to clear real branches with unknown occlusion patterns, unseen topology, and uncertain dynamics. Website: https://sites.google.com/view/dcmwap/

Paper Structure

This paper contains 16 sections, 7 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Overview: (1) Segmented point clouds corresponding to the clustered deformable branches, the clearance region, and the robot are captured from the scene. (2) Distribution embeddings representing the global scene are generated via the kernel mean operator. Additional features include neighbourhood point features, robot sensor metrics, and proprioceptive contact indicators. (3) RL training with domain-randomised geometry and dynamics on the Isaac Gym parallel simulator. (4) Inference and zero-shot transfer to the real world, aided by the Grounding DINO, SAM-HQ, and Cutie frameworks.
  • Figure 2: (a) Manual pruning of overhanging branches impeding power lines, from siebert2014survey. (b) Our simulation setup with an L-system structure for the power line clearance scenario.
  • Figure 3: Terminal state poses: (a)-(b): Simulation trajectory states showing the whole arm being utilised to shield the power line. (c)-(e): Similar real strategies using various arm links for clearance, executed on branches of different tree species.
  • Figure 4: Ablation showing the relevance of key feature groups in our multi-modal policy. (-) indicates the removal of the single specified group. Each data point is a trained policy with a varying noise level. Boxes indicate median and IQR.