RL-Based Coverage Path Planning for Deformable Objects on 3D Surfaces

Yuhang Zhang; Jinming Ma; Feng Wu

TL;DR

This work trains a reinforcement learning agent in a simulator to manipulate deformable objects for surface wiping tasks. It simplifies the state representation of object surfaces using harmonic UV mapping, processes contact feedback from the simulator on 2D feature maps, and uses scaled grouped convolutions to extract features efficiently.

Abstract

Current work on deformable-object manipulation often focuses on activities like folding clothes, handling ropes, and manipulating bags, while research on contact-rich tasks involving deformable objects remains relatively underdeveloped. When humans use cloth or sponges to wipe surfaces, they rely on both vision and tactile feedback. Current algorithms, however, still struggle with issues like occlusion, and research on tactile perception for manipulation is still evolving. Tasks such as covering surfaces with deformable objects demand not only perception but also precise robotic manipulation. To address this, we propose a method that leverages efficient and accessible simulators for task execution. Specifically, we train a reinforcement learning agent in a simulator to manipulate deformable objects for surface wiping tasks. We simplify the state representation of object surfaces using harmonic UV mapping, process contact feedback from the simulator on 2D feature maps, and use scaled grouped convolutions (SGCNN) to extract features efficiently. The agent then outputs actions in a reduced-dimensional action space to generate coverage paths. Experiments demonstrate that our method outperforms previous approaches in key metrics, including total path length and coverage area. We deploy these paths on a Kinova Gen3 manipulator to perform wiping experiments on the back of a torso model, validating the feasibility of our approach.
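The two metrics named in the abstract, total path length and coverage area, can be illustrated with a minimal sketch. This is not the paper's evaluation code: it assumes the path is a polyline in a flat unit UV square, the wiping tool is a disc of fixed radius, and coverage is counted on a coarse grid of cell centres. The function names `path_length` and `coverage_fraction`, the `radius` parameter, and the grid resolution `res` are all hypothetical. The paper itself obtains contact feedback from the simulator, which this sketch ignores, along with any distortion introduced by the UV mapping.

```python
import math


def path_length(path):
    """Sum of Euclidean segment lengths of a polyline given as (u, v) points."""
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))


def _dist_to_segment(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg2 = dx * dx + dy * dy
    if seg2 == 0.0:
        return math.dist(p, a)
    # Project p onto the segment and clamp to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg2))
    return math.dist(p, (ax + t * dx, ay + t * dy))


def coverage_fraction(path, radius, res=32):
    """Fraction of grid cells in the unit UV square whose centre lies within
    `radius` of the path (assumption: a rigid disc-shaped tool footprint).
    A path with fewer than two points has no segments and covers nothing."""
    segments = list(zip(path, path[1:]))
    covered = 0
    for i in range(res):
        for j in range(res):
            centre = ((j + 0.5) / res, (i + 0.5) / res)
            if any(_dist_to_segment(centre, a, b) <= radius for a, b in segments):
                covered += 1
    return covered / (res * res)
```

Under these assumptions, a shorter path that reaches the same coverage fraction is the better one, which is the trade-off the comparison between methods is meant to capture.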

Paper Structure (18 sections, 9 equations, 8 figures, 2 tables)

INTRODUCTION
RELATED WORK
Traditional Coverage Path Algorithms
Learning-Based Planning and Control Algorithms
Modeling and Manipulation of Deformable Objects
METHOD
Problem Formulation
UV Mapping
Observation Space
Action Space
Reward Function
Experiment
Implementation Details
Comparative Methods
Simulation Environment Training
...and 3 more sections

Figures (8)

Figure 1: Overview of the framework. We propose a framework to solve the problem of using deformable objects to cover 3D surfaces. 1) A 3D model of the target object is reconstructed, and a wiping task environment is created within MuJoCo. 2) By employing harmonic UV mapping, we simplify the state representation and action space. 3) The reinforcement learning algorithm outputs an efficient coverage path to cover the surface.
Figure 2: The proposed method first maps the target wiping area to the UV coordinate system via UV mapping. To represent the state more efficiently, we construct an agent-centric map representation. The coverage map, border map, and frontier map are translated, rotated, and scaled relative to the agent’s perspective. These multi-scale maps are then processed by an SGCNN [10.5555/3692070.3692973] module, and control signals are finally output through fully-connected (FC) layers. Two scales with a scale factor of 2 are shown here. The agent is positioned at the center of the scaled maps. The scaled maps are discretized to the same resolution of $64\times64$ pixels.
Figure 3: Objects 1-10 are sourced from the SPONGE dataset [le2023sponge] and are used for quantitative analysis. The subsequent objects are employed for experiments on more complex geometries, including car doors, windows, and human body models.
Figure 4: Paths generated by different methods on Objects 1, 3, and 4. (a) Our method, (b) SPONGE method, (c) zigzag pattern, (d) spiral pattern.
Figure 5: The convergence behavior of the agent. The y-axis represents the number of environmental steps per episode (lower is better), while the x-axis represents total training steps.
...and 3 more figures
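
The agent-centric, multi-scale representation described in the Figure 2 caption can be sketched as follows. This is an illustrative sketch, not the authors' implementation: it assumes each map is a grid over the unit UV square, uses nearest-neighbour sampling, and fills samples outside the UV square with a constant. The function name `agent_centric_maps` and the parameters `base_extent`, `num_scales`, `factor`, `out_res`, and `fill` are hypothetical. Each scale covers a window `factor` times wider than the previous one but is sampled at the same output resolution, with the agent at the centre and the window rotated to the agent's heading.

```python
import math


def agent_centric_maps(grid, pos, heading, base_extent,
                       num_scales=2, factor=2, out_res=64, fill=0.0):
    """Resample a UV-space map into agent-centred windows at several scales.

    grid        : list of rows, grid[r][c], covering the unit UV square
                  (assumption: u runs along columns, v along rows)
    pos         : agent position (u, v) in UV coordinates
    heading     : agent heading in radians, measured in the UV plane
    base_extent : side length of the finest window, in UV units
    Returns one out_res x out_res map (list of rows) per scale.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    cos_h, sin_h = math.cos(heading), math.sin(heading)
    maps = []
    for k in range(num_scales):
        extent = base_extent * factor ** k  # wider window, same resolution
        out = []
        for i in range(out_res):
            row = []
            for j in range(out_res):
                # Pixel centre in the agent frame, with the agent at (0, 0).
                lx = ((j + 0.5) / out_res - 0.5) * extent
                ly = ((i + 0.5) / out_res - 0.5) * extent
                # Rotate by the heading, then translate to the agent position.
                u = pos[0] + cos_h * lx - sin_h * ly
                v = pos[1] + sin_h * lx + cos_h * ly
                c = math.floor(u * n_cols)
                r = math.floor(v * n_rows)
                inside = 0 <= r < n_rows and 0 <= c < n_cols
                row.append(grid[r][c] if inside else fill)
            out.append(row)
        maps.append(out)
    return maps
```

In this sketch the same routine would be applied to the coverage, border, and frontier maps separately, and the resulting stacks passed to the feature extractor. Keeping every scale at the same pixel resolution means the coarse scales trade detail for a wider field of view.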
