Synchronized Dual-arm Rearrangement via Cooperative mTSP

Wenhao Li, Shishun Zhang, Sisi Dai, Hui Huang, Ruizhen Hu, Xiaohong Chen, Kai Xu

TL;DR

This work formulates synchronized dual-arm rearrangement as cooperative mTSP, a variant of mTSP in which agents share cooperative costs, and solves it with reinforcement learning. An attention-based network combines a task state graph with a cooperative cost matrix to provide rational task scheduling.

Abstract

Synchronized dual-arm rearrangement is widely studied as a common scenario in industrial applications. It often faces scalability challenges due to the computational complexity of robotic arm rearrangement and the high-dimensional nature of dual-arm planning. To address these challenges, we formulate the problem as cooperative mTSP, a variant of mTSP where agents share cooperative costs, and solve it with reinforcement learning. Our approach represents rearrangement tasks using a task state graph that captures spatial relationships and a cooperative cost matrix that provides details about action costs. Taking these representations as observations, we design an attention-based network that combines them effectively and provides rational task scheduling. Furthermore, a cost predictor is introduced to directly evaluate actions during both training and planning, significantly expediting the planning process. Our experimental results demonstrate that our approach outperforms existing methods in terms of both performance and planning efficiency.

Paper Structure (15 sections, 2 equations, 3 figures, 2 tables)


Figures (3)

  • Figure 1: Our algorithm comprises an RL-based task allocator and a cost predictor. Given a rearrangement task, we translate it into a task state graph and employ the cost predictor to calculate the cooperative cost matrix. Both the state graph and the cost matrix are provided to the task allocator to determine the current action. The action generated at each step is used to update the task state graph for the next iteration, continuing until the rearrangement task is completed and the final action sequence is obtained.
  • Figure 2: We formalize the rearrangement task as cooperative mTSP, with a task state graph and a cooperative cost matrix that are dynamically updated through interaction. The state graph contains spatial information between tasks and agents, while the cost matrix represents the cost of all potential joint actions for the two agents at the current state. Each time the dual arms execute a joint action, the state graph is updated accordingly, and the cost matrix is recalculated.
  • Figure 3: Our network consists of a node encoder, a coop encoder, and an action generator. The node encoder processes the state graph to encode spatial information, while the coop encoder takes the cost matrix as input to encode cooperative information. The action generator combines both encoded features and generates a probability map for all potential actions.
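
The iterative loop described in the Figure 1 and Figure 2 captions can be illustrated with a minimal sketch. This is not the authors' implementation: the learned cost predictor is replaced by a Euclidean travel distance, the RL-based task allocator is replaced by a greedy choice of the cheapest joint action, and the cost of a synchronized step is assumed to be that of the slower arm. All names (`predict_cost`, `cooperative_cost_matrix`, `plan`) and the 2D pick/place task encoding are hypothetical.

```python
import itertools
import math

def predict_cost(arm_pos, task):
    # Stand-in for the learned cost predictor (assumption): straight-line
    # travel from the arm to the pick point, then from pick to place.
    if task is None:  # None means the arm idles during this step
        return 0.0
    pick, place = task
    return math.dist(arm_pos, pick) + math.dist(pick, place)

def cooperative_cost_matrix(arm_pos, tasks, remaining):
    # Cost of every joint action (i, j): arm 0 takes task i, arm 1 takes task j.
    # Assumption: a synchronized step lasts as long as the slower arm.
    ids = sorted(remaining)
    if len(ids) == 1:
        pairs = [(ids[0], None), (None, ids[0])]  # one arm must idle
    else:
        pairs = list(itertools.permutations(ids, 2))
    costs = {}
    for i, j in pairs:
        c0 = predict_cost(arm_pos[0], None if i is None else tasks[i])
        c1 = predict_cost(arm_pos[1], None if j is None else tasks[j])
        costs[(i, j)] = max(c0, c1)
    return costs

def plan(arm_pos, tasks):
    # Repeat: recompute the cost matrix, pick a joint action, update the state.
    arm_pos = list(arm_pos)
    remaining = set(range(len(tasks)))
    sequence, total = [], 0.0
    while remaining:
        costs = cooperative_cost_matrix(arm_pos, tasks, remaining)
        action = min(costs, key=costs.get)  # greedy stand-in for the RL allocator
        total += costs[action]
        sequence.append(action)
        for arm, t in enumerate(action):
            if t is not None:
                arm_pos[arm] = tasks[t][1]  # the arm ends at the place location
                remaining.discard(t)
    return sequence, total

# Each task is a (pick, place) pair of 2D points.
tasks = [((0, 0), (1, 0)), ((10, 0), (9, 0)), ((0, 1), (0, 2))]
sequence, total = plan([(0, 0), (10, 0)], tasks)
print(sequence)  # [(0, 1), (2, None)]
```

In this toy instance the two arms first handle tasks 0 and 1 in parallel, then arm 0 handles task 2 while arm 1 idles. A greedy rule only looks one step ahead; the paper instead uses a learned allocator over the same kind of state and cost inputs.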