
Dual-Quadruped Collaborative Transportation in Narrow Environments via Safe Reinforcement Learning

Zhezhi Lei, Zhihai Bi, Wenxin Wang, Jun Ma

TL;DR

This work tackles safe, decentralized collaborative payload transport by framing dual-quadruped coordination as a fully cooperative constrained Markov game with a shared safety budget $u$. It introduces (i) a cost-advantage decomposition that enables stable, monotonic policy improvement under the shared constraint and (ii) a constraint allocation mechanism, guided by Bayesian optimization, that distributes the budget among the robots within a Lagrangian-based training loop. The approach uses separate critics for reward and cost, together with trust-region updates with KL bounds, to enforce safety while promoting collaboration. It is evaluated in simulation and in real-world tests (gate, corridor, forest). Compared with cost-aware and reward-only baselines, it achieves better safety (lower collision probability), more efficient collaboration (straighter, shorter trajectories), and adaptive formation reconfiguration in narrow environments. The framework offers a practical route to reliable multi-robot transportation in constrained settings, with distributed control and explicit safety guarantees.
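To make the shared-budget setting concrete, the block below writes out a generic two-robot version of the constrained problem, its Lagrangian relaxation, and the standard two-agent advantage decomposition applied to cost. This is a sketch under assumed notation ($J_R$, per-robot costs $J_C^i$, per-robot budgets $u_1, u_2$, multipliers $\lambda_i$, cost critics $Q_C$ and $V_C$); the paper's exact formulation may differ.

```latex
% Constrained objective: shared budget u split into per-robot budgets (assumed notation)
\max_{\pi^1,\pi^2}\; J_R(\pi^1,\pi^2)
\quad \text{s.t.} \quad J_C^i(\pi^1,\pi^2) \le u_i,\; i = 1,2, \qquad u_1 + u_2 \le u

% Lagrangian relaxation with one multiplier per robot
\mathcal{L}(\pi^1,\pi^2,\lambda_1,\lambda_2) = J_R(\pi^1,\pi^2)
  - \sum_{i=1}^{2} \lambda_i \bigl( J_C^i(\pi^1,\pi^2) - u_i \bigr), \qquad \lambda_i \ge 0

% Two-agent cost-advantage decomposition (telescoping sum)
A_C(s,a^1,a^2) = \underbrace{Q_C(s,a^1) - V_C(s)}_{A_C^1(s,\,a^1)}
  + \underbrace{Q_C(s,a^1,a^2) - Q_C(s,a^1)}_{A_C^2(s,\,a^1,\,a^2)}
  = Q_C(s,a^1,a^2) - V_C(s)
```

Because the two terms telescope, the joint cost advantage can be attributed to robot 1 through $A_C^1$ and to robot 2 through $A_C^2$, conditioned on robot 1's action. This is what lets sequential multi-agent trust-region methods update one robot at a time. Summing the per-robot constraints also gives $J_C^1 + J_C^2 \le u$, i.e., the team cost stays within the shared budget.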

Abstract

Collaborative transportation, in which multiple robots jointly transport a payload, has garnered significant attention in recent years. Although safe, high-performance inter-robot collaboration is critical for effective task execution, it is difficult to achieve in narrow environments where the feasible region is extremely limited. To address this challenge, we propose a novel approach for dual-quadruped collaborative transportation via safe reinforcement learning (RL). Specifically, we model the task as a fully cooperative constrained Markov game in which collision avoidance is formulated as constraints. We introduce a cost-advantage decomposition method that keeps the sum of the team's constraint costs below an upper bound, thereby guaranteeing task safety within an RL framework. Furthermore, we propose a constraint allocation method that assigns the shared constraints to individual robots so as to maximize the overall task reward; this encourages autonomous task assignment among the robots and thereby improves collaborative task performance. Simulation and real-world experimental results demonstrate that the proposed approach achieves superior performance and a higher success rate in dual-quadruped collaborative transportation compared with existing methods.
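The constraint allocation idea can be illustrated with a toy search: split the shared budget $u$ into $u_1 + u_2 = u$ and keep the split that maximizes an estimate of team reward. This is a minimal sketch under stated assumptions. The TL;DR says the paper guides this step with Bayesian optimization, whereas here a plain grid search over the split ratio stands in for it. `allocate_budget` and `estimate_team_reward` are hypothetical names, and in practice the reward estimate would come from training or evaluating policies under each candidate split.

```python
from typing import Callable, List, Tuple


def allocate_budget(
    team_budget: float,
    estimate_team_reward: Callable[[float, float], float],
    num_candidates: int = 21,
) -> Tuple[float, float]:
    """Split a shared cost budget u into per-robot budgets (u1, u2) with u1 + u2 = u.

    Hypothetical stand-in: a uniform grid search over the split ratio replaces
    the Bayesian optimization described for the paper's method.
    """
    if team_budget < 0:
        raise ValueError("team_budget must be non-negative")
    if num_candidates < 2:
        raise ValueError("num_candidates must be at least 2")

    candidates: List[Tuple[float, float]] = []
    for k in range(num_candidates):
        alpha = k / (num_candidates - 1)  # fraction of the budget given to robot 1
        u1 = alpha * team_budget
        u2 = team_budget - u1             # remainder goes to robot 2, so u1 + u2 == u
        candidates.append((u1, u2))

    # Keep the split with the highest estimated team reward (the first one wins ties).
    return max(candidates, key=lambda split: estimate_team_reward(*split))
```

For example, with a toy reward that favors giving robot 1 more slack, `allocate_budget(1.0, lambda u1, u2: -(u1 - 0.7) ** 2 - (u2 - 0.3) ** 2)` returns a split close to `(0.7, 0.3)`. Each call to the reward estimate is expensive in a real RL setting, which is why a sample-efficient optimizer such as Bayesian optimization is preferable to an exhaustive grid.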

Paper Structure

This paper contains 18 sections, 28 equations, 11 figures, and 2 tables.

Figures (11)

  • Figure 1: An overview of our method. 1) Task Environment: The robot team must collaboratively transport an object from the initial region to the target region while avoiding collisions. 2) Proposed Framework: We formulate the task as a constrained Markov decision process. For the team's total reward, we adopt a safe RL approach that sequentially estimates the surrogate returns of R1 and R2, thereby determining each member's contribution to the overall reward. For the team's total cost, we employ a constraint allocation method that assigns an individual cost budget to each team member. Robots R1 and R2 execute independent policies to accomplish the task in a fully distributed manner. 3) Training Strategy: We use two separate critics to estimate the expected reward and the expected cost. We adopt a Lagrangian approach in which the total advantage is computed by combining the reward and cost advantages, and the residual between the cost and the budget is used to update the Lagrange multiplier. Finally, the updated policy is deployed in the simulation environment to collect data for the next iteration.
  • Figure 2: Average episode reward (left) and cost (right) during training. Our method (red curve) achieves high reward while maintaining a low cost, suggesting that it effectively balances safety and efficiency.
  • Figure 3:
  • Figure 4:
  • Figure 5:
  • ...and 6 more figures
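The training strategy in the Figure 1 caption (separate reward and cost critics, a total advantage formed from both, and a multiplier driven by the cost-budget residual) can be sketched as two small functions. This is a minimal illustration, not the authors' code: the `(A_R - lam * A_C) / (1 + lam)` scaling is a convention borrowed from common PPO-Lagrangian implementations, the projected-gradient multiplier step with learning rate `lr` is an assumption, and `combined_advantage` and `update_multiplier` are hypothetical names. With one budget per robot, each robot would keep its own multiplier.

```python
import numpy as np


def combined_advantage(adv_reward: np.ndarray, adv_cost: np.ndarray, lam: float) -> np.ndarray:
    """Fold reward and cost advantages into a single surrogate advantage.

    Assumed form: (A_R - lam * A_C) / (1 + lam). This is a convex combination of
    A_R and -A_C, so its scale stays bounded as lam grows; the paper's form may differ.
    """
    return (adv_reward - lam * adv_cost) / (1.0 + lam)


def update_multiplier(lam: float, episode_cost: float, budget: float, lr: float = 0.05) -> float:
    """Projected gradient ascent on one robot's Lagrange multiplier.

    The residual (episode_cost - budget) raises lam when the robot overspends its
    cost budget and lowers it otherwise; lam is clipped at zero to stay non-negative.
    """
    return max(0.0, lam + lr * (episode_cost - budget))
```

For example, starting from `lam = 0.0`, an episode cost of 3.0 against a budget of 1.0 raises the multiplier to about 0.1, so cost advantages carry more weight in the next policy update.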