Learning Multi-Agent Local Collision-Avoidance for Collaborative Carrying tasks with Coupled Quadrupedal Robots

Francesca Bray, Simone Tolomei, Andrei Cramariuc, Cesar Cadena, Marco Hutter

Abstract

Robotic collaborative carrying could greatly benefit human activities such as warehouse and construction-site management. However, coordinating the simultaneous motion of multiple robots is a significant challenge. Existing works primarily focus on obstacle-free environments, making them unsuitable for most real-world applications. Works that account for obstacles either overfit to a specific terrain configuration or rely on pre-recorded maps combined with path planners to compute collision-free trajectories. This work focuses on two quadrupedal robots mechanically connected to a carried object. We propose a Reinforcement Learning (RL)-based policy that tracks a commanded velocity direction while avoiding collisions with nearby obstacles using only onboard sensing, eliminating the need for precomputed trajectories and complete map knowledge. Our work presents a hierarchical architecture in which a perceptive, high-level, object-centric policy commands two pretrained locomotion policies. Additionally, we employ a game-inspired curriculum that progressively increases the complexity of the obstacles in the terrain. We validate our approach on two quadrupedal robots connected to a bar via spherical joints, benchmarking it against optimization-based and decentralized RL baselines. Our hardware experiments demonstrate that our system can locomote in unknown environments without a map or a path planner. The video of our work is available in the multimedia material.
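The abstract does not state the rule behind the game-inspired curriculum. A common form of such a curriculum for legged-robot RL is a promote/demote scheme: each parallel environment sits at an obstacle-difficulty level, moves up after a successful episode, moves down after a failure, and is sent back to a random level once it clears the hardest one. The sketch below illustrates that idea under loud assumptions. The progress metric, the collision flag, the 0.8/0.4 thresholds, and the function name `update_terrain_levels` are all hypothetical and are not taken from the paper.

```python
import numpy as np


def update_terrain_levels(levels, progress, collided, max_level, rng,
                          promote_thresh=0.8, demote_thresh=0.4):
    """Hypothetical promote/demote curriculum update (not the paper's exact rule).

    levels:   int array, current obstacle-difficulty level of each environment
    progress: float array in [0, 1], assumed fraction of the commanded path covered
    collided: bool array, assumed flag for episodes that ended in a collision
    """
    levels = levels.copy()
    # Success: enough progress and no collision -> harder obstacles next episode.
    promote = (progress >= promote_thresh) & ~collided
    # Failure: little progress or any collision -> easier obstacles next episode.
    demote = (progress < demote_thresh) | collided
    levels[promote] += 1
    levels[demote] -= 1
    # Environments that "beat the game" at the top level restart at a random level.
    beaten = levels > max_level
    levels[beaten] = rng.integers(0, max_level + 1, size=int(beaten.sum()))
    # Demotions below level 0 are clamped to the easiest level.
    return np.clip(levels, 0, max_level)


rng = np.random.default_rng(0)
levels = np.array([0, 2, 5, 5])
progress = np.array([0.9, 0.1, 0.95, 0.5])
collided = np.array([False, False, False, True])
print(update_terrain_levels(levels, progress, collided, max_level=5, rng=rng))
# env 0 -> 1 (promoted), env 1 -> 1 (demoted), env 2 -> random level in [0, 5], env 3 -> 4 (collision)
```

Resampling the environments that clear the top level, rather than keeping them there, is one way to keep easier obstacle layouts in the training distribution so the policy does not forget them. Whether the authors do this is not stated in the abstract.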

Paper Structure (27 sections, 8 equations, 7 figures, 2 tables)

Figures (7)

  • Figure 1: System deployed for hardware experiments. Each quadrupedal robot is equipped with a mount ending in a spherical joint, which connects it to a 2 m bar. The task-relevant frames are highlighted.
  • Figure 2: Policy architecture. Single-robot feasible paths are sampled from the terrain, and the direction of the next waypoint defines the high-level command $^{hl}_{obj}\mathbf{c}$. This command, together with the system-level observations $^{hl}_{obj}\mathbf{o}$, is provided to the high-level policy, which outputs the high-level actions $^{hl}_{obj}\mathbf{a}$. These actions represent the rotated low-level commands $^{ll_i}_{base_i}\mathbf{c}$, which are executed by the pretrained locomotion policies.
  • Figure 3: Example of a generated training terrain. The leftmost picture corresponds to the easiest curriculum levels, while the rightmost represents the most challenging ones.
  • Figure 4: Trajectory comparison in the Boxes scenario. From left to right: MAPPO, Optimization-based (local map, $n_{samples} = 1500$), Ours, and Optimization-based (full map, $n_{samples} = 1500$). The black line denotes the object path, and the colored lines represent the agents. Red stars indicate intermediate waypoints.
  • Figure 5: Waypoint-following experiment on empty terrain. Top: system configuration at the start and after reaching two predefined waypoints (red stars). Bottom: real-time exteroceptive observations, overlaid with the trajectories of the agents' bases and the position of the world frame.
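The Figure 2 caption names the quantities in the hierarchy but not the frame math. The sketch below shows one plausible reading of two of those steps. First, it turns the next waypoint into a unit direction command expressed in the object frame. Second, it rotates one robot's slice of the high-level action from the object frame into that robot's base frame. It assumes flat ground, so only yaw matters. It also assumes each robot's action slice is a planar velocity command (v_x, v_y, ω_z); the yaw rate is left unchanged because both frames share the vertical axis. The function names and this command layout are hypothetical and are not the authors' implementation.

```python
import numpy as np


def rot2d(yaw):
    """2D rotation matrix for a yaw angle in radians."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s], [s, c]])


def high_level_command(p_obj, yaw_obj, p_waypoint, eps=1e-9):
    """Hypothetical high-level command: unit direction to the next waypoint in the object frame."""
    d_world = np.asarray(p_waypoint, float) - np.asarray(p_obj, float)
    d_obj = rot2d(yaw_obj).T @ d_world  # world frame -> object frame
    norm = np.linalg.norm(d_obj)
    return np.zeros(2) if norm < eps else d_obj / norm  # zero command once the waypoint is reached


def to_base_command(action_obj_i, yaw_obj, yaw_base_i):
    """Rotate robot i's assumed (vx, vy, wz) action from the object frame into its base frame.

    R(yaw_base_i)^T @ R(yaw_obj) == R(yaw_obj - yaw_base_i) for planar rotations.
    """
    v_base = rot2d(yaw_obj - yaw_base_i) @ np.asarray(action_obj_i[:2], float)
    # The yaw rate is unchanged: both frames rotate about the same vertical axis.
    return np.array([v_base[0], v_base[1], float(action_obj_i[2])])


# Object at the origin facing world +y; the waypoint lies straight ahead.
print(high_level_command([0.0, 0.0], np.pi / 2, [0.0, 3.0]))
# Forward motion of the object is a leftward command for a robot facing world +x.
print(to_base_command([1.0, 0.0, 0.2], np.pi / 2, 0.0))
```

Up to floating-point rounding, the first call gives approximately [1, 0] (straight ahead in the object frame) and the second gives approximately [0, 1, 0.2]. Expressing the high-level actions in the object frame and rotating them per robot lets one object-centric policy drive two locomotion policies that each expect commands in their own base frame.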