Decentralized Navigation of a Cable-Towed Load using Quadrupedal Robot Team via MARL

Wen-Tse Chen, Minh Nguyen, Zhongyu Li, Guo Ning Sue, Koushil Sreenath

TL;DR

This work tackles scalable, real-time collaboration for cable-towed load navigation by a team of quadrupedal robots in cluttered environments. It introduces a unified, decentralized MARL planner trained with CTDE and a multi-stage curriculum, which coordinates teams of one to twelve robots while keeping per-robot inference time constant. The system is a three-tier hierarchy: a global load planner handles long-horizon planning, decentralized MARL planners handle local coordination, and MPC-based locomotion controllers handle robust motion execution. Key contributions include multi-stage MARL training with knowledge distillation to prevent forgetting, domain randomization for sim-to-real transfer, and a demonstration of robust, adaptive behavior across varying loads and changing team sizes, both in simulation and in real-world experiments.
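The summary above names a knowledge distillation term that keeps the actor from forgetting smaller-team behavior as the curriculum adds robots, but it does not give the formula. The sketch below shows one plausible form under stated assumptions: the actor outputs a diagonal Gaussian over velocity commands, the previous-stage actor acts as a frozen teacher, and the penalty is a KL divergence scaled by a hypothetical `weight`. The function names and the choice of KL are illustrative and not taken from the paper.

```python
import math


def gaussian_kl(mu_t, std_t, mu_s, std_s):
    """KL(teacher || student) for diagonal Gaussians, summed over action dims.

    Assumption: both actors output a mean and standard deviation per
    velocity dimension.
    """
    return sum(
        math.log(ss / st) + (st ** 2 + (mt - ms) ** 2) / (2 * ss ** 2) - 0.5
        for mt, st, ms, ss in zip(mu_t, std_t, mu_s, std_s)
    )


def actor_loss_with_distillation(policy_loss, teacher_out, student_out, weight=0.1):
    """Add a distillation penalty to an already-computed policy loss.

    teacher_out / student_out: lists of (mean, std) pairs, one pair per
    sampled observation from an earlier (smaller-team) stage.
    weight: hypothetical coefficient trading off new-stage learning
    against retention of earlier-stage behavior.
    """
    kls = [
        gaussian_kl(mu_t, std_t, mu_s, std_s)
        for (mu_t, std_t), (mu_s, std_s) in zip(teacher_out, student_out)
    ]
    return policy_loss + weight * sum(kls) / len(kls)
```

When the student reproduces the teacher exactly, the penalty vanishes and the loss reduces to the plain policy loss; the penalty grows as the new-stage actor drifts from the previous-stage actor on earlier-stage observations.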

Abstract

This work addresses the challenge of enabling a team of quadrupedal robots to collaboratively tow a cable-connected load through cluttered and unstructured environments while avoiding obstacles. Leveraging cables allows the multi-robot system to navigate narrow spaces by maintaining slack when necessary. However, this introduces hybrid physical interactions due to alternating taut and slack states, with computational complexity that scales exponentially as the number of agents increases. To tackle these challenges, we develop a scalable and decentralized system capable of dynamically coordinating a variable number of quadrupedal robots while managing the hybrid physical interactions inherent in the load-towing task. At the core of this system is a novel multi-agent reinforcement learning (MARL)-based planner, designed for decentralized coordination. The MARL-based planner is trained using a centralized training with decentralized execution (CTDE) framework, enabling each robot to make decisions autonomously using only local (ego) observations. To accelerate learning and ensure effective collaboration across varying team sizes, we introduce a tailored training curriculum for MARL. Experimental results highlight the flexibility and scalability of the framework, demonstrating successful deployment with one to four robots in real-world scenarios and up to twelve robots in simulation. The decentralized planner maintains consistent inference times regardless of team size. Additionally, the proposed system demonstrates robustness to environmental perturbations and adaptability to varying load weights. This work represents a step forward in achieving flexible and efficient multi-legged robotic collaboration in complex real-world environments.

Paper Structure

This paper contains 43 sections, 4 equations, 12 figures, 1 table.

Figures (12)

  • Figure 1: Overlaid snapshots of four quadrupedal robots navigating a cable-towed load through a narrow passage using the proposed decentralized reinforcement-learning-based planner. Later frames are made more transparent to highlight progression. Completing this task requires real-time coordination among teammates to adjust cable tension and avoid obstacles while navigating to the goal. The same MARL-based decentralized policy controls each individual robot, whatever the team size. This experiment demonstrates the adaptability of the proposed decentralized policy as one robot is removed partway through the task. More experiments are shown in the video at https://youtu.be/GkGldcfQi9k.
  • Figure 2: The proposed hierarchical robotic system consists of three main components: the global planner for the load, decentralized MARL planners, and locomotion controllers. The global planner handles long-horizon planning, generating a collision-free trajectory for the load in cluttered environments. The decentralized MARL planners operate independently on each robot, managing local multi-robot collaboration by generating velocity commands based on local observations and local goals. The locomotion controller, running at a higher frequency, computes joint torques for the quadruped robots using MPC, ensuring robust locomotion in response to environmental changes and terrain variations.
  • Figure 3: Design of the MARL-based decentralized planner and an illustration of the $i$-th robot's local occupancy grid map. The actor network processes multi-modal inputs to generate desired velocities for each robot. Inputs include vector states and ego-centric local occupancy grid maps; a key design feature is that their dimensions do not depend on team size, so the same network accommodates variable team sizes. The critic network processes the global state, formed by concatenating all local observations and privileged information such as the values of domain randomization variables. While sharing the actor's architecture, the critic employs larger hidden layers to handle the complexity of the global state. The bottom figure illustrates the $i$-th robot's ego-centric local occupancy grid maps, which include a load map indicating the load's position, an obstacle map showing nearby obstacles, and a teammate map reflecting the positions of other robots. The $i$-th robot is always positioned at the center of the map and oriented upward.
  • Figure 4: The multi-stage training pipeline is designed to train the decentralized planner across varying team sizes. Starting with a single-robot scenario, the number of robots is gradually increased at each stage. The policy is trained using MAPPO, with the actor network's parameters initialized from the previous stage. This approach allows the network to generalize across varying team sizes. To prevent catastrophic forgetting, a multi-agent knowledge distillation loss is added to the actor's loss function. The critic network is reset and trained from scratch at the start of each stage.
  • Figure 5: A screenshot of the MuJoCo training environment, showing blue blocks as obstacles and walls, yellow blocks as robots, a brown box as the load, and red lines indicating the global planning path for the load. A red star marks the goal, while an orange star represents the local goal. The figure on the right displays the local occupancy grid map of the $i$-th robot, consisting of three ego-centric maps: the load map showing the load's position, the obstacle map reflecting nearby obstacles, and the teammate map indicating the positions of nearby robots. The $i$-th robot itself is excluded from these local occupancy maps.
  • ...and 7 more figures
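
The captions for Figures 3 and 5 describe ego-centric local occupancy grid maps with the robot at the center, oriented upward, and with separate load, obstacle, and teammate channels. A minimal sketch of that construction follows, assuming point-like entities, a planar pose `(x, y, yaw)`, and a hypothetical 11 x 11 grid with 0.5 m cells; the function names, grid size, and resolution are illustrative and not taken from the paper.

```python
import math


def world_to_ego(px, py, rx, ry, yaw):
    """Express a world point in the robot frame (x forward, y left)."""
    dx, dy = px - rx, py - ry
    c, s = math.cos(yaw), math.sin(yaw)
    return c * dx + s * dy, -s * dx + c * dy


def ego_grid(robot_pose, points, size=11, resolution=0.5):
    """Rasterize world points into a size x size grid centered on the robot.

    Row indices decrease in the robot's forward direction, so the robot's
    heading always points "up" in the map. Points outside the grid are dropped.
    """
    rx, ry, yaw = robot_pose
    half = size // 2
    grid = [[0] * size for _ in range(size)]
    for px, py in points:
        forward, left = world_to_ego(px, py, rx, ry, yaw)
        row = half - int(round(forward / resolution))
        col = half - int(round(left / resolution))
        if 0 <= row < size and 0 <= col < size:
            grid[row][col] = 1
    return grid


def local_maps(robot_pose, load_xy, obstacles, teammates, size=11, resolution=0.5):
    """Build the three channels; the robot itself is not drawn in any of them."""
    return {
        "load": ego_grid(robot_pose, [load_xy], size, resolution),
        "obstacle": ego_grid(robot_pose, obstacles, size, resolution),
        "teammate": ego_grid(robot_pose, teammates, size, resolution),
    }
```

Because teammates are drawn into a fixed-size grid rather than appended to a vector, the observation has the same shape for any team size, which is the property that lets one actor network serve teams of different sizes.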