Co-jump: Cooperative Jumping with Quadrupedal Robots via Multi-Agent Reinforcement Learning

Shihao Dong, Yeke Chen, Zeren Luo, Jiahui Zhang, Bowen Xu, Jinghan Lin, Yimin Han, Ji Ma, Zhiyou Yu, Yudong Zhao, Peng Lu

TL;DR

This work tackles the problem of exceeding the actuation limits of a single quadruped by enabling two robots to jump cooperatively using proprioception-only control. It introduces a framework built on Multi-Agent Proximal Policy Optimization (MAPPO) that pairs centralized training with decentralized execution and adds a four-stage curriculum, addressing credit assignment and highly dynamic motion in tightly coupled tasks. The approach achieves robust sim-to-real transfer, enabling jumps onto 1.5 m platforms and 1.1 m foot-end elevations (a 144% improvement over a single robot) without external sensing or explicit communication, and it demonstrates both forward and lateral maneuvers, including a forward flip. The results establish a foundation for communication-free multi-robot locomotion in constrained environments and highlight the importance of curriculum-guided learning and domain randomization for real-world deployment.
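The centralized-training, decentralized-execution split that MAPPO relies on can be illustrated with a few NumPy functions. This is a minimal sketch, not the authors' implementation: the linear-tanh policy in `act`, the concatenated critic input in `centralized_state`, the observation and action sizes, and the hyperparameters (`GAMMA`, `LAM`, `CLIP_EPS`) are all assumptions for illustration. Only the PPO clipped surrogate and generalized advantage estimation (GAE) follow their standard definitions.

```python
import numpy as np

# Hypothetical hyperparameters; the paper's actual values are not stated here.
GAMMA, LAM, CLIP_EPS = 0.99, 0.95, 0.2


def act(weights, own_obs):
    """Decentralized execution: a robot's action depends only on its OWN
    proprioceptive observation (toy linear-tanh policy, an assumption)."""
    return np.tanh(weights @ own_obs)


def centralized_state(obs_l, obs_j):
    """Centralized training: the critic may see both robots' observations
    (simply concatenated here, an assumption). Not needed at deployment."""
    return np.concatenate([obs_l, obs_j], axis=-1)


def gae(rewards, values, gamma=GAMMA, lam=LAM):
    """Generalized advantage estimation for one episode without early termination.
    `values` holds V(s_0), ..., V(s_T): one more entry than `rewards` (bootstrap)."""
    adv = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        adv[t] = running
    return adv


def ppo_clip_loss(logp_new, logp_old, adv, eps=CLIP_EPS):
    """PPO clipped surrogate, negated so it can be minimized; applied per agent."""
    ratio = np.exp(logp_new - logp_old)
    surrogate = np.minimum(ratio * adv, np.clip(ratio, 1.0 - eps, 1.0 + eps) * adv)
    return -np.mean(surrogate)


# Toy usage for Robot L and Robot J with random data (shapes are illustrative).
rng = np.random.default_rng(0)
obs_l, obs_j = rng.normal(size=4), rng.normal(size=4)
w_l, w_j = rng.normal(size=(3, 4)), rng.normal(size=(3, 4))
a_l, a_j = act(w_l, obs_l), act(w_j, obs_j)     # each robot acts on its own state
critic_input = centralized_state(obs_l, obs_j)  # shape (8,), used only in training
```

Keeping a separate weight set per robot mirrors the per-robot policy and value networks mentioned for Robot L and Robot J; whether the paper's critics actually consume a concatenated joint state is an assumption of this sketch.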

Abstract

While single-agent legged locomotion has witnessed remarkable progress, individual robots remain fundamentally constrained by physical actuation limits. To transcend these boundaries, we introduce Co-jump, a cooperative task where two quadrupedal robots synchronize to execute jumps far beyond their solo capabilities. We tackle the high-impulse contact dynamics of this task under a decentralized setting, achieving synchronization without explicit communication or pre-specified motion primitives. Our framework leverages Multi-Agent Proximal Policy Optimization (MAPPO) enhanced by a progressive curriculum strategy, which effectively overcomes the sparse-reward exploration challenges inherent in mechanically coupled systems. We demonstrate robust performance in simulation and successful transfer to physical hardware, executing multi-directional jumps onto platforms up to 1.5 m in height. Specifically, one of the robots achieves a foot-end elevation of 1.1 m, which represents a 144% improvement over the 0.45 m jump height of a standalone quadrupedal robot, demonstrating superior vertical performance. Notably, this precise coordination is achieved solely through proprioceptive feedback, establishing a foundation for communication-free collaborative locomotion in constrained environments.
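The 144% figure is the relative gain of the 1.1 m foot-end elevation over the 0.45 m solo jump height. Writing $h_{\text{co}}$ and $h_{\text{solo}}$ for these two heights (notation introduced here for the check only):

$$
\frac{h_{\text{co}} - h_{\text{solo}}}{h_{\text{solo}}} = \frac{1.1\,\mathrm{m} - 0.45\,\mathrm{m}}{0.45\,\mathrm{m}} = \frac{0.65}{0.45} \approx 1.44,
$$

so the cooperative elevation is roughly $2.44\times$ the solo height, i.e. a 144% improvement.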

Paper Structure (37 sections, 9 equations, 7 figures, 4 tables)

Figures (7)

  • Figure 1: The proposed framework enables cooperative leaping, allowing robots to traverse terrains far beyond the capability of a single robot. (a) Cooperative leaping behavior inspired by the "Flying Goral" [shen2016flying]; (b) forward Co-jump in simulation; (c) physical realization of Co-jump using two quadrupedal robots to reach a 1.5 m platform; (d) sequence diagram of a front aerial flip; (e) sequence diagram of a side Co-jump achieving a height of 1.5 m.
  • Figure 2: Overview of the Co-jump framework. (a) MAPPO architecture with independent policy and value networks for Robot L and Robot J. (b) Four-stage curriculum learning progression: gravity, target, initialization, and delay curricula. (c) Real-world implementation of Co-jump on quadrupedal robots.
  • Figure 3: Illustration of the tolerance function $f_{\text{tol}}$. The reward is maximized within the bounds $[b_l, b_u]$ and decays smoothly outside them according to the margin $m$ and value $v$ [tao2022learninga].
  • Figure 4: Simulation results of different jumping maneuvers, with Robot J jumping onto a $1.5\,\mathrm{m}$-high platform: (a) forward jump and (b) side jump, both of which emerge from the progressive curriculum described in the curriculum section; (c) 3D trajectories of successful jumps across varying target heights and yaw orientations; (d) UMAP embedding of joint command sequences, revealing distinct clusters corresponding to different maneuver types in the policy's latent action space.
  • Figure 5: Ablation study results comparing different training configurations: (a) core curriculum training, (b) gravity curriculum ablation, (c) initialization curriculum ablation.
  • ...and 2 more figures
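The tolerance function in Figure 3 can be sketched as a Gaussian-shaped reward that is 1 inside $[b_l, b_u]$ and falls to $v$ at a distance $m$ from the nearest bound. This sketch follows the dm_control-style `tolerance` convention with a Gaussian sigmoid; that the paper uses exactly this decay shape is an assumption, and the argument names (`lower`, `upper`, `margin`, `value_at_margin`) are ours.

```python
import numpy as np


def tolerance(x, lower, upper, margin, value_at_margin=0.1):
    """Gaussian tolerance reward (dm_control-style; exact shape assumed).

    Returns 1 inside [lower, upper]. Outside, the reward decays smoothly and
    equals `value_at_margin` at a distance of `margin` from the nearest bound.
    Requires 0 < value_at_margin < 1.
    """
    x = np.asarray(x, dtype=float)
    in_bounds = (x >= lower) & (x <= upper)
    if margin == 0:
        # Degenerate case: hard indicator of the bounds.
        return np.where(in_bounds, 1.0, 0.0)
    # Distance to the nearest bound, normalized by the margin.
    d = np.where(x < lower, lower - x, x - upper) / margin
    # Choose the Gaussian width so that exp(-0.5 * scale**2) == value_at_margin.
    scale = np.sqrt(-2.0 * np.log(value_at_margin))
    return np.where(in_bounds, 1.0, np.exp(-0.5 * (d * scale) ** 2))


# Hypothetical usage: score a few sample values against the band [0, 1].
samples = np.array([-0.2, 0.5, 1.1, 1.3])
rewards = tolerance(samples, lower=0.0, upper=1.0, margin=0.2)
```

With this choice of `scale`, the reward outside the bounds simplifies to `value_at_margin ** (d ** 2)`, which makes the role of $m$ and $v$ easy to read: $m$ sets the distance scale and $v$ sets how much reward survives at that distance.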