Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning

Tuomas Haarnoja; Ben Moran; Guy Lever; Sandy H. Huang; Dhruva Tirumala; Jan Humplik; Markus Wulfmeier; Saran Tunyasuvunakool; Noah Y. Siegel; Roland Hafner; Michael Bloesch; Kristian Hartikainen; Arunkumar Byravan; Leonard Hasenclever; Yuval Tassa; Fereshteh Sadeghi; Nathan Batchelor; Federico Casarini; Stefano Saliceti; Charles Game; Neil Sreendra; Kushal Patel; Marlon Gwira; Andrea Huber; Nicole Hurley; Francesco Nori; Raia Hadsell; Nicolas Heess

TL;DR

The paper addresses learning agile, safe full-body control for a low-cost humanoid robot playing 1v1 soccer. Training has two stages: separate get-up and soccer skills are learned first, then distilled into a single agent that continues training with self-play. The resulting policy transfers zero-shot from simulation to a Robotis OP3 robot, aided by high-frequency control, domain randomization, and perturbations during training, and it outperforms a scripted baseline in walking, turning, getting up, and kicking. The agent shows emergent opponent awareness, adaptive footwork, and coordination over long horizons, with only a modest gap between simulated and real-world performance. The results suggest a practical approach to deploying capable humanoid agents in dynamic, multi-agent environments on limited hardware.

Abstract

We investigate whether Deep Reinforcement Learning (Deep RL) is able to synthesize sophisticated and safe movement skills for a low-cost, miniature humanoid robot that can be composed into complex behavioral strategies in dynamic environments. We used Deep RL to train a humanoid robot with 20 actuated joints to play a simplified one-versus-one (1v1) soccer game. The resulting agent exhibits robust and dynamic movement skills such as rapid fall recovery, walking, turning, kicking and more, and it transitions between them in a smooth, stable, and efficient manner. The agent's locomotion and tactical behavior adapt to specific game contexts in a way that would be impractical to design manually. The agent also developed a basic strategic understanding of the game, and learned, for instance, to anticipate ball movements and to block opponent shots. Our agent was trained in simulation and transferred to real robots zero-shot. We found that a combination of sufficiently high-frequency control, targeted dynamics randomization, and perturbations during training in simulation enabled good-quality transfer. Although the robots are inherently fragile, basic regularization of the behavior during training led the robots to learn safe and effective movements while still performing in a dynamic and agile way -- well beyond what is intuitively expected from the robot. Indeed, in experiments, they walked 181% faster, turned 302% faster, took 63% less time to get up, and kicked a ball 34% faster than a scripted baseline, while efficiently combining the skills to achieve the longer-term objectives.
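To make the relative figures in the abstract concrete, the following minimal sketch converts them into multiplicative ratios against the scripted baseline. It assumes that "X% faster" means the learned speed is (1 + X/100) times the baseline speed, and that "X% less time" means the learned duration is (1 - X/100) times the baseline duration. The function names are illustrative and do not come from the paper.

```python
def speed_ratio(percent_faster: float) -> float:
    """Ratio of learned speed to baseline speed for an 'X% faster' claim."""
    return 1.0 + percent_faster / 100.0


def time_ratio(percent_less_time: float) -> float:
    """Ratio of learned duration to baseline duration for an 'X% less time' claim."""
    return 1.0 - percent_less_time / 100.0


# Percentages as reported in the abstract.
speed_claims = {"walking": 181.0, "turning": 302.0, "kicking": 34.0}
get_up_percent_less_time = 63.0

if __name__ == "__main__":
    for skill, pct in speed_claims.items():
        print(f"{skill}: {speed_ratio(pct):.2f}x baseline speed")
    print(f"get-up: {time_ratio(get_up_percent_less_time):.2f}x baseline time")
```

Under these assumptions, walking 181% faster corresponds to roughly 2.81 times the baseline speed, and taking 63% less time to get up corresponds to roughly 0.37 times the baseline duration.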

Paper Structure (15 sections, 7 equations, 14 figures, 6 tables)

Reliability and Sim-to-Real Analysis:
Opponent Awareness:
Adaptive Footwork:
Soccer Skill Training:
Get-Up Skill Training:
Distillation:
Self-Play:
Distributional MPO:
Policy Evaluation:
Policy Improvement:
Agent Hyperparameters:
Walking:
Get-Up Ability:
Kicking Speed:
Turning Speed:

Figures (14)

Figure 1: The robot soccer environment. We created matching simulated (left) and real (right) soccer environments. The pitch is 5m long by 4m wide. The real environment was also equipped with a motion capture (mocap) system for tracking the two robots and the ball.
Figure 2: Agent training setup. We trained agents in two stages. In the first stage (left), we train a soccer skill and a get-up skill separately (see Soccer Skill Training and Get-Up Skill Training). In the second stage (right), we distill these two skills into a single agent that can both get up from the ground and play soccer (see Distillation and Self-Play). The second stage also incorporates self-play: the opponent is sampled uniformly at random from policy snapshots saved earlier in training. We found that this two-stage approach leads to qualitatively better behavior and improved sim-to-real transfer, compared to training an agent from scratch for the full 1v1 soccer task.
Figure 3: Gallery of robot behaviors. Each row gives an example of a type of behavior that is observed when trained policies are deployed on real robots.
Figure 4: Joint angle embeddings. Embedding of the joint angles recorded while executing different policies, as described in the behavior embeddings section. A: The embedding for the scripted baseline walking policy. B: The embedding for the soccer skill. C: The embedding for the full 1v1 agent.
Figure 5: Behavior analysis. A-C: Set pieces. Top rows: Example initializations for the set piece tasks in sim and on the real robot. Second rows: Overlaid plots of the 10 trajectories collected from the set piece experiments showing the robot trajectory before kicking (solid lines), after kicking (dotted lines), the ball trajectory (dashed lines), final ball position (white circle), final robot position (red-pink circles) and opponent position (blue circle). Each red-pink shade corresponds to one of the 10 trajectories. D: Adaptive footwork set piece. Right foot trajectory (orange), left foot trajectory (red), ball trajectory (white), point of the kick (yellow), and footsteps highlighted with dots. E: Turn-and-kick set piece. Right 3: a sequence of frames from the set piece. Left: a plot of the footsteps from the corresponding trajectory. The agent turned, walked approximately 2 m, turned, kicked, and finally balanced using 10 footsteps. See the behavior analysis section for a discussion of these results.
...and 9 more figures
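
The Figure 2 caption states that, during self-play, the opponent is sampled uniformly at random from policy snapshots saved earlier in training. The sketch below illustrates that sampling scheme only. The `SnapshotPool` class and its method names are hypothetical, and how often snapshots are saved and what a policy object looks like are assumptions not specified here.

```python
import random
from typing import Any, List, Optional


class SnapshotPool:
    """Hypothetical pool of saved policy snapshots for self-play."""

    def __init__(self, seed: Optional[int] = None) -> None:
        self._snapshots: List[Any] = []
        self._rng = random.Random(seed)

    def save(self, policy: Any) -> None:
        """Store a snapshot of the current policy (here: any object)."""
        self._snapshots.append(policy)

    def sample_opponent(self) -> Any:
        """Pick an opponent uniformly at random from all saved snapshots."""
        if not self._snapshots:
            raise ValueError("no snapshots have been saved yet")
        return self._rng.choice(self._snapshots)

    def __len__(self) -> int:
        return len(self._snapshots)


if __name__ == "__main__":
    pool = SnapshotPool(seed=0)
    # Stand-in snapshots; a real setup would store network parameters.
    for step in (1000, 2000, 3000):
        pool.save(f"policy@{step}")
    print("opponent for this episode:", pool.sample_opponent())
```

Uniform sampling over all saved snapshots means the agent keeps facing older, weaker versions of itself as well as recent ones; whether the original training setup adds any further weighting is not stated in the caption.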
