Learning Soccer Skills for Humanoid Robots: A Progressive Perception-Action Framework

Jipeng Kong, Xinzhe Liu, Yuhang Lin, Jinrui Han, Sören Schwertfeger, Chenjia Bai, Xuelong Li

TL;DR

The paper addresses robust humanoid soccer skill learning by integrating perception and action through PAiD, a progressive three-stage framework. Stage I acquires motion skills via human motion tracking, Stage II adds lightweight perception-guided generalization, and Stage III applies physics-aware sim-to-real transfer to bridge the reality gap. Policies are optimized with PPO on a finite-horizon MDP, maximizing the expected discounted return $J(\theta)=\mathbb{E}[\sum_{t=0}^{T-1} \gamma^t r_t]$. The authors introduce a unified motion-tracking approach with adaptive sampling, a lightweight perception-reward scheme for generalization, and physics-informed domain randomization (DR) combined with CMA-ES-based system identification to align ball dynamics between simulated and real environments. On the Unitree G1, PAiD yields high-fidelity, human-like kicking, reporting a static-ball success rate of $91.3\%$ and a rolling-ball success rate of $71.9\%$, with strong real-world transfer and terrain robustness compared with baselines.
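
The objective $J(\theta)$ quoted above is the expected discounted return over a finite horizon. As a minimal sketch (not the authors' code), the return of one rollout and a Monte Carlo estimate of the objective could be computed as follows. The function names `discounted_return` and `estimate_objective` and the sample rewards are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Return sum_{t=0}^{T-1} gamma**t * r_t for one finite-horizon rollout."""
    total = 0.0
    # Accumulate from the last step backwards: G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def estimate_objective(rollouts: Sequence[Sequence[float]], gamma: float) -> float:
    """Monte Carlo estimate of J(theta): the mean discounted return over rollouts."""
    returns = [discounted_return(r, gamma) for r in rollouts]
    return sum(returns) / len(returns)


# Hypothetical reward sequences from two rollouts of the same policy.
example_rollouts = [[1.0, 1.0, 1.0], [0.0, 0.0, 4.0]]
example_estimate = estimate_objective(example_rollouts, gamma=0.5)
```

PPO does not maximize this quantity directly; it optimizes a clipped surrogate built from advantage estimates. The estimate above is still the quantity the surrogate is meant to improve.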

Abstract

Soccer presents a significant challenge for humanoid robots, demanding tightly integrated perception-action capabilities for tasks like perception-guided kicking and whole-body balance control. Existing approaches suffer from inter-module instability in modular pipelines or conflicting training objectives in end-to-end frameworks. We propose Perception-Action integrated Decision-making (PAiD), a progressive architecture that decomposes soccer skill acquisition into three stages: motion-skill acquisition via human motion tracking, lightweight perception-action integration for positional generalization, and physics-aware sim-to-real transfer. This staged decomposition establishes stable foundational skills, avoids reward conflicts during perception integration, and minimizes sim-to-real gaps. Experiments on the Unitree G1 demonstrate high-fidelity human-like kicking with robust performance under diverse conditions (static or rolling balls, various positions, and disturbances) while maintaining consistent execution across indoor and outdoor scenarios. Our divide-and-conquer strategy advances robust humanoid soccer capabilities and offers a scalable framework for complex embodied skill acquisition. The project page is available at https://soccer-humanoid.github.io/.

Paper Structure

This paper contains 29 sections, 4 equations, 9 figures, 8 tables, and 1 algorithm.

Figures (9)

  • Figure 1: Humanoid learning soccer skills. (a) The robot performs goal-directed kicks referring to different motions conditioned on the ball position. (b) The robot achieves accurate shots while imitating styles of different professional players. (c) Stable and precise kicking on a grass field. (d) The robot can successfully kick a moving ball.
  • Figure 2: Overview of the Perception-Action integrated Decision-making (PAiD) framework. Our pipeline progressively acquires robust soccer skills through three stages: (1) Motion Tracking: We retarget diverse human kicking motions (Standard & Stylized) to the humanoid and train a unified tracking policy using adaptive sampling to master fundamental skills without perceptual noise. (2) Perception-Guided Kicking: We equip the policy with egocentric perception and task-specific rewards to generalize kicking skills to randomized static and rolling ball targets. (3) Physics-Aware Sim-to-Real Transfer: We bridge the reality gap by aligning simulation contact dynamics with real-world measurements (ball drop & rolling tests) and incorporating physics-guided observation noise. (4) Real-World Deployment: We successfully deploy PAiD on the Unitree G1.
  • Figure 3: Comparison of the soccer ball’s physical behavior in the real world and in simulation after parameter identification. (a)–(b) compare ball drop experiments, while (c)–(d) compare rolling experiments.
  • Figure 4: Quantitative analysis of soccer shooting proficiency across the workspace. The heatmaps visualize the spatial distribution of success rates and kicking accuracy for both static ball scenarios (a, b) and dynamic rolling ball interception (c, d).
  • Figure 5: Real-world shooting tests with randomly placed soccer balls. (a)–(b) show results on hard ground and grass, with 30 trials per surface. (c)–(d) show tests on rolling balls, where hollow and solid markers denote the rolling start and end positions, respectively, with 10 trials per surface. Red and blue dots indicate success and failure.
  • ...and 4 more figures
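
The ball drop comparison in Figure 3 suggests how a contact parameter might be identified from measurements. The sketch below is an illustration under stated assumptions, not the authors' procedure. It assumes an idealized bounce model in which each peak height is the previous one scaled by the squared coefficient of restitution (air drag neglected), uses synthetic heights rather than measurements from the paper, and substitutes a plain grid search for the CMA-ES optimizer the summary mentions. The names `bounce_heights` and `fit_restitution` are hypothetical.

```python
from typing import List, Sequence


def bounce_heights(h0: float, restitution: float, n_bounces: int) -> List[float]:
    """Peak heights after each bounce for an idealized ball dropped from h0.

    Assumes h_{n+1} = e**2 * h_n, where e is the coefficient of restitution.
    """
    heights = []
    h = h0
    for _ in range(n_bounces):
        h = restitution ** 2 * h
        heights.append(h)
    return heights


def fit_restitution(h0: float, measured: Sequence[float],
                    candidates: Sequence[float]) -> float:
    """Pick the candidate e whose simulated peaks best match the measured ones."""
    def loss(e: float) -> float:
        simulated = bounce_heights(h0, e, len(measured))
        return sum((s - m) ** 2 for s, m in zip(simulated, measured))

    # Grid search stands in for CMA-ES here; both minimize the same loss.
    return min(candidates, key=loss)


# Synthetic "measurements" generated with e = 0.7 (not real data).
synthetic_peaks = bounce_heights(1.0, 0.7, 4)
grid = [i / 100 for i in range(1, 100)]
best_e = fit_restitution(1.0, synthetic_peaks, grid)
```

A derivative-free optimizer such as CMA-ES becomes useful when several parameters (for example restitution and rolling friction) must be fitted jointly against a full simulator, where an exhaustive grid is too expensive.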