HierKick: Hierarchical Reinforcement Learning for Vision-Guided Soccer Robot Control

Yizhi Chen; Zheng Zhang; Zhanxiang Cao; Yihe Chen; Shengcheng Fu; Liyun Yan; Yang Zhang; Jiali Liu; Haoyang Li; Yue Gao

TL;DR

HierKick is a dual-frequency hierarchical RL framework for vision-guided soccer robots: a 5 Hz coach policy directs a pre-trained 50 Hz low-level controller through approaching, aligning, dribbling, and kicking. Its modular design and skill reuse are intended to extend to other multi-time-scale robot control tasks.

Abstract

Controlling soccer robots involves multi-time-scale decision-making: long-term tactical planning must be balanced against short-term motion execution. Traditional end-to-end reinforcement learning (RL) methods struggle with this in complex, dynamic environments. This paper proposes HierKick, a vision-guided soccer robot control framework based on dual-frequency hierarchical RL. The architecture pairs a 5 Hz high-level policy, which integrates YOLOv8 for real-time detection and selects tasks via a coach model, with a pre-trained 50 Hz low-level controller for precise joint control. With this architecture, the robot completes the four stages of approaching, aligning, dribbling, and kicking. In experiments, the framework reaches success rates of 95.2% in Isaac Gym, 89.8% in MuJoCo, and 80% in the real world. HierKick offers a hierarchical paradigm for robot control in complex environments that can be extended to other multi-time-scale tasks, and its modular design and skill reuse suggest a new path for intelligent robot control.
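The dual-frequency arrangement described in the abstract can be illustrated with a minimal scheduling sketch. Only the 5 Hz and 50 Hz rates come from the abstract; the function `run_dual_rate`, the stand-in policies `dummy_coach` and `dummy_motion`, and the three-element command are hypothetical names chosen for illustration and are not the authors' implementation.

```python
from typing import Callable, List, Sequence, Tuple

HIGH_LEVEL_HZ = 5   # coach policy rate stated in the abstract
LOW_LEVEL_HZ = 50   # low-level controller rate stated in the abstract
# Number of low-level ticks executed per high-level command (50 / 5 = 10).
STEPS_PER_COMMAND = LOW_LEVEL_HZ // HIGH_LEVEL_HZ


def run_dual_rate(
    high_level: Callable[[int], Sequence[float]],
    low_level: Callable[[Sequence[float], int], None],
    total_ticks: int,
) -> List[int]:
    """Run `total_ticks` low-level ticks, refreshing the command at the slower rate.

    Returns the tick indices at which the high-level policy was queried.
    """
    command: Sequence[float] = (0.0, 0.0, 0.0)  # assumed command layout
    query_ticks: List[int] = []
    for tick in range(total_ticks):
        if tick % STEPS_PER_COMMAND == 0:
            # Slow loop: ask the coach for a new command.
            command = high_level(tick)
            query_ticks.append(tick)
        # Fast loop: the controller tracks the most recent command.
        low_level(command, tick)
    return query_ticks


# Hypothetical stand-ins for the two policies.
def dummy_coach(tick: int) -> Tuple[float, float, float]:
    return (0.5, 0.0, 0.0)


low_level_calls: List[Tuple[int, Tuple[float, ...]]] = []


def dummy_motion(command: Sequence[float], tick: int) -> None:
    low_level_calls.append((tick, tuple(command)))


# One simulated second: 50 low-level ticks, 5 high-level queries.
queries = run_dual_rate(dummy_coach, dummy_motion, total_ticks=50)
```

Holding the last command between high-level queries is one simple way to bridge the two rates; the paper may handle the interface differently.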

Paper Structure (31 sections, 14 equations, 6 figures, 3 tables)

INTRODUCTION
RELATED WORK
  Dynamic Locomotion on Humanoid Robots with RL
  Learning-Based Robot Soccer Skills
METHOD
  Problem Statement
  High-Level Coach Policy
    Observation Space
    Action Space
  Low-Level Locomotion Controller
    Observation Space
    Action Space
  Multi-Stage Reward Mechanism
    Stage 1 Approach Phase Reward
    Stage 2 Alignment Phase Reward
...and 16 more sections

Figures (6)

Figure 1: Robot Trajectory Diagram. The trajectory of the robot on the soccer field. Guided by the Coach model within the HierKick framework, the robot executes a sequence of actions: Approach, Alignment, Dribble, and Shoot.
Figure 2: Hierarchical control framework of HierKick. The framework consists of three interconnected modules: (1) Perception, where a YOLOv8 detector running at 10 Hz processes visual input to determine the positions of the soccer ball and goal; (2) High-level control, where a "Coach Policy" operating at 5 Hz and trained with PPO reinforcement learning generates acceleration commands by fusing perception results with historical control data (the last acceleration and velocity commands); and (3) Low-level execution, where a "Motion Policy" running at 50 Hz takes inputs such as velocity commands, gait phase, and proprioception to execute sequential soccer actions (Approach, Alignment, Dribble, Shoot), with training guided by a multi-stage reward mechanism.
Figure 3: Simulation and Real-World Environments. The simulated and real-world environments for the soccer robot's kicking task, used for algorithm development, verification, and deployment.
Figure 4: Success Rate Comparison. Task success rates compared across simulation environments and real-world deployment.
Figure 5: Horizontal Distance to Goal Center. Kick distance distributions under three conditions. HierKick achieves the smallest mean and the most concentrated distribution, whereas the Remove- and HierKick ($\mathbf{c}_{\text{prev}}\!\to\!\mathbf{v}_{\text{robot-ball}}$) conditions exhibit larger means. White diamonds denote means; error bars indicate one standard deviation. The dashed line at 0.5 m marks a reference threshold. Each condition has 5280 samples.
...and 1 more figure
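
The per-condition statistics named in the Figure 5 caption (mean, one standard deviation, and the 0.5 m reference threshold) can be computed as in the sketch below. The function `summarize` and the sample values are hypothetical and are not data from the paper; the use of the population standard deviation is also an assumption, since the caption does not say which estimator was used.

```python
from statistics import mean, pstdev
from typing import Dict, Sequence

THRESHOLD_M = 0.5  # reference threshold drawn as the dashed line in Figure 5


def summarize(distances_m: Sequence[float]) -> Dict[str, float]:
    """Summarize horizontal distances to the goal center for one condition."""
    within = sum(1 for d in distances_m if d <= THRESHOLD_M)
    return {
        "mean": mean(distances_m),            # the white diamond in the figure
        "std": pstdev(distances_m),           # assumed: population std deviation
        "within_threshold": within / len(distances_m),
    }


# Made-up placeholder distances in meters, NOT measurements from the paper.
example = [0.1, 0.3, 0.5, 0.7]
summary = summarize(example)
```

A smaller mean together with a smaller standard deviation corresponds to the "smallest mean and most concentrated distribution" pattern the caption reports for HierKick.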
