Complex Instruction Following with Diverse Style Policies in Football Games

Chenglu Sun, Shuo Shen, Haonan Hu, Wei Zhou, Chen Chen

TL;DR

This work tackles the difficulty of commanding multi-agent football policies with high-level natural language (NL) by introducing Language-Controlled Diverse Style Policies (LCDSP). LCDSP combines two components. Diverse Style Training (DST) trains a single policy to exhibit a wide range of behaviors controlled by a continuous Style Parameter (SP) vector $\boldsymbol{\omega}$. A Style Interpreter (SI) rapidly translates NL instructions into SP via an Adaptive Style-adjustment Block (ASaB). DST employs reward shaping across ten agent behaviors and a Prioritized Style Sampling (PSS) strategy to explore the SP space efficiently. SI grounds the NL-to-SP mapping in a frozen pretrained language model (PLM) combined with per-behavior adaptive scaling. Extensive experiments in a 5v5 Google Research Football environment demonstrate LCDSP's ability to follow abstract tactical instructions, generalize across instruction mappings, and provide fine-grained behavioral control. These results are supported by quantitative analyses of DST (SEU, SMUL, SELO) and SI (MAE, inference time). The approach advances practical NL-grounded control for complex multi-agent tasks, with potential real-world impact in sports analytics and autonomous coordination.
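The TL;DR names two DST ingredients, SP-modulated reward shaping and Prioritized Style Sampling, without giving their formulas. The Python sketch below shows one way they could fit together. It rests on several hypothetical assumptions: each of the ten behaviors adds a reward bonus linear in its style parameter $\omega_i$, SP lie in $[-1, 1]$, and PSS samples discretized SP bins in proportion to a priority score such as a recent style-execution error. The names `modulated_reward` and `PrioritizedStyleSampler` and the bin count are illustrative, not the authors' implementation.

```python
import numpy as np

NUM_BEHAVIORS = 10  # ten shaped behaviors, per the summary
NUM_BINS = 5        # hypothetical discretization of each SP dimension


def modulated_reward(env_reward, behavior_counts, omega):
    """Hypothetical linear shaping: each occurrence of behavior i adds omega[i]."""
    return env_reward + float(np.dot(omega, behavior_counts))


class PrioritizedStyleSampler:
    """Hypothetical PSS: pick an SP bin per dimension with probability ~ priority."""

    def __init__(self, rng, num_dims=NUM_BEHAVIORS, num_bins=NUM_BINS,
                 low=-1.0, high=1.0):
        self.rng = rng
        self.edges = np.linspace(low, high, num_bins + 1)  # bin boundaries
        self.priority = np.ones((num_dims, num_bins))      # uniform at the start

    def sample(self):
        """Draw one SP vector (called before each episode) and the bins it used."""
        num_dims = self.priority.shape[0]
        omega = np.empty(num_dims)
        bins = np.empty(num_dims, dtype=int)
        for d, p in enumerate(self.priority):
            b = self.rng.choice(len(p), p=p / p.sum())
            bins[d] = b
            # Sample uniformly inside the chosen bin.
            omega[d] = self.rng.uniform(self.edges[b], self.edges[b + 1])
        return omega, bins

    def update(self, bins, errors, eps=1e-3):
        """Set each used bin's priority to its style-execution error (plus eps)."""
        for d, b in enumerate(bins):
            self.priority[d, b] = abs(errors[d]) + eps


rng = np.random.default_rng(0)
sampler = PrioritizedStyleSampler(rng)
omega, bins = sampler.sample()
reward = modulated_reward(1.0, np.zeros(NUM_BEHAVIORS), omega)  # no shaped behaviors: 1.0
```

Under these assumptions, bins in which the policy recently executed its sampled style poorly are revisited more often. This is one plausible reading of "efficiently explore the SP space", not a confirmed detail of PSS.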

Abstract

Despite advancements in language-controlled reinforcement learning (LC-RL) for basic domains and straightforward commands (e.g., object manipulation and navigation), extending LC-RL to comprehend and execute high-level or abstract instructions in complex, multi-agent environments such as football games remains a significant challenge. To address this gap, we introduce Language-Controlled Diverse Style Policies (LCDSP), a novel LC-RL paradigm specifically designed for complex scenarios. LCDSP comprises two key components: a Diverse Style Training (DST) method and a Style Interpreter (SI). The DST method efficiently trains a single policy capable of exhibiting a wide range of diverse behaviors by modulating agent actions through style parameters (SP). The SI is designed to accurately and rapidly translate high-level language instructions into the corresponding SP. Through extensive experiments in a complex 5v5 football environment, we demonstrate that LCDSP effectively comprehends abstract tactical instructions and accurately executes the desired behavioral styles, showcasing its potential for complex, real-world applications.

Paper Structure

This paper contains 46 sections, 14 equations, 13 figures, 5 tables, and 1 algorithm.

Figures (13)

  • Figure 1: Overview of the inference process of LCDSP.
  • Figure 2: Overview of the DST process. For a given training scenario, we first design a reward shaping scheme for key agent behaviors, establishing how SP modulate the reward signal. Before each episode, the style generator samples a set of SP $\boldsymbol{\omega}$, which influence the rewards received for executing associated behaviors. The policy takes the current environment state $s_t$ and SP $\boldsymbol{\omega}$ as input to produce action $a_t$. The environment returns the modulated reward $r_t^\omega$ and the next state $s_{t+1}$. These experiences are used by the DST process to update the RL policy.
  • Figure 3: Overview of the SI module. User instructions are fed into a pretrained language model (PLM) to obtain their representations, which an additional network maps to logits. Concurrently, textual descriptions of the main agent behaviors are encoded by the same PLM. These behavior representations are fed into the Adaptive Style-adjustment Block (ASaB), which generates adaptive scaling parameters ($\gamma_i, \beta_i$) for each style parameter $i$. Finally, the ASaB applies this transformation to the logits to produce the final SP.
  • Figure 4: Comparison of in-game metrics under different tactical instructions. The tests were conducted using self-play matches in which the opponents' SP were randomly generated, and each instruction was tested ten times. Both the Positive Attack and All-out Attack tactics result in more goals and shot attempts than the Balanced Play tactic. However, the All-out Attack tactic has a lower win rate because its overly aggressive style concedes more goals and yields fewer draws. The Counter Attack tactic shows a higher draw rate: to facilitate counter-attacks, the formation sits deeper, leaving a larger space between forwards and defenders. The Park the Bus tactic features more compact spacing, leading to a very high draw rate. The Tiki-Taka tactic achieves the highest possession ratio and number of pass attempts, with closer spacing facilitating short passes.
  • Figure 5: Fine-grained adjustment of SP and their corresponding changes in in-game metrics for different methods.
  • ...and 8 more figures
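The Figure 3 caption states that the ASaB derives per-parameter scaling terms ($\gamma_i, \beta_i$) from behavior representations, but it does not give the transformation itself. The Python sketch below assumes a FiLM-style affine map $\omega_i = \tanh(\gamma_i z_i + \beta_i)$ on the logits $z_i$, with hypothetical linear heads producing $\gamma_i$ and $\beta_i$ and an assumed SP range of $[-1, 1]$. In the usage lines, random vectors stand in for the frozen PLM embeddings. The sketch also includes the MAE metric that the TL;DR lists for evaluating the SI. The class name `AdaptiveStyleAdjustment` and its weights are illustrative only.

```python
import numpy as np


class AdaptiveStyleAdjustment:
    """Hypothetical ASaB sketch: per-behavior (gamma_i, beta_i) from behavior embeddings."""

    def __init__(self, emb_dim, rng):
        # Hypothetical linear heads mapping one behavior embedding to (gamma_i, beta_i).
        self.w_gamma = rng.normal(scale=0.1, size=emb_dim)
        self.w_beta = rng.normal(scale=0.1, size=emb_dim)

    def scaling_params(self, behavior_emb):
        """behavior_emb: (num_behaviors, emb_dim) -> gamma, beta of shape (num_behaviors,)."""
        gamma = 1.0 + behavior_emb @ self.w_gamma  # centred at 1: near-identity scale
        beta = behavior_emb @ self.w_beta
        return gamma, beta

    def __call__(self, logits, behavior_emb):
        gamma, beta = self.scaling_params(behavior_emb)
        # Assumed affine modulation of the logits, squashed into [-1, 1].
        return np.tanh(gamma * logits + beta)


def mae(pred_sp, target_sp):
    """Mean absolute error between predicted and reference SP."""
    return float(np.mean(np.abs(pred_sp - target_sp)))


rng = np.random.default_rng(0)
asab = AdaptiveStyleAdjustment(emb_dim=8, rng=rng)
behavior_emb = rng.normal(size=(10, 8))  # stand-in for PLM behavior representations
logits = rng.normal(size=10)             # stand-in for the instruction network's output
sp = asab(logits, behavior_emb)          # ten SP values in [-1, 1]
```

Conditioning $\gamma_i$ and $\beta_i$ on the behavior text, rather than learning free per-index scalars, is what would let such a block adapt its scaling to each behavior's description. That is one possible motivation for the design; the source does not state it.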