Complex Instruction Following with Diverse Style Policies in Football Games
Chenglu Sun, Shuo Shen, Haonan Hu, Wei Zhou, Chen Chen
TL;DR
This work tackles the difficulty of commanding multi-agent football policies with high-level natural language (NL) by introducing Language-Controlled Diverse Style Policies (LCDSP). LCDSP combines two components: Diverse Style Training (DST), which trains a single policy to exhibit a wide range of behaviors controlled by a continuous style parameter (SP) vector $\boldsymbol{\omega}$, and a Style Interpreter (SI), which rapidly translates NL instructions into SP via an Adaptive Style-adjustment Block. DST employs reward shaping across ten agent behaviors and a Prioritized Style Sampling (PSS) strategy to explore the SP space efficiently, while SI grounds the NL-to-SP mapping in a frozen pretrained language model and applies per-behavior adaptive scaling. Experiments in a 5v5 Google Research Football environment show that LCDSP follows abstract tactical instructions, generalizes across instruction mappings, and provides fine-grained behavioral control. These results are supported by quantitative analyses of DST (SEU, SMUL, SELO) and of SI (mean absolute error (MAE), inference time). The approach advances practical NL-grounded control for complex multi-agent tasks, with potential applications in sports analytics and autonomous coordination.
Abstract
Despite advances in language-controlled reinforcement learning (LC-RL) for simple domains and straightforward commands (e.g., object manipulation and navigation), extending LC-RL to comprehend and execute high-level or abstract instructions in complex, multi-agent environments such as football games remains a significant challenge. To address this gap, we introduce Language-Controlled Diverse Style Policies (LCDSP), a novel LC-RL paradigm designed for complex scenarios. LCDSP comprises two key components: a Diverse Style Training (DST) method and a Style Interpreter (SI). The DST method efficiently trains a single policy capable of exhibiting a wide range of behaviors by modulating agent actions through style parameters (SP). The SI accurately and rapidly translates high-level language instructions into the corresponding SP. Through extensive experiments in a complex 5v5 football environment, we demonstrate that LCDSP comprehends abstract tactical instructions and accurately executes the desired behavioral styles, showcasing its potential for complex, real-world applications.
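
To make the idea of a single policy modulated by style parameters more concrete, the following is a minimal sketch of style-conditioned reward shaping. It is not the authors' implementation. The linear weighting of per-behavior reward terms, the [-1, 1] range of each SP component, the uniform sampler (used here in place of the paper's prioritized sampling strategy), and the names `sample_style` and `shaped_reward` are all assumptions made for illustration.

```python
import random

NUM_BEHAVIORS = 10  # number of shaped agent behaviors; treated here as an assumption


def sample_style(num_behaviors=NUM_BEHAVIORS, rng=random):
    """Draw one style vector uniformly from [-1, 1]^n.

    Hypothetical stand-in: uniform sampling replaces any prioritized scheme,
    and the [-1, 1] range is an assumption.
    """
    return [rng.uniform(-1.0, 1.0) for _ in range(num_behaviors)]


def shaped_reward(base_reward, behavior_rewards, style):
    """Return the task reward plus style-weighted per-behavior terms.

    Assumed form: r = r_base + sum_i style[i] * behavior_rewards[i].
    """
    if len(behavior_rewards) != len(style):
        raise ValueError("one reward term per style component is required")
    return base_reward + sum(w * r for w, r in zip(style, behavior_rewards))


if __name__ == "__main__":
    style = sample_style()
    # Hypothetical per-behavior signals for one step (e.g. passing, shooting).
    signals = [0.0] * NUM_BEHAVIORS
    signals[0] = 1.0
    print(shaped_reward(base_reward=0.0, behavior_rewards=signals, style=style))
```

Under these assumptions, a policy conditioned on the sampled style vector and trained on this reward would be encouraged toward behaviors with positive weights and away from those with negative weights, which is one plausible way a single network could cover many styles.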
