Can LLMs Translate Human Instructions into a Reinforcement Learning Agent's Internal Emergent Symbolic Representation?
Ziqi Ma, Sao Mai Nguyen, Philippe Xu
TL;DR
This work addresses the challenge of translating human natural-language instructions into the emergent symbolic representations that arise within a developmental agent trained by hierarchical reinforcement learning. Using the STAR framework to derive symbolic partitions in the Ant Maze and Ant Fall tasks, the authors evaluate four LLMs with graph-based prompt designs and measure translation accuracy via G-BLEU. Language-to-symbol alignment proves feasible at coarse granularity but is highly sensitive to partition detail and task complexity. Current LLMs show pervasive limitations in reliably aligning natural language with internal agent representations: performance varies substantially across instructions and partitions, and notable difficulties arise when environmental tools (e.g., moving blocks) alter the dynamics. The results motivate future work on grounding and alignment, potentially via Vision-Language-Action models, to enable more reliable, interpretable interactions between humans and developmental agents, particularly for safety-critical applications.
Abstract
Emergent symbolic representations are critical for enabling developmental learning agents to plan and generalize across tasks. In this work, we investigate whether large language models (LLMs) can translate human natural language instructions into the internal symbolic representations that emerge during hierarchical reinforcement learning. We apply a structured evaluation framework to measure the translation performance of widely used LLMs (GPT, Claude, DeepSeek, and Grok) across different internal symbolic partitions generated by a hierarchical reinforcement learning algorithm in the Ant Maze and Ant Fall environments. Our findings reveal that although LLMs demonstrate some ability to translate natural language into a symbolic representation of the environment dynamics, their performance is highly sensitive to partition granularity and task complexity. The results expose limitations in current LLMs' capacity for representation alignment, highlighting the need for further research on robust alignment between language and internal agent representations.
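To make the idea of scoring a translation concrete, the sketch below compares a predicted sequence of symbolic regions against a reference sequence using a plain BLEU-style score (clipped n-gram precision with a brevity penalty). It is only an illustration under stated assumptions: it is not the G-BLEU metric used in this work, whose definition is not given here, and the function names and the region labels `R0`, `R1`, ... are hypothetical.

```python
import math
from collections import Counter


def ngrams(seq, n):
    """Return all contiguous n-grams of a sequence as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def clipped_precision(pred, ref, n):
    """Fraction of predicted n-grams that also occur in the reference,
    with each n-gram's count clipped to its count in the reference."""
    pred_counts = Counter(ngrams(pred, n))
    ref_counts = Counter(ngrams(ref, n))
    total = sum(pred_counts.values())
    if total == 0:
        return 0.0
    overlap = sum(min(c, ref_counts[g]) for g, c in pred_counts.items())
    return overlap / total


def sequence_bleu(pred, ref, max_n=2):
    """BLEU-style score in [0, 1] for one predicted region sequence.

    Hypothetical stand-in for illustration; NOT the paper's G-BLEU.
    """
    if not pred:
        return 0.0
    precisions = [clipped_precision(pred, ref, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    # Geometric mean of the n-gram precisions.
    log_mean = sum(math.log(p) for p in precisions) / max_n
    # Penalize predictions that are shorter than the reference.
    brevity = 1.0 if len(pred) >= len(ref) else math.exp(1 - len(ref) / len(pred))
    return brevity * math.exp(log_mean)


# Hypothetical example: the instruction should map to regions R0 -> R1 -> R3 -> R4,
# but the model's translation passes through R2 instead of R3.
reference = ["R0", "R1", "R3", "R4"]
prediction = ["R0", "R1", "R2", "R4"]
score = sequence_bleu(prediction, reference)
```

A single wrong region in the middle of the path lowers the unigram precision only slightly but breaks two of the three bigrams, so order-sensitive scores of this kind penalize translations that name the right regions in an implausible sequence.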
