Can LLMs Translate Human Instructions into a Reinforcement Learning Agent's Internal Emergent Symbolic Representation?

Ziqi Ma, Sao Mai Nguyen, Philippe Xu

TL;DR

This work addresses the challenge of translating human natural-language instructions into the emergent symbolic representations that arise within a developmental agent trained by hierarchical reinforcement learning. Using the STAR framework to derive symbolic partitions in the Ant Maze and Ant Fall tasks, the authors evaluate four LLMs with graph-based prompt designs and measure translation accuracy via G-BLEU. They find that language-to-symbol alignment is feasible at coarse granularity but highly sensitive to partition detail and task complexity. Current LLMs cannot yet reliably align natural language with internal agent representations: scores vary substantially across instructions and partitions, and translation is notably harder when environmental tools (e.g., moving blocks) alter the dynamics. The results motivate future work on grounding and alignment, potentially via Vision-Language-Action models, to enable more reliable, interpretable interactions between humans and developmental agents, particularly for safety-critical applications.

Abstract

Emergent symbolic representations are critical for enabling developmental learning agents to plan and generalize across tasks. In this work, we investigate whether large language models (LLMs) can translate human natural language instructions into the internal symbolic representations that emerge during hierarchical reinforcement learning. We apply a structured evaluation framework to measure the translation performance of widely used LLMs (GPT, Claude, DeepSeek and Grok) across different internal symbolic partitions generated by a hierarchical reinforcement learning algorithm in the Ant Maze and Ant Fall environments. Our findings reveal that although LLMs demonstrate some ability to translate natural language into a symbolic representation of the environment dynamics, their performance is highly sensitive to partition granularity and task complexity. The results expose limitations in current LLMs' capacity for representation alignment, highlighting the need for further research on robust alignment between language and internal agent representations.

Paper Structure

This paper contains 17 sections, 3 equations, 5 figures, 2 tables, and 1 algorithm.

Figures (5)

  • Figure 1: (a)(d) Environments, (b)(e) average success rate of STAR (from Zadem2024), and (c)(f) partition into regions by STAR, for Ant Maze and Ant Fall respectively. The regions in (c)(f) are the internal representations that emerge during training at the timestamps marked in (b) and (e). The red point represents the initial position of the robot, while the yellow point represents the goal position. Our LLM-based system translates instructions that guide the robot (such as "go east to the end, turn north until past the wall and go west until the end") into a sequence of traversed regions (for Partition II of Ant Maze, the output is 5 → 11 → 2 → 3 → 4).
  • Figure 2: G-BLEU scores for partition-agnostic instructions tested in (a) Ant Maze, (b) Ant Fall before block and (c) Ant Fall after block. For each internal representation, we plot in blue the average and standard deviation over 10 queries for each instruction, and show as boxplots in brown, orange and yellow the average and IQR over the 11 instructions.
  • Figure 3: G-BLEU scores for partition-associated instructions tested for each internal representation in (a) Ant Maze, (b) Ant Fall before block and (c) Ant Fall after block. For each tested internal representation, we plot in green the average and standard deviation over 10 queries for the instruction from each person, and show the median and IQR over the 11 persons as a boxplot.
  • Figure 4: G-BLEU scores for instructions specific to the simplest partition, applied across all partitions, in (a) Ant Maze task, (b) Ant Fall task, before block, (c) Ant Fall task, after block.
  • Figure 5: Instructions specific to the most complex partition, applied across all partitions, in (a) Ant Maze task, (b) Ant Fall task, before block, (c) Ant Fall task, after block.
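
The captions above describe two steps: scoring a predicted region sequence against a reference, and then aggregating scores over repeated queries and over instructions. The Python sketch below illustrates that pipeline under stated assumptions. This page does not give the definition of G-BLEU, so `bleu_like` is a generic BLEU-style n-gram overlap over region IDs and should be read as a stand-in, not as the paper's metric. The function names, the `max_n=2` setting, and the toy scores are all hypothetical.

```python
import math
from collections import Counter
from statistics import mean, median, quantiles, stdev


def parse_regions(text):
    """Parse a path such as '5 → 11 → 2 → 3 → 4' into a list of region IDs."""
    return [int(tok) for tok in text.split("→")]


def ngrams(seq, n):
    """Return all contiguous n-grams of a sequence as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def bleu_like(candidate, reference, max_n=2):
    """Generic BLEU-style score over region sequences.

    ASSUMPTION: illustrative stand-in, not the paper's G-BLEU.
    Geometric mean of clipped n-gram precisions (n = 1..max_n),
    multiplied by the usual BLEU brevity penalty.
    """
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(ngrams(candidate, n))
        ref = Counter(ngrams(reference, n))
        total = sum(cand.values())
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or overlap == 0:
            return 0.0  # no n-gram of this order matches
        log_precisions.append(math.log(overlap / total))
    if len(candidate) >= len(reference):
        brevity = 1.0
    else:
        brevity = math.exp(1 - len(reference) / len(candidate))
    return brevity * math.exp(sum(log_precisions) / len(log_precisions))


def summarize(scores_per_instruction):
    """Aggregate scores in the style of the Figure 2 and 3 captions.

    Input maps each instruction to the scores of its repeated queries
    (at least two scores per instruction, at least two instructions).
    Returns per-instruction (mean, std) and the median/IQR of the means.
    """
    per_instruction = {
        name: (mean(scores), stdev(scores))
        for name, scores in scores_per_instruction.items()
    }
    means = [m for m, _ in per_instruction.values()]
    q1, _, q3 = quantiles(means, n=4)
    return per_instruction, {"median": median(means), "iqr": q3 - q1}


# Reference path taken from the Figure 1 caption (Partition II of Ant Maze).
reference = parse_regions("5 → 11 → 2 → 3 → 4")
# Hypothetical LLM answer that skips region 3.
score = bleu_like([5, 11, 2, 4], reference)
```

A sequence identical to the reference scores 1.0, and a sequence that shares no region with it scores 0.0; partially correct paths fall in between. Whether the paper's G-BLEU uses the same n-gram orders or brevity penalty cannot be determined from this page.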