Sensory-Motor Control with Large Language Models via Iterative Policy Refinement
Jônata Tyska Carvalho, Stefano Nolfi
TL;DR
The paper tackles the problem of letting large language models (LLMs) control embodied agents through direct continuous actuation, without relying on predefined motor primitives. It introduces a two-phase prompting workflow that first elicits a high-level control strategy and converts it into executable rules and Python code, and then iteratively refines the policy using environment feedback and sensory-motor data. On classic control tasks from Gymnasium and the MuJoCo inverted pendulum, the method finds optimal or near-optimal policies in most cases using open-weight models (GPT-oss:120b, Qwen2.5:72b); ablations show the importance of including sensory-motor data and structured prompts. The results indicate that open LLMs can serve as effective priors and reasoning engines for embodied control without fine-tuning, offering a path toward data-efficient, plug-and-play robot control and suggesting future directions such as adaptive prompting and dual-model systems.
Abstract
We propose a method that enables large language models (LLMs) to control embodied agents through the generation of control policies that directly map continuous observation vectors to continuous action vectors. At the outset, the LLMs generate a control strategy based on a textual description of the agent, its environment, and the intended goal. This strategy is then iteratively refined through a learning process in which the LLMs are repeatedly prompted to improve the current strategy, using performance feedback and sensory-motor data collected during its evaluation. The method is validated on classic control tasks from the Gymnasium library and the inverted pendulum task from the MuJoCo library. The approach proves effective with relatively compact models such as GPT-oss:120b and Qwen2.5:72b. In most cases, it successfully identifies optimal or near-optimal solutions by integrating symbolic knowledge derived through reasoning with sub-symbolic sensory-motor data gathered as the agent interacts with its environment.
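The generate-evaluate-refine loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy environment `ToyPointEnv`, the `query_llm` callable, the prompt wording, and the choice to keep the best-scoring policy are all assumptions made for illustration, and a real setup would use a Gymnasium or MuJoCo task and an actual LLM client in their place.

```python
from typing import Callable, List, Tuple

Observation = List[float]
Action = List[float]
Policy = Callable[[Observation], Action]


class ToyPointEnv:
    """Hypothetical 1-D stand-in for a control task: drive position x to 0."""

    def __init__(self, start: float = 1.0, horizon: int = 50):
        self.start, self.horizon = start, horizon

    def reset(self) -> Observation:
        self.x, self.v, self.t = self.start, 0.0, 0
        return [self.x, self.v]

    def step(self, action: Action) -> Tuple[Observation, float, bool]:
        force = max(-1.0, min(1.0, action[0]))  # clip the continuous action
        self.v += 0.1 * force
        self.x += 0.1 * self.v
        self.t += 1
        reward = -abs(self.x)  # closer to the origin is better
        return [self.x, self.v], reward, self.t >= self.horizon


def evaluate(policy: Policy, env: ToyPointEnv):
    """Run one episode; return total reward and the sensory-motor trace."""
    obs, done, total, trace = env.reset(), False, 0.0, []
    while not done:
        action = policy(obs)
        next_obs, reward, done = env.step(action)
        trace.append((obs, action, reward))
        total += reward
        obs = next_obs
    return total, trace


def compile_policy(source: str) -> Policy:
    """Turn LLM-written code into a callable. Only safe for trusted input."""
    namespace: dict = {}
    exec(source, namespace)
    return namespace["policy"]


def refine(query_llm: Callable[[str], str], task: str, env: ToyPointEnv,
           iterations: int = 5, trace_rows: int = 10):
    """Ask for an initial policy, then repeatedly ask for improvements."""
    source = query_llm(f"Task: {task}\nWrite `def policy(obs)` returning an action list.")
    best_source, best_score = source, float("-inf")
    for i in range(iterations + 1):
        score, trace = evaluate(compile_policy(source), env)
        if score > best_score:
            best_source, best_score = source, score
        if i == iterations:
            break
        # Feed back performance plus a sample of observation/action pairs.
        sample = "\n".join(
            f"obs={o} action={a} reward={r:.3f}" for o, a, r in trace[:trace_rows]
        )
        source = query_llm(
            f"Task: {task}\nCurrent policy:\n{source}\n"
            f"Total reward: {score:.3f}\nSensory-motor data:\n{sample}\n"
            "Return an improved `def policy(obs)`."
        )
    return best_source, best_score
```

To try the loop, pass any function that takes a prompt string and returns Python source defining `policy(obs)`. Executing model-written code with `exec` is done here only for brevity; outside a sandbox it should be treated as unsafe.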
