LLM-Driven Corrective Robot Operation Code Generation with Static Text-Based Simulation

Wenhao Wang, Yi Rong, Yanyan Li, Long Jiao, Jiawei Yuan

TL;DR

This work tackles the reliability challenges of LLM-generated robot operation code by introducing a static text-based simulation that obviates the need for dynamic execution in physical experiments or full simulators. An LLM-driven pipeline generates code and statically simulates its execution to produce semantic observations; a separate evaluator then turns those observations into feedback that refines the code over successive iterations. The approach achieves high trajectory-observation fidelity and competitive task success rates across UAV platforms and ground vehicles, with strong performance even without dynamic simulation, demonstrating practical viability and scalability. The study also analyzes the impact of system-prompt design on simulator effectiveness and validates the method in real-world deployments, confirming robustness and adaptability across configurations such as AirSim and Gazebo.

Abstract

Recent advances in large language models (LLMs) have demonstrated promising capabilities for generating robot operation code, enabling LLM-driven robots. To enhance the reliability of LLM-generated operation code, existing research has increasingly adopted corrective designs that use feedback from observations of code execution. However, code execution in these designs relies on either a physical experiment or a customized simulation environment, which limits their deployment because of the high effort of configuring the environment and the potentially long execution time. In this paper, we explore the possibility of directly leveraging an LLM to enable static simulation of robot operation code, and then use it to design a new, reliable LLM-driven corrective robot operation code generation framework. Our framework configures the LLM as a static simulator with enhanced capabilities that reliably simulates robot code execution by interpreting actions, reasoning over state transitions, analyzing execution outcomes, and generating semantic observations that accurately capture trajectory dynamics. To validate the performance of our framework, we performed experiments on various operation tasks for different robots, including UAVs and small ground vehicles. The experimental results demonstrate not only the high accuracy of our static text-based simulation but also the reliable code generation of our LLM-driven corrective framework, which achieves performance comparable to state-of-the-art research without relying on dynamic code execution in physical experiments or simulators.

Paper Structure

This paper contains 33 sections, 1 equation, 2 figures, 6 tables.

Figures (2)

  • Figure 1: An overview of LLM-driven corrective robot operation code generation with static text-based simulation.
  • Figure 2: An illustrative example of corrective code generation with text-based simulation. In the first iteration, the LLM-based simulator accurately produces an observation of the UAV's actions, and the evaluator identifies the mismatch with the task and constructs feedback. Based on this feedback, the code generator corrects the mismatch and produces valid robot operation code in the second iteration.
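
The generate, simulate, evaluate loop described in the abstract and in Figure 2 can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function `corrective_generation`, its parameters, the iteration cap, and the three `toy_*` stand-ins are all hypothetical names introduced here. In the paper each role is played by an LLM; the stubs below are deterministic placeholders so that the control flow can be run without any model.

```python
from typing import Callable, Optional, Tuple

# Hypothetical role signatures (assumptions, not taken from the paper):
Generator = Callable[[str, Optional[str]], str]            # (task, feedback) -> code
Simulator = Callable[[str], str]                           # code -> text observation
Evaluator = Callable[[str, str], Tuple[bool, Optional[str]]]  # (task, obs) -> (ok, feedback)


def corrective_generation(
    task: str,
    generate: Generator,
    simulate: Simulator,
    evaluate: Evaluator,
    max_iters: int = 3,  # assumed cap; the source does not state one here
) -> Tuple[str, int, bool]:
    """Iterate generate -> static simulation -> evaluation until the
    evaluator accepts the observation or the iteration cap is reached.
    Returns (last code, iterations used, whether it was accepted)."""
    feedback: Optional[str] = None
    code = ""
    for iteration in range(1, max_iters + 1):
        code = generate(task, feedback)       # propose or repair code
        observation = simulate(code)          # text only, no code execution
        ok, feedback = evaluate(task, observation)
        if ok:
            return code, iteration, True
    return code, max_iters, False


# --- Deterministic stand-ins for the three LLM roles (illustrative only) ---

def toy_generate(task: str, feedback: Optional[str]) -> str:
    # First attempt flies too short a distance; the retry fixes it.
    distance = 5 if feedback is None else 10
    return f"takeoff()\nmove_forward({distance})"


def toy_simulate(code: str) -> str:
    # Describes each call in text instead of running it.
    return "; ".join(f"executed {line}" for line in code.splitlines())


def toy_evaluate(task: str, observation: str) -> Tuple[bool, Optional[str]]:
    if "move_forward(10)" in observation:
        return True, None
    return False, "The UAV moved 5 m, but the task requires 10 m."


result = corrective_generation(
    "fly forward 10 m", toy_generate, toy_simulate, toy_evaluate
)
print(result)
```

With these stubs the first attempt is rejected and the second is accepted, mirroring the two-iteration example in Figure 2. Passing the three roles as callables keeps the loop independent of any particular model or prompt design.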