Test-Driven Agentic Framework for Reliable Robot Controller

Shivanshu Tripathi, Reza Akbarian Bafghi, Maziar Raissi

TL;DR

A test-driven, agentic framework for synthesizing a deployable low-level robot controller for navigation tasks that iteratively refines the generated controller code using diagnostic feedback from structured test suites to achieve task success.

Abstract

In this work, we present a test-driven, agentic framework for synthesizing a deployable low-level robot controller for navigation tasks. Given a 2D map with an image of an ultrasonic sensor-based robot, or a 3D robotic simulation environment, our framework iteratively refines the generated controller code using diagnostic feedback from structured test suites to achieve task success. We propose a dual-tier repair strategy to refine the generated code that alternates between prompt-level refinement and direct code editing. We evaluate the approach across 2D navigation tasks and 3D navigation in the Webots simulator. Experimental results show that test-driven synthesis substantially improves controller reliability and robustness over one-shot controller generation, especially when the initial prompt is underspecified. The source code and demonstration videos are available at: https://shivanshutripath.github.io/robotic_controller.github.io.
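
The refinement loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the four callables (`generate`, `run_tests`, `refine_prompt`, `edit_code`) are hypothetical stand-ins for the LLM calls and the test suite, and the strict even/odd alternation between the two repair tiers is an assumption, since the abstract does not specify the schedule.

```python
from typing import Callable, List, Tuple


def repair_loop(
    prompt: str,
    generate: Callable[[str], str],                     # hypothetical: prompt -> controller code
    run_tests: Callable[[str], List[str]],              # hypothetical: code -> failure messages
    refine_prompt: Callable[[str, List[str]], str],     # hypothetical: tier 1 repair
    edit_code: Callable[[str, List[str]], str],         # hypothetical: tier 2 repair
    max_iters: int = 10,
) -> Tuple[str, bool, int]:
    """Return (code, passed, repairs_applied)."""
    code = generate(prompt)
    for k in range(max_iters):
        failures = run_tests(code)
        if not failures:
            return code, True, k
        if k % 2 == 0:
            # Tier 1 (assumed to go first): fold the diagnostics into the
            # prompt and regenerate the controller from scratch.
            prompt = refine_prompt(prompt, failures)
            code = generate(prompt)
        else:
            # Tier 2: patch the existing controller code directly.
            code = edit_code(code, failures)
    # Budget exhausted: report whether the last repair happened to pass.
    return code, not run_tests(code), max_iters
```

Passing the test runner in as a callable keeps the loop independent of any particular simulator, so the same skeleton could in principle wrap either the 2D or the Webots test suite.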

Paper Structure (19 sections, 1 equation, 7 figures, 2 tables, 2 algorithms)

This paper contains 19 sections, 1 equation, 7 figures, 2 tables, 2 algorithms.

Figures (7)

  • Figure 1: Map preprocessing pipeline. A user-provided map image is converted into a grayscale occupancy grid $occ[y,x]$, where $occ[y,x]=\texttt{True}$ denotes an obstacle pixel and $occ[y,x]=\texttt{False}$ denotes free space. The pipeline also generates params.json, which stores the configuration parameters used to produce the occupancy grid.
  • Figure 2: Overview of our test-driven agentic framework. The workflow is partitioned into three functional domains: (1) the red block shows the inputs to code_gen.py, namely the preprocessed occupancy map with params.json and the robot image; (2) the blue block highlights the calibration loop, in which code_gen.py generates controller.py from prompt.py, the PyTest suite evaluates the generated controller, and repair.py updates either the prompt template or the code based on the failures; (3) the green block defines the PyTest files, which are run via repair.py.
  • Figure 3: Inputs to code_gen.py in the 3D Webots simulation. The generator (LLM$_1$) produces controller.py conditioned on (i) the current prompt specification prompt.py (a fixed base prompt plus accumulated updates), (ii) the Webots world file empty.wbt, which describes the arena, robot, goal, and static obstacles, and (iii) a human task prompt specifying the start and goal poses. The optimizer (LLM$_2$) uses PyTest feedback to update prompt.py across iterations.
  • Figure 4: Performance of our approach on the 2D navigation task described in Section \ref{subsec:2d_setup}. Left: success rate across learner models. Right: cumulative success versus iteration $k$.
  • Figure 5: Controller performance on the 3D navigation task in the Webots environment described in Section \ref{3D-Webots}. Left: success rate across learner models. Right: cumulative success versus iteration $k$.
  • ...and 2 more figures
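
The map preprocessing step in the Figure 1 caption can be illustrated with a small sketch. It assumes a fixed grayscale threshold and that dark pixels are obstacles. The function name `build_occupancy` and the parameter keys written to the JSON string are hypothetical, chosen for illustration rather than taken from the authors' params.json.

```python
import json

import numpy as np


def build_occupancy(rgb: np.ndarray, threshold: int = 128,
                    dark_is_obstacle: bool = True):
    """Convert an (H, W, 3) uint8 image into a boolean grid occ[y, x].

    occ[y, x] is True for obstacle pixels and False for free space.
    """
    # Grayscale conversion with ITU-R BT.601 luma weights.
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    gray = rgb[..., :3].astype(np.float32) @ weights
    occ = gray < threshold if dark_is_obstacle else gray >= threshold
    # Hypothetical parameter record; the real params.json keys are not
    # given in the source.
    params = {
        "threshold": threshold,
        "dark_is_obstacle": dark_is_obstacle,
        "height": int(occ.shape[0]),
        "width": int(occ.shape[1]),
    }
    return occ, params


# Tiny 2x3 map: black pixels are walls, white pixels are free space.
rgb = np.array(
    [[[0, 0, 0], [255, 255, 255], [255, 255, 255]],
     [[255, 255, 255], [0, 0, 0], [255, 255, 255]]],
    dtype=np.uint8,
)
occ, params = build_occupancy(rgb)
params_json = json.dumps(params, indent=2)  # what would be written to disk
```

Recording the threshold and polarity alongside the grid makes the conversion reproducible, so any later consumer of the grid can interpret it under the same settings.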