Robo-Instruct: Simulator-Augmented Instruction Alignment For Finetuning Code LLMs

Zichao Hu; Junyi Jessy Li; Arjun Guha; Joydeep Biswas

TL;DR

Robo-Instruct tackles the data bottleneck for fine-tuning small, robotics-focused LLMs by generating task-program pairs whose programs are checked in simulations that are synthesized on the fly and enforce robot and environment constraints. It combines Self-Instruct-style task generation, dynamic environment synthesis during program execution, and an LLM-aided instruction alignment step that improves the correspondence between natural-language instructions and robot programs. On RoboEval, Robo-Instruct fine-tuned models substantially outperform the Self-Instruct and Evol-Instruct baselines and are competitive with larger models, while remaining small enough for local inference. The work contributes a practical framework for constraint-aware robot code synthesis and suggests future integration with reinforcement learning for greater realism and robustness.
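
The generate-then-filter pipeline described above can be sketched as a simple loop. This is a minimal sketch under stated assumptions, not the authors' implementation: `generate_program`, `passes_simulation`, and `align_instruction` are hypothetical placeholders for the LLM program generator, the on-the-fly simulator check, and the LLM-aided alignment step, and the "retry up to `max_attempts`, then discard the task" policy is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class TrainingPair:
    instruction: str
    program: str


def build_dataset(
    tasks: Iterable[str],
    generate_program: Callable[[str], str],          # hypothetical: LLM writes a program
    passes_simulation: Callable[[str], bool],        # hypothetical: simulator check
    align_instruction: Callable[[str, str], str],    # hypothetical: LLM rewrites the task
    max_attempts: int = 3,                           # assumed retry budget
) -> List[TrainingPair]:
    """Keep only task-program pairs whose program survives simulation."""
    dataset: List[TrainingPair] = []
    for task in tasks:
        for _ in range(max_attempts):
            program = generate_program(task)
            if passes_simulation(program):
                # Rewrite the instruction so it matches what the program actually does.
                dataset.append(TrainingPair(align_instruction(task, program), program))
                break
        # If every attempt fails, the task is silently dropped.
    return dataset
```

Passing toy lambdas for the three callables is enough to exercise the control flow; in a real system each would wrap an LLM call or a simulator run.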

Abstract

Code LLMs have shown promising results in converting natural-language tasks into programs that service robots can execute. We are interested in fine-tuning small, specialized LLMs for this purpose, but collecting datasets of task-program pairs specific to each robot is time-consuming and expensive. While approaches such as Self-Instruct and Evol-Instruct can generate novel tasks from a few examples, they cannot provide corresponding programs that correctly abide by physical-world and robot constraints using the provided programming interface. A simulator is a natural way to check for such constraints, but building simulation environments that can handle arbitrary tasks, along with the objects and locations each task needs, is challenging. To address these challenges, we introduce Robo-Instruct, which synthesizes task-specific simulation environments on the fly during program execution by opportunistically inferring entity properties and enforcing the corresponding constraints based on how the entities are used in the task program. Additionally, Robo-Instruct integrates an LLM-aided post-processing procedure that refines instructions for better alignment with robot programs. We demonstrate the effectiveness of Robo-Instruct across multiple LLMs, showing that our fine-tuned models outperform all baseline methods and even match or surpass the performance of several larger and proprietary models.
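
To make "synthesizing the environment on the fly" concrete, here is a minimal sketch, assuming a hypothetical robot API with `go_to`, `is_in_room`, `pick`, and `place` (loosely modeled on a service-robot programming interface; these are not the paper's exact signatures). Entities are typed by first use: a name passed to `go_to` becomes a location, a name passed to an object action becomes an object, and reusing a name in the other role is treated as a violation. Whether an object is in a room is sampled the first time the program asks and then held fixed; if the program picks an object it never checked for, the sketch opportunistically assumes it is there.

```python
import random
from typing import Dict, Optional, Set, Tuple


class ConstraintViolation(Exception):
    """Raised when a program breaks a robot or world constraint."""


class LazyWorld:
    """Hypothetical simulation state that is filled in on demand."""

    def __init__(self, start: str, rng: random.Random) -> None:
        self.rng = rng
        self.location = start
        self.locations: Set[str] = {start}
        self.objects: Set[str] = set()
        # (location, object) -> is the object there? Decided on first use, then fixed.
        self.present: Dict[Tuple[str, str], bool] = {}
        self.holding: Optional[str] = None

    def _as_location(self, name: str) -> None:
        if name in self.objects:
            raise ConstraintViolation(f"{name!r} used as both object and location")
        self.locations.add(name)

    def _as_object(self, name: str) -> None:
        if name in self.locations:
            raise ConstraintViolation(f"{name!r} used as both location and object")
        self.objects.add(name)

    def go_to(self, loc: str) -> None:
        self._as_location(loc)
        self.location = loc

    def is_in_room(self, obj: str) -> bool:
        self._as_object(obj)
        key = (self.location, obj)
        if key not in self.present:
            # Unknown so far: sample an answer (assumed 50/50) and remember it.
            self.present[key] = self.rng.random() < 0.5
        return self.present[key]

    def pick(self, obj: str) -> None:
        self._as_object(obj)
        if self.holding is not None:
            raise ConstraintViolation(f"already holding {self.holding!r}")
        key = (self.location, obj)
        if self.present.get(key) is False:
            raise ConstraintViolation(f"no {obj!r} at {self.location!r}")
        # Unknown or known present: the object leaves the room with the robot.
        self.present[key] = False
        self.holding = obj

    def place(self, obj: str) -> None:
        self._as_object(obj)
        if self.holding != obj:
            raise ConstraintViolation(f"not holding {obj!r}")
        self.present[(self.location, obj)] = True
        self.holding = None
```

The design choice here is laziness: nothing about the environment is specified up front, so the same class can host any task, and constraints are only checked against facts the program itself has already established or observed.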

Paper Structure (35 sections, 16 figures, 4 tables, 2 algorithms)

Introduction
Robo-Instruct
Overall Framework
Verifying Candidate Programs Against Domain-specific Constraints
LLM-aided Instruction-Program Alignment Procedure
Analysis and Experiments
Experiments Setup
Is Robo-Instruct Effective at Generating Training Data to Fine-Tune a Small Language Model for Generating Domain-Specific Robot Programs?
Evaluating the Contributions of Robo-Instruct Components
Qualitative analysis of the generated program errors
Real-World Deployment Demo
Related Work
LLMs for Robot Code Generation
Generating Datasets For Fine-tuning LLMs
Relevance to Program Analysis
...and 20 more sections

Figures (16)

Figure 1: High-level overview of the Robo-Instruct framework. This figure also shows the pass@1 performance of the Robo-Instruct fine-tuned LLM compared with other LLMs on RoboEval.
Figure 2: Examples of programs violating domain-specific constraints.
Figure 3: Illustration of Robo-Instruct executing a task program while incrementally building the simulation environment. The environment starts with only the robot’s initial position (gray, step 0). As the program runs, it branches into two possible execution paths. To evaluate each path, two simulation environments are sampled (world 1 and world 2). In this example, the program fails because it attempts to pick up an apple that isn’t present.
Figure 4: Self-Instruct-Generated Program Errors. Examples highlight errors that violate domain-specific constraints.
Figure 5: Deployment of the Robo-Instruct fine-tuned model to generate programs based on user-provided instructions and execute them on the robot.
...and 11 more figures
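
The idea in Figure 3, evaluating one program against several sampled environments, can be sketched as follows. Everything here is a hypothetical illustration: a single-room world, an `is_in_room`/`pick`/`say` API, a 50/50 prior for facts the program has not yet observed, and the rule that a program is flagged if it fails in any sampled world. `buggy_program` mirrors the caption's failure mode of picking up an apple that is not present; it is not the program shown in the figure.

```python
import random
from typing import Callable, Dict, List, Optional


class ConstraintViolation(Exception):
    """Raised when a program acts against the sampled world state."""


class SampledWorld:
    """One sampled environment: each fact is drawn on first query, then fixed."""

    def __init__(self, seed: int) -> None:
        self.rng = random.Random(seed)
        self.present: Dict[str, bool] = {}  # object -> is it in the room?
        self.holding: Optional[str] = None
        self.said: List[str] = []

    def is_in_room(self, obj: str) -> bool:
        if obj not in self.present:
            # Branch point: different seeds send the program down different paths.
            self.present[obj] = self.rng.random() < 0.5
        return self.present[obj]

    def pick(self, obj: str) -> None:
        if self.present.get(obj) is False:
            raise ConstraintViolation(f"cannot pick {obj!r}: it is not here")
        self.present[obj] = False
        self.holding = obj

    def say(self, message: str) -> None:
        self.said.append(message)


def failing_worlds(program: Callable[[SampledWorld], None], num_worlds: int = 16) -> List[int]:
    """Run `program` in several sampled worlds and return the seeds where it fails."""
    failures = []
    for seed in range(num_worlds):
        try:
            program(SampledWorld(seed))
        except ConstraintViolation:
            failures.append(seed)
    return failures


def buggy_program(robot: SampledWorld) -> None:
    # Hypothetical task program: it picks the apple even on the "absent" branch.
    if robot.is_in_room("apple"):
        robot.say("Found the apple")
    robot.pick("apple")


def fixed_program(robot: SampledWorld) -> None:
    # Only picks the apple on the branch where it is known to be present.
    if robot.is_in_room("apple"):
        robot.pick("apple")
    else:
        robot.say("There is no apple here")
```

`failing_worlds(fixed_program)` returns an empty list, while `failing_worlds(buggy_program)` reports the seeds whose sampled world placed no apple in the room. Sampling several worlds is what lets a single check cover both branches of the program.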
