BLAZER: Bootstrapping LLM-based Manipulation Agents with Zero-Shot Data Generation

Rocktim Jyoti Das, Harsh Singh, Diana Turmakhan, Muhammad Abdullah Sohail, Mingfei Han, Preslav Nakov, Fabio Pizzati, Ivan Laptev

TL;DR

BLAZER tackles data scarcity in robotics by bootstrapping LLM-based manipulation agents from automatically generated, simulator-verified demonstrations. The framework produces executable manipulation commands $\mathcal{C}_\tau$ for each task, validates them with a verifier $V$, and assembles the successful examples into $\mathcal{D}_{\text{BLAZER}}$ for supervised finetuning of a lightweight LLM. It requires no human supervision and transfers to the real world through a vision pipeline that estimates scene states $\tilde{\Sigma}_\mathcal{E}$. The approach yields strong zero-shot performance and enables downscaling of model size: $\text{LLaMA-8B}$ trained with BLAZER outperforms its larger teacher $\text{LLaMA-70B}$ on RLBench tasks and transfers to real-world manipulation. Experiments in simulation and on a real 7-DOF Panda robot with RGB-D perception show improved generalization to both in-distribution and out-of-distribution tasks, supporting the practical value of simulator-driven data generation for robotics. The work suggests a path toward scalable, self-improving robotic policies that operate without manual data curation and can run on compact models. Future directions include incorporating negative samples and preference-based fine-tuning.

Abstract

Scaling data and models has played a pivotal role in the remarkable progress of computer vision and language. Inspired by these domains, recent efforts in robotics have similarly focused on scaling both data and model size to develop more generalizable and robust policies. However, unlike vision and language, robotics lacks access to internet-scale demonstrations across diverse robotic tasks and environments. As a result, the scale of existing datasets typically suffers from the need for manual data collection and curation. To address this problem, here we propose BLAZER, a framework that learns manipulation policies from automatically generated training data. We build on the zero-shot capabilities of LLM planners and automatically generate demonstrations for diverse manipulation tasks in simulation. Successful examples are then used to finetune an LLM and to improve its planning capabilities without human supervision. Notably, while BLAZER training requires access to the simulator's state, we demonstrate direct transfer of acquired skills to sensor-based manipulation. Through extensive experiments, we show BLAZER to significantly improve zero-shot manipulation in both simulated and real environments. Moreover, BLAZER improves on tasks outside of its training pool and enables downscaling of LLM models. Our code and data will be made publicly available on the project page.

Paper Structure

This paper contains 17 sections, 5 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: BLAZER overview. Previous approaches such as Code As Policies (CAP) [Liang2022CodeAP] (left) use LLMs to produce interaction plans and to solve manipulation tasks in a zero-shot manner. Such methods rely on careful prompt engineering and often lead to suboptimal performance. In contrast, BLAZER (right) uses a fully automatic pipeline, where successful LLM-generated demonstrations are used to train improved LLM-based manipulation agents with no manual supervision.
  • Figure 2: Overview of BLAZER. Given a set of manipulation tasks $\tau\in\mathcal{T}$, we use an LLM to automatically generate executable commands $\mathcal{C}_\tau$ for solving $\tau$. The resulting solutions are automatically verified by executing $\mathcal{C}_\tau$ in a simulator, and successful solutions are added to the task database $\mathcal{D}_{\tau}$. Task databases for all training tasks $\mathcal{T}$ are merged into $\mathcal{D}_\text{BLAZER}$ and are used for supervised finetuning of the BLAZER LLM.
  • Figure 3: Tasks in simulation. We consider 9 pick-and-place tasks from the RLBench simulator [James2019RLBenchTR]. For each task we display a starting condition (top) and the desired final state (bottom).
  • Figure 4: Tasks in simulation using visual observations. We evaluate the task success rate for LLaMA-70B, LLaMA-8B and LLaMA-8B w/ BLAZER in simulation using our vision pipeline, which assumes no ground-truth knowledge about object states. Consistent with the results in Table \ref{tab:task_comparison}, BLAZER outperforms the other methods.
  • Figure 5: Real-world results. We compare LLaMA-8B with BLAZER against LLaMA-70B on the real-world tasks depicted in (\ref{fig:tasks-real-world-pic}). In the quantitative results in (\ref{tab:tasks-real-world-quant}), LLaMA-8B with BLAZER outperforms the baseline on both in-distribution tasks (similar to $\mathcal{T}$) and out-of-distribution tasks, showcasing the generalization capability of BLAZER.
  • ...and 3 more figures
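
The data-generation loop described in the Figure 2 caption (generate commands, verify them by execution, keep the successes per task, merge across tasks) can be illustrated with a minimal Python sketch. This is not the authors' implementation: `build_blazer_dataset`, `toy_generate` and `toy_verify` are hypothetical names, the toy generator stands in for an LLM planner, and the toy verifier stands in for executing $\mathcal{C}_\tau$ in a simulator.

```python
from typing import Callable, Dict, List, Tuple

# One training example: (task description, generated command string C_tau).
Example = Tuple[str, str]


def build_blazer_dataset(
    tasks: List[str],
    generate: Callable[[str, int], str],
    verify: Callable[[str, str], bool],
    attempts_per_task: int = 4,
) -> Tuple[Dict[str, List[Example]], List[Example]]:
    """Collect verified solutions per task (D_tau) and merge them (D_BLAZER)."""
    per_task: Dict[str, List[Example]] = {}
    for task in tasks:
        kept: List[Example] = []
        for attempt in range(attempts_per_task):
            commands = generate(task, attempt)  # stand-in for the LLM planner
            if verify(task, commands):          # stand-in for simulator execution
                kept.append((task, commands))
        per_task[task] = kept
    # Merge the per-task databases into a single finetuning set.
    merged = [example for task in tasks for example in per_task[task]]
    return per_task, merged


def toy_generate(task: str, attempt: int) -> str:
    """Hypothetical planner: every second attempt omits the place step."""
    if attempt % 2 == 0:
        return f"pick({task}); place({task})"
    return f"pick({task})"


def toy_verify(task: str, commands: str) -> bool:
    """Hypothetical verifier: a plan succeeds only if it ends by placing the object."""
    return commands.endswith(f"place({task})")


per_task, d_blazer = build_blazer_dataset(["block", "cup"], toy_generate, toy_verify)
```

In this sketch the merged list contains only plans that passed the verifier, which mirrors the caption's point that failed generations never reach the finetuning set. The supervised finetuning step itself is omitted.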