In-Context Learning Enables Robot Action Prediction in LLMs

Yida Yin; Zekai Wang; Yuvan Sharma; Dantong Niu; Trevor Darrell; Roei Herzig

TL;DR

RoboPrompt predicts robot actions from observations without any training by using the in-context learning (ICL) ability of off-the-shelf, text-only LLMs. It builds ICL demonstrations from textual encodings of discretized 6-DoF poses and observed actions, obtained from keyframes and a pose estimator, and arranges them in a structured prompt that elicits action predictions at test time. In RLBench simulation and real-robot experiments, RoboPrompt outperforms zero-shot and some ICL baselines and remains competitive with supervised methods on simpler tasks, which points to the potential of LLM-driven robotics with minimal data. The work shows that careful demonstration selection, pose-to-text representations, and prompt structuring can transfer LLM reasoning capabilities to direct 6-DoF action prediction, offering a data-efficient alternative for manipulation in static environments.
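To make the "textual encodings of discretized 6-DoF poses" concrete, the following is a minimal sketch of one way a continuous pose could be binned and written out as text. The workspace bounds, the bin counts, the Euler-angle convention, and the function names `discretize` and `pose_to_text` are illustrative assumptions, not details taken from the paper.

```python
from typing import Sequence, Tuple


def discretize(value: float, low: float, high: float, bins: int) -> int:
    """Map a continuous value in [low, high] to an integer bin in [0, bins - 1]."""
    clipped = min(max(value, low), high)  # keep out-of-range values inside the workspace
    index = int((clipped - low) / (high - low) * bins)
    return min(index, bins - 1)  # the upper bound falls into the last bin


def pose_to_text(
    position: Sequence[float],
    euler_deg: Sequence[float],
    workspace: Sequence[Tuple[float, float]] = ((-1.0, 1.0),) * 3,  # assumed bounds (m)
    pos_bins: int = 100,  # assumed resolution for x, y, z
    rot_bins: int = 72,   # assumed 5-degree rotation bins
) -> str:
    """Encode a 6-DoF pose (xyz position + Euler angles in degrees) as a short text token list."""
    pos = [discretize(p, lo, hi, pos_bins) for p, (lo, hi) in zip(position, workspace)]
    rot = [discretize(a % 360.0, 0.0, 360.0, rot_bins) for a in euler_deg]
    return "[" + ", ".join(str(v) for v in pos + rot) + "]"


if __name__ == "__main__":
    print(pose_to_text((0.0, 0.5, -1.0), (0.0, 90.0, 357.5)))
```

Integer bins keep each pose short and regular in text form, which is the property a text-only LLM needs in order to read and reproduce poses as ordinary tokens.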

Abstract

Recently, Large Language Models (LLMs) have achieved remarkable success using in-context learning (ICL) in the language domain. However, leveraging the ICL capabilities of LLMs to directly predict robot actions remains largely unexplored. In this paper, we introduce RoboPrompt, a framework that enables off-the-shelf text-only LLMs to directly predict robot actions through ICL without training. Our approach first heuristically identifies keyframes that capture important moments from an episode. Next, we extract end-effector actions from these keyframes as well as the estimated initial object poses, and both are converted into textual descriptions. Finally, we construct a structured template to form ICL demonstrations from these textual descriptions and a task instruction. This enables an LLM to directly predict robot actions at test time. Through extensive experiments and analysis, RoboPrompt outperforms zero-shot and ICL baselines in simulated and real-world settings. Our project page is available at https://davidyyd.github.io/roboprompt.
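The final step described above, forming ICL demonstrations from a task instruction and textual pose/action descriptions, can be sketched as follows. This is a hypothetical template: the field labels ("Task", "Objects", "Actions"), the function names, and the example strings are assumptions for illustration, and the object poses and keyframe actions are taken to be already encoded as text.

```python
from typing import Dict, List, Sequence, Tuple

# One demonstration: initial object poses (name -> text) and keyframe actions (text).
Demo = Tuple[Dict[str, str], List[str]]


def format_objects(object_poses: Dict[str, str]) -> str:
    """Render the estimated initial object poses as a single line of text."""
    return "; ".join(f"{name}: {pose}" for name, pose in object_poses.items())


def build_prompt(
    instruction: str,
    demos: Sequence[Demo],
    test_object_poses: Dict[str, str],
) -> str:
    """Assemble an ICL prompt: instruction, demonstrations, then the open test query."""
    lines = [f"Task: {instruction}", ""]
    for object_poses, actions in demos:
        lines.append(f"Objects: {format_objects(object_poses)}")
        lines.append(f"Actions: {' '.join(actions)}")
        lines.append("")
    # The test episode has object poses only; the LLM is expected to continue after "Actions:".
    lines.append(f"Objects: {format_objects(test_object_poses)}")
    lines.append("Actions:")
    return "\n".join(lines)


if __name__ == "__main__":
    demos = [
        ({"cube": "[50, 75, 0, 0, 18, 71]"},
         ["[50, 75, 10, 0, 18, 71]", "[50, 75, 0, 0, 18, 71]"]),
    ]
    print(build_prompt("pick up the cube", demos, {"cube": "[40, 60, 0, 0, 0, 0]"}))
```

Ending the prompt on an open "Actions:" field makes the LLM's completion the predicted action sequence, which would then need to be parsed and mapped from bins back to continuous poses before execution.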
