ExploRLLM: Guiding Exploration in Reinforcement Learning with Large Language Models
Runyu Ma, Jelle Luijkx, Zlatan Ajanovic, Jens Kober
TL;DR
ExploRLLM combines foundation-model (FM) guided exploration with reinforcement learning (RL) to address the low sample efficiency of RL in tabletop robot manipulation. Vision-language detections reduce the observation space, and LLM-generated policy code proposes exploratory actions, while a residual RL agent compensates for the FMs' limited physical and spatial understanding. ExploRLLM converges faster and reaches higher success rates than FM-only and RL baselines, and real-world experiments show promising zero-shot sim-to-real transfer. The approach also generalizes to unseen colors and letters, which reduces reliance on extensive real-world data and supports more robust robotic manipulation.
Abstract
In robot manipulation, Reinforcement Learning (RL) often suffers from low sample efficiency and uncertain convergence, especially in large observation and action spaces. Foundation Models (FMs) offer an alternative, demonstrating promise in zero-shot and few-shot settings. However, they can be unreliable due to limited physical and spatial understanding. We introduce ExploRLLM, a method that combines the strengths of both paradigms. In our approach, FMs improve RL convergence by generating policy code and efficient representations, while a residual RL agent compensates for the FMs' limited physical understanding. We show that ExploRLLM outperforms both policies derived from FMs and RL baselines in table-top manipulation tasks. Additionally, real-world experiments show that the policies exhibit promising zero-shot sim-to-real transfer. Supplementary material is available at https://explorllm.github.io.
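The interplay described above, FM-proposed exploratory actions plus a residual RL correction, can be illustrated with a minimal sketch. It is not the authors' implementation: the epsilon-style mixing rule, the clipping bound, and all names (`select_action`, `apply_residual`, `fm_policy`, `rl_policy`) are assumptions made for illustration only.

```python
import random
from typing import Callable, Sequence, Tuple

# Hypothetical types: a compact observation (e.g. object positions obtained
# from vision-language detections) and a 2D action such as a pick offset.
Observation = Sequence[float]
Action = Tuple[float, ...]
Policy = Callable[[Observation], Action]


def select_action(
    obs: Observation,
    fm_policy: Policy,
    rl_policy: Policy,
    epsilon: float,
    rng: random.Random,
) -> Tuple[Action, str]:
    """Pick an action during training.

    Assumption: with probability `epsilon` the agent explores using the
    action proposed by FM-generated policy code; otherwise it uses the
    RL agent's own action. Returns the action and its source label.
    """
    if rng.random() < epsilon:
        return fm_policy(obs), "fm"
    return rl_policy(obs), "rl"


def apply_residual(
    base: Action, residual: Action, max_residual: float = 0.05
) -> Action:
    """Add a clipped residual correction to a base action.

    Assumption: the residual is bounded elementwise so that it only
    fine-tunes the base action rather than replacing it.
    """
    clipped = (max(-max_residual, min(max_residual, r)) for r in residual)
    return tuple(b + r for b, r in zip(base, clipped))


if __name__ == "__main__":
    rng = random.Random(0)
    fm = lambda obs: (obs[0], obs[1])   # stand-in for FM-generated policy code
    rl = lambda obs: (0.0, 0.0)         # stand-in for a learned policy
    action, source = select_action([0.3, 0.4], fm, rl, epsilon=0.2, rng=rng)
    print(source, apply_residual(action, (0.5, -0.01)))
```

In this sketch, a larger `epsilon` leans more heavily on FM guidance during exploration, while the clipped residual keeps the learned correction small relative to the base action.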
