RAG-Modulo: Solving Sequential Tasks using Experience, Critics, and Language Models
Abhinav Jain, Chris Jermaine, Vaibhav Unhelkar
TL;DR
The paper addresses the challenge of learning from past interactions in goal-driven robotic tasks with partial observability by augmenting LLM-based agents with an interaction memory and a bank of critics. RAG-Modulo retrieves relevant past interactions as in-context examples and incorporates feedback from syntax, semantics, and low-level executability critics to guide decision-making without gradient updates. The authors introduce a memory-based retrieval mechanism using cosine similarity to populate prompts with informative exemplars and demonstrate superior performance on AlfWorld and BabyAI benchmarks, achieving higher success rates and more efficient planning than strong baselines. This work highlights data-efficient learning for long-horizon robotic tasks and suggests paths toward real-world deployment and integration with continual learning frameworks.
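The TL;DR says relevant past interactions are retrieved by cosine similarity and placed in the prompt as in-context examples. The sketch below illustrates that retrieval step under stated assumptions. A toy bag-of-words embedding stands in for whatever text encoder the authors use. The `Interaction` record, `retrieve`, `build_prompt`, and the demo memory entries are all hypothetical. This is an illustration of the idea, not the paper's implementation.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Interaction:
    # Hypothetical memory record: what the agent saw, what it did, and the feedback it got.
    context: str
    action: str
    feedback: str


def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding" (assumption); a real system would use a learned text encoder.
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    # cos(a, b) = (a . b) / (|a| |b|), defined as 0 when either vector is empty.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(memory: List[Interaction], query: str, k: int = 2) -> List[Interaction]:
    # Rank stored interactions by similarity of their context to the query; keep the top k.
    q = embed(query)
    ranked = sorted(memory, key=lambda it: cosine_similarity(q, embed(it.context)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, examples: List[Interaction]) -> str:
    # Place retrieved interactions before the current query as in-context examples.
    blocks = [f"Context: {ex.context}\nAction: {ex.action}\nFeedback: {ex.feedback}" for ex in examples]
    blocks.append(f"Context: {query}\nAction:")
    return "\n\n".join(blocks)


# Small demo memory (made-up entries for illustration only).
MEMORY = [
    Interaction("go to the red ball", "goto ball", "success"),
    Interaction("open the yellow door", "toggle door", "door opened"),
    Interaction("pick up the red ball", "pickup ball", "holding red ball"),
]

TOP = retrieve(MEMORY, "pick up the red box", k=1)
PROMPT = build_prompt("pick up the red box", TOP)
```

With this toy embedding, the query "pick up the red box" retrieves the "pick up the red ball" interaction, since the two share four of five tokens (cosine similarity 0.8). Swapping `embed` for a learned encoder changes which examples rank highest but not the structure of the retrieve-then-prompt step.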
Abstract
Large language models (LLMs) have recently emerged as promising tools for solving challenging robotic tasks, even in the presence of action and observation uncertainties. Recent LLM-based decision-making methods (also referred to as LLM-based agents), when paired with appropriate critics, have demonstrated potential in solving complex, long-horizon tasks with relatively few interactions. However, most existing LLM-based agents lack the ability to retain and learn from past interactions, an essential trait of learning-based robotic systems. We propose RAG-Modulo, a framework that enhances LLM-based agents with a memory of past interactions and incorporates critics to evaluate the agents' decisions. The memory component allows the agent to automatically retrieve and incorporate relevant past experiences as in-context examples, providing context-aware feedback for more informed decision-making. Further, by updating its memory, the agent improves its performance over time, thereby exhibiting learning. Through experiments in the challenging BabyAI and AlfWorld domains, we demonstrate significant improvements in task success rates and efficiency, showing that the proposed RAG-Modulo framework outperforms state-of-the-art baselines.
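The abstract describes critics that evaluate each proposed decision and a memory that grows as the agent acts. A minimal sketch of that loop follows. It assumes three rule-based stand-ins for the syntax, semantics, and low-level executability critics named in the TL;DR, and a scripted proposer in place of the LLM. All names (`State`, `ALLOWED_VERBS`, `critique`, `act_with_critics`, `scripted_proposer`) and the feedback strings are hypothetical; the paper's critics and memory format may differ.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set, Tuple

# Hypothetical action vocabulary of the form "<verb> <object>".
ALLOWED_VERBS = {"goto", "pickup", "open", "toggle"}


@dataclass
class State:
    visible: Set[str]   # objects in the current (partial) observation
    feasible: Set[str]  # actions a hypothetical low-level planner reports as executable


def syntax_critic(action: str, state: State) -> Optional[str]:
    # Reject commands that do not parse as "<verb> <object>" with a known verb.
    parts = action.split()
    if len(parts) != 2 or parts[0] not in ALLOWED_VERBS:
        return f"'{action}' is not a valid '<verb> <object>' command"
    return None


def semantics_critic(action: str, state: State) -> Optional[str]:
    # Reject commands that refer to objects not present in the current observation.
    obj = action.split()[1]
    if obj not in state.visible:
        return f"'{obj}' is not in the current observation"
    return None


def executability_critic(action: str, state: State) -> Optional[str]:
    # Reject commands the low-level controller cannot execute from this state.
    if action not in state.feasible:
        return f"'{action}' cannot be executed from the current state"
    return None


# Critics run in order; later critics assume earlier ones passed (e.g. the action parsed).
CRITICS = [syntax_critic, semantics_critic, executability_critic]


def critique(action: str, state: State) -> Optional[str]:
    # Return the first critic's feedback, or None if every critic accepts the action.
    for critic in CRITICS:
        feedback = critic(action, state)
        if feedback is not None:
            return feedback
    return None


def act_with_critics(
    propose: Callable[[State, Optional[str]], str],
    state: State,
    memory: List[Tuple[str, Optional[str]]],
    max_retries: int = 3,
) -> Optional[str]:
    # Ask the proposer (an LLM in the paper; a stub here) for an action, pass critic
    # feedback into the next proposal, and log every attempt with its feedback in memory.
    feedback: Optional[str] = None
    for _ in range(max_retries):
        action = propose(state, feedback)
        feedback = critique(action, state)
        memory.append((action, feedback))
        if feedback is None:
            return action
    return None


def scripted_proposer(candidates: List[str]) -> Callable[[State, Optional[str]], str]:
    # Stand-in for the LLM: returns the next scripted candidate and ignores the feedback.
    remaining = iter(candidates)
    return lambda state, feedback: next(remaining)


DEMO_STATE = State(visible={"ball", "door"}, feasible={"goto ball", "pickup ball"})
DEMO_MEMORY: List[Tuple[str, Optional[str]]] = []
DEMO_ACTION = act_with_critics(
    scripted_proposer(["grab ball", "pickup key", "pickup ball"]), DEMO_STATE, DEMO_MEMORY
)
```

In the demo, the first candidate fails the syntax check, the second names an object that is not visible, and the third passes all three critics. Every attempt and its feedback is appended to memory, which is the kind of record a later retrieval step could draw on; storing plain `(action, feedback)` tuples is a simplification chosen here for brevity.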
