LLaMA Rider: Spurring Large Language Models to Explore the Open World

Yicheng Feng, Yuxuan Wang, Jiazheng Liu, Sipeng Zheng, Zongqing Lu

TL;DR

This work tackles the grounding of large language models in open-world environments by enabling experiential learning through a two-stage framework named LLaMA-Rider. In the first stage, a feedback-revision-driven exploration loop combined with subtask relabeling collects diverse experiences. In the second stage, supervised fine-tuning with QLoRA on a compact dataset (about 1.3k instances) improves multi-task planning in Minecraft. Empirical results in MineDojo show enhanced exploration efficiency and competitive performance against stronger LLM baselines, with notable gains in generalization to novel tasks and high data efficiency compared to RL approaches. The study highlights the potential of lifelong, environment-grounded learning, acknowledges that the method does not yet make full use of environmental information, and outlines directions for extending it to other open-world domains.

Abstract

Recently, various studies have leveraged Large Language Models (LLMs) to help decision-making and planning in environments, and have tried to align the LLMs' knowledge with the world conditions. Nonetheless, the capacity of LLMs to continuously acquire environmental knowledge and adapt in an open world remains uncertain. In this paper, we propose an approach to spur LLMs to explore the open world, gather experiences, and learn to improve their task-solving capabilities. In this approach, a multi-round feedback-revision mechanism is utilized to encourage LLMs to actively select appropriate revision actions guided by feedback information from the environment. This facilitates exploration and enhances the model's performance. In addition, we integrate sub-task relabeling to assist LLMs in maintaining consistency in sub-task planning and to help the model learn the combinatorial nature between tasks, enabling it to complete a wider range of tasks through training based on the acquired exploration experiences. Through evaluation in Minecraft, an open-ended sandbox world, we demonstrate that our approach, LLaMA-Rider, enhances the efficiency of the LLM in exploring the environment, and effectively improves the LLM's ability to accomplish more tasks through fine-tuning with merely 1.3k instances of collected data, showing minimal training costs compared to the baseline using reinforcement learning.

Paper Structure

This paper contains 30 sections, 2 figures, 8 tables, 1 algorithm.

Figures (2)

  • Figure 1: Spurring LLaMA to explore the open world.
  • Figure 2: Overview of LLaMA-Rider. The framework consists of two stages. In the exploration stage, the LLM explores to accomplish tasks with the help of the feedback-revision mechanism and subtask relabeling. In the learning stage, the collected trajectories are formatted into a supervised dataset to finetune the LLM.
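
The two stages named in the Figure 2 caption can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation: every name in it (`explore`, `to_sft_dataset`, `Sample`, the toy planner and environment in `demo`) is hypothetical, the prompt format is an assumption, and the LLM is replaced by a scripted stand-in. Subtask relabeling is deliberately left out, because the text here does not describe it in enough detail to reproduce faithfully.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical interfaces, assumed for illustration only.
# Planner: (task, observation, feedback) -> decision
Planner = Callable[[str, str, Optional[str]], str]
# EnvStep: decision -> (ok, feedback, next_observation, done)
EnvStep = Callable[[str], Tuple[bool, str, str, bool]]


@dataclass
class Sample:
    task: str
    observation: str
    decision: str  # the decision the environment finally accepted


def explore(task: str, plan: Planner, env_step: EnvStep,
            max_revisions: int = 3, max_steps: int = 10) -> Tuple[List[Sample], bool]:
    """Exploration stage: propose a decision, revise it on feedback, record it."""
    samples: List[Sample] = []
    observation = "start"
    for _ in range(max_steps):
        feedback: Optional[str] = None
        accepted = False
        # Multi-round feedback-revision: retry with the environment's feedback.
        for _ in range(max_revisions + 1):
            decision = plan(task, observation, feedback)
            ok, feedback, next_obs, done = env_step(decision)
            if ok:
                accepted = True
                break
        if not accepted:
            return samples, False  # revision budget exhausted
        samples.append(Sample(task, observation, decision))
        observation = next_obs
        if done:
            return samples, True
    return samples, False


def to_sft_dataset(samples: List[Sample]) -> List[Dict[str, str]]:
    """Learning stage input: format recorded steps as prompt/completion pairs."""
    return [
        {"prompt": f"Task: {s.task}\nObservation: {s.observation}",
         "completion": s.decision}
        for s in samples
    ]


def demo() -> Tuple[List[Dict[str, str]], bool]:
    inventory = set()

    def env_step(decision: str) -> Tuple[bool, str, str, bool]:
        # Toy rule: crafting planks requires a log in the inventory.
        if decision == "craft planks" and "log" not in inventory:
            return False, "missing material: log", "no log in inventory", False
        if decision == "harvest log":
            inventory.add("log")
            return True, "", "log in inventory", False
        if decision == "craft planks":
            return True, "", "planks in inventory", True
        return False, "unknown action", "", False

    def plan(task: str, observation: str, feedback: Optional[str]) -> str:
        # Scripted stand-in for the LLM: revise only when feedback mentions a log.
        if feedback and "log" in feedback:
            return "harvest log"
        return "craft planks"

    samples, success = explore("craft planks", plan, env_step)
    return to_sft_dataset(samples), success
```

Calling `demo()` runs the toy task once: the first proposal is rejected, the feedback triggers a revision, and the accepted decisions become prompt/completion pairs. In a real setup, pairs of this kind would be passed to a fine-tuning routine such as QLoRA; that step is outside the scope of this sketch.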