AdaRefiner: Refining Decisions of Language Models with Adaptive Feedback

Wanpeng Zhang; Zongqing Lu

TL;DR

AdaRefiner tackles the difficulty of adapting LLM-driven decision-making to complex RL tasks without heavy prompt engineering or full fine-tuning. It introduces a lightweight Adapter LM that ingests RL feedback and environmental context to generate task-tailored prompts for a Decision LLM, creating a closed loop with a PPO-based RL agent. Across 22 Crafter tasks, AdaRefiner outperforms strong LLM-based and RL baselines and drives the agent toward higher-level, common-sense behaviors. The results demonstrate that lightweight, adaptive refinement of LLM guidance can significantly improve generalization and sample efficiency in open-world decision-making scenarios.

Abstract

Large Language Models (LLMs) have demonstrated significant success across various domains. However, their application in complex decision-making tasks frequently necessitates intricate prompt engineering or fine-tuning, leading to challenges in unseen downstream tasks and heavy demands on computational resources. Meanwhile, Reinforcement Learning (RL) has been recognized as effective in decision-making problems but struggles in environments with sparse rewards, such as open-world games. To overcome these challenges, we introduce AdaRefiner, a novel framework designed to enhance the synergy between LLMs and RL feedback. The key component of AdaRefiner is a lightweight Adapter Language Model (LM), which automatically refines task comprehension based on feedback from RL agents. This method mitigates the need for intricate prompt engineering and intensive LLM fine-tuning while maintaining the LLMs' generalization abilities and enhancing their decision-making capabilities in downstream tasks. Empirical evaluations of AdaRefiner on 22 diverse tasks within the open-world game Crafter have demonstrated its superior effectiveness, especially in guiding agents towards higher-level and common-sense skills. Our work makes contributions to the automatic self-refinement of LLMs with RL feedback, offering a more adaptable and efficient solution for complex decision-making problems.
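The flow of information described above can be sketched as one refinement step: the Adapter LM turns environment context and agent feedback into a prompt, and the Decision LLM turns that prompt into sub-goals for the RL agent. This is only an illustrative sketch, not the authors' implementation. The function name `refinement_step`, the prompt layout, and the two stub models are hypothetical, and a real system would call actual language models and fine-tune the Adapter LM between steps.

```python
from typing import Callable, List


def refinement_step(
    observation: str,
    comprehension: float,
    adapter_lm: Callable[[str], str],
    decision_llm: Callable[[str], List[str]],
) -> List[str]:
    """One pass of the loop: context + feedback -> refined prompt -> sub-goals.

    All names and the prompt format are assumptions made for illustration.
    """
    # The Adapter LM sees the environment context and the agent's feedback signal.
    adapter_input = (
        f"Observation: {observation}\n"
        f"Comprehension score: {comprehension:.2f}"
    )
    refined_prompt = adapter_lm(adapter_input)
    # The Decision LLM is queried with the refined prompt, not hand-written text.
    return decision_llm(refined_prompt)


def stub_adapter_lm(adapter_input: str) -> str:
    """Hypothetical stand-in for the lightweight Adapter LM."""
    return "Suggest sub-goals given:\n" + adapter_input


def stub_decision_llm(prompt: str) -> List[str]:
    """Hypothetical stand-in for the Decision LLM."""
    return ["collect wood", "place table"] if "tree" in prompt else ["explore"]


sub_goals = refinement_step(
    "a tree is nearby", 0.4, stub_adapter_lm, stub_decision_llm
)
```

In this sketch the sub-goals would then be passed to the RL agent, whose subsequent behavior supplies the feedback for the next step.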

Paper Structure (34 sections, 1 equation, 7 figures, 9 tables, 1 algorithm)

Introduction
Related Work
Methodology
Problem Formulation
Key Idea and Overall Framework
Adapter LM
Training Procedure
Experiment
Experiment Settings
Baselines
Results and Analysis
Ablation Study
Guidance and Agent Behaviors
Consistent Increment of Performance and Agent's Comprehension
Behavior Statistics
...and 19 more sections

Figures (7)

Figure 1: Core differences between AdaRefiner (right) and typical LLM-based methods (left). The key distinction is the integration of Adapter LM, which enhances the synergy between LLMs and adaptive feedback.
Figure 2: Overall framework of AdaRefiner. In addition to receiving inputs from the environment and historical information, the prompt of the Adapter LM incorporates a comprehension score. This score measures the semantic similarity between the agent's recent actions and the sub-goals suggested by the LLM, indicating whether the agent currently comprehends the LLM's guidance accurately. By using the agent's feedback to continuously fine-tune the Adapter LM, we keep the LLM attuned to the actual circumstances of the task. This, in turn, ensures that the provided guidance is the most appropriate for the agent's prioritized learning.
Figure 3: Success rates of unlocking $22$ different achievements in log scale. AdaRefiner outperforms the two top-performing baselines. Notably, AdaRefiner is the only method that successfully completes the level-$7$ tasks "Make Iron Pickaxe" and "Make Iron Sword".
Figure 4: (left) Frames from an episode in the game, ordered from top left to bottom right. (right) The probabilities of actions in the agent's policy corresponding to each frame.
Figure 5: Learning curve (left) and comprehension score (right) of AdaRefiner.
...and 2 more figures
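
The comprehension score in the Figure 2 caption is described as a semantic similarity between the agent's recent actions and the sub-goals suggested by the LLM. A minimal sketch of such a score is given here, assuming cosine similarity over text embeddings. The bag-of-words `embed` function is a toy stand-in chosen so the example runs without dependencies; the page does not state which encoder or similarity measure the paper uses, and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding (assumption: stands in for a sentence encoder)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty description matches nothing
    return dot / (norm_a * norm_b)


def comprehension_score(recent_actions: List[str], sub_goals: List[str]) -> float:
    """Similarity between what the agent did and what the LLM suggested."""
    return cosine(embed(" ".join(recent_actions)), embed(" ".join(sub_goals)))


aligned = comprehension_score(["collect wood", "place table"],
                              ["collect wood", "place table"])
unrelated = comprehension_score(["sleep"], ["collect wood"])
```

Under these assumptions, a score near 1 suggests the agent is following the suggested sub-goals, and a score near 0 suggests it is not; the Adapter LM's prompt would include this value.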
