
ExpNote: Black-box Large Language Models are Better Task Solvers with Experience Notebook

Wangtao Sun, Xuanqing Yu, Shizhu He, Jun Zhao, Kang Liu

TL;DR

ExpNote adapts black-box LLMs to downstream tasks through an automated Experience Notebook that stores task-specific insights in a dynamic external memory. During a training stage, the framework prompts the LLM to generate and store experiences with minimal ground-truth feedback; during testing, it retrieves relevant experiences to condition predictions. The LLM interacts with the memory through commands such as THINK, NOTE, and RECALL. Empirical results on CLUTRR, METS-CoV, EMOJI, and LETS show substantial gains over CoT and other baselines, with improvements correlating with the availability of both positive and negative experiences and with retrieval effectiveness. Because the approach needs no access to model parameters, it offers a practical pathway toward robust, memory-augmented reasoning in real-world applications, though it may be less effective for highly case-specific or creative tasks.

Abstract

Black-box Large Language Models (LLMs) have shown great power in solving various tasks and are considered general problem solvers. However, LLMs still fail at many specific tasks even though they understand the task instruction. In this paper, we focus on the problem of boosting the ability of black-box LLMs to solve downstream tasks. We propose ExpNote, an automated framework that helps LLMs better adapt to unfamiliar tasks by reflecting on and noting experiences from training data and retrieving them from external memory during testing. We evaluate ExpNote on multiple tasks, and the experimental results demonstrate that the proposed method significantly improves the performance of black-box LLMs. The data and code are available at https://github.com/forangel2014/ExpNote
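The note-then-retrieve idea described above can be illustrated with a small sketch. This is not the authors' implementation (see the linked repository for that); it is a toy external memory under stated assumptions. The class `ExperienceNotebook`, the helpers `tokenize` and `build_test_prompt`, the word-overlap retrieval rule, and the sample kinship experiences are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def tokenize(text: str) -> set[str]:
    """Lowercase a string and split it on whitespace into a set of tokens."""
    return set(text.lower().split())


@dataclass
class ExperienceNotebook:
    """Toy external memory (hypothetical): experiences stored under text keys."""

    entries: list[tuple[str, str]] = field(default_factory=list)

    def note(self, key: str, experience: str) -> None:
        """NOTE-style write: store an experience under a key."""
        self.entries.append((key, experience))

    def recall(self, query: str, top_k: int = 2) -> list[str]:
        """RECALL-style read: return up to top_k experiences whose keys
        share at least one token with the query, best overlap first.
        Word overlap is an assumed retrieval rule, not the paper's."""
        query_tokens = tokenize(query)
        scored = []
        for key, experience in self.entries:
            overlap = len(query_tokens & tokenize(key))
            if overlap > 0:
                scored.append((overlap, experience))
        # Stable sort: ties keep insertion order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [experience for _, experience in scored[:top_k]]


def build_test_prompt(notebook: ExperienceNotebook, question: str) -> str:
    """Prepend recalled experiences to a test question (assumed prompt layout)."""
    lines = [f"Experience: {e}" for e in notebook.recall(question)]
    lines.append(f"Question: {question}")
    return "\n".join(lines)


# Training stage (sketch): experiences are written to the notebook.
notebook = ExperienceNotebook()
notebook.note("brother mother", "The brother of one's mother is one's uncle.")
notebook.note("father father", "The father of one's father is one's grandfather.")

# Testing stage (sketch): relevant experiences condition the prompt.
print(build_test_prompt(notebook, "Who is the brother of my mother"))
```

In a real system the prompt would be sent to the black-box LLM and the retrieval step would likely be more robust than plain word overlap; the sketch only shows how stored experiences can be written during training and surfaced again at test time.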

Paper Structure

This paper contains 19 sections, 4 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: An illustration of how ExpNote helps an LLM solve tasks more effectively. ExpNote can automatically generalize relevant experiences from other samples and apply them to specific tasks.
  • Figure 2: The framework of ExpNote. This framework shows how LLMs use ExpNote to solve specific tasks, including the training (left) and testing (right) stages.
  • Figure 3: The training curve on the CLUTRR dataset.
  • Figure 4: Improvement analysis of ExpNote on 4 datasets.
  • Figure 5: The example trajectories of ExpNote on the CLUTRR Dataset. The left part is a training case while the right part is a corresponding testing case using that training experience. The blue parts are the ExpNote demonstrations (prompts $P_{train}$/$P_{test}$). The yellow parts are the interactive trajectories between LLM and ExpNote. The sentence highlighted in green is the learned experience.
  • ...and 3 more figures