SEO: Stochastic Experience Optimization for Large Language Models

Jitao Xu, Hongyun Zhou, Lei Shen, Conghui Zhu, Jin Huang, Yitao Duan

TL;DR

SEO improves LLM task performance without updating model parameters by learning model-specific experiences expressed in natural language. It runs a stochastic, three-stage loop (Trial Generation, Experience Update, and Experience Validation) driven by a generator LLM and an optimizer LLM, and uses a stochastically sampled validation set to decide which updates to keep. Across multi-hop question answering (QA), machine translation (MT), and text classification, SEO yields consistent improvements and generalizes to out-of-distribution data and across languages, while remaining model- and task-agnostic. The results highlight the value of explicit validation when optimizing in natural language and reveal emergent, transferable rules as experiences evolve. Overall, SEO offers a lightweight, flexible mechanism for tailoring LLM behavior to specific tasks without expensive fine-tuning.
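
The three-stage loop can be illustrated with a minimal Python sketch. It is not the authors' implementation: `seo_step`, `toy_generate`, `toy_propose`, and `exact_match` are hypothetical names, the generator and optimizer LLMs are replaced by deterministic toy functions, and the experience effect is assumed here to be the mean score difference between trials with and without the experience.

```python
import random
from typing import Callable, List, Optional, Tuple

Example = Tuple[str, str]  # (input X, reference Y)


def mean_score(generate: Callable, score: Callable,
               data: List[Example], experience: Optional[str]) -> float:
    """Average task score of the generator on `data` under `experience`."""
    return sum(score(generate(x, experience), y) for x, y in data) / len(data)


def seo_step(experience: str, batch: List[Example], pool: List[Example],
             generate: Callable, propose: Callable, score: Callable,
             rng: random.Random, val_size: int = 4) -> str:
    """One hypothetical SEO step; returns the experience kept for the next step."""
    # Stage 1, Trial Generation: run the generator with and without experience.
    with_exp = [generate(x, experience) for x, _ in batch]
    without_exp = [generate(x, None) for x, _ in batch]
    # Assumption: the experience effect is the mean score difference.
    effect = (sum(score(p, y) for p, (_, y) in zip(with_exp, batch))
              - sum(score(p, y) for p, (_, y) in zip(without_exp, batch))) / len(batch)

    # Stage 2, Experience Update: the optimizer proposes candidate experiences.
    candidates = propose(experience, batch, with_exp, without_exp, effect)

    # Stage 3, Experience Validation: score candidates on a freshly sampled
    # validation subset and keep one only if it beats the current experience.
    val_set = rng.sample(pool, min(val_size, len(pool)))
    best, best_score = experience, mean_score(generate, score, val_set, experience)
    for cand in candidates:
        cand_score = mean_score(generate, score, val_set, cand)
        if cand_score > best_score:
            best, best_score = cand, cand_score
    return best


def toy_generate(x: str, experience: Optional[str]) -> str:
    # Stand-in for the generator LLM: follows an "uppercase" hint if present.
    return x.upper() if experience and "uppercase" in experience else x


def toy_propose(experience, batch, with_exp, without_exp, effect) -> List[str]:
    # Stand-in for the optimizer LLM: returns two fixed rewrites.
    return [experience + " Always answer in uppercase.",
            experience + " Keep answers short."]


def exact_match(pred: str, ref: str) -> float:
    return 1.0 if pred == ref else 0.0


pool = [(w, w.upper()) for w in ["a", "b", "c", "d", "e", "f"]]
rng = random.Random(0)
exp = "Answer the question."
for _ in range(3):
    batch = rng.sample(pool, 2)
    exp = seo_step(exp, batch, pool, toy_generate, toy_propose, exact_match, rng)
print(exp)
```

In this toy setup the validation stage accepts the candidate that mentions uppercase on the first step and rejects later candidates that do not strictly improve the validation score, which is the role the summary attributes to explicit validation.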

Abstract

Large Language Models (LLMs) can benefit from useful experiences to improve their performance on specific tasks. However, finding helpful experiences for different LLMs is not straightforward, since it is unclear which experiences suit a specific LLM. Previous studies have tried to find useful experiences automatically using LLMs, but it is difficult to ensure the effectiveness of the obtained experience. In this paper, we propose Stochastic Experience Optimization (SEO), an iterative approach that finds optimized model-specific experience by updating the experience in natural language, without modifying model parameters. In SEO, we propose a stochastic validation method to ensure the update direction of experience, avoiding ineffective updates. Experimental results on three tasks for three LLMs demonstrate that experiences optimized by SEO consistently improve performance. Further analysis indicates that SEO-optimized experience can generalize to out-of-distribution data, boosting the performance of LLMs on similar tasks.

Paper Structure

This paper contains 29 sections, 4 equations, 3 figures, 11 tables, 1 algorithm.

Figures (3)

  • Figure 1: Overview of a training step of the SEO process. We first gather trials with and without experience ($\hat{Y}_{E_t}$ and $\hat{Y}$, respectively) using a generator model $M_{gen}$ and compute the experience effect $\delta_{E_t}$. We then use an optimizer model $M_{opt}$ to sample refined candidate experiences $E'_i$, taking as input the training example $X$ and $Y$, the trials $\hat{Y}_{E_t}$ and $\hat{Y}$, the experience $E_t$, and the experience effect $\delta_{E_t}$. The candidate experiences $E'_i$ are then validated on a stochastic set $D_t$ to select the best experience that surpasses the current experience $E_t$ for the next iteration.
  • Figure 2: Prompt used for the optimizer model to generate candidate experiences for MT. Contents in curly braces are variables.
  • Figure 3: Differences in COMET scores when evaluating experience across language directions for Llama-2-7b. Scores in each column are calculated by subtracting the COMET score of the initial experience for the corresponding direction. The diagonal reports the improvement from applying an experience to its own direction.
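
The column-wise subtraction described in the Figure 3 caption can be sketched as follows. The function name `delta_matrix`, the direction labels `dir_a` and `dir_b`, and all numeric values are placeholders invented for illustration; they are not results from the paper.

```python
from typing import Dict


def delta_matrix(scores: Dict[str, Dict[str, float]],
                 initial: Dict[str, float]) -> Dict[str, Dict[str, float]]:
    """Subtract the initial-experience score of each evaluation direction.

    scores[opt_dir][eval_dir] is the score obtained on `eval_dir` with the
    experience optimized on `opt_dir`; initial[eval_dir] is the score on
    `eval_dir` with the initial experience.
    """
    return {
        opt_dir: {eval_dir: round(value - initial[eval_dir], 4)
                  for eval_dir, value in row.items()}
        for opt_dir, row in scores.items()
    }


# Placeholder values only, chosen to make the arithmetic easy to follow.
initial = {"dir_a": 0.70, "dir_b": 0.60}
scores = {
    "dir_a": {"dir_a": 0.75, "dir_b": 0.62},
    "dir_b": {"dir_a": 0.71, "dir_b": 0.66},
}

deltas = delta_matrix(scores, initial)
for opt_dir, row in deltas.items():
    print(opt_dir, row)
```

Entries where the optimization and evaluation directions match form the diagonal, that is, the effect of an experience on its own direction. Off-diagonal entries show how that experience transfers to other directions.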