Should You Use Your Large Language Model to Explore or Exploit?

Keegan Harris; Aleksandrs Slivkins

TL;DR

The paper examines how large language models (LLMs) can be used on either side of the explore-exploit tradeoff in decision-making, focusing on contextual bandits. It evaluates multiple LLMs as exploitation and exploration oracles on multi-armed bandit (MAB) and contextual bandit (CB) puzzles and on large action spaces, including QA and arXiv-based tasks, with various prompting strategies and mitigations. The findings show that current LLMs struggle to exploit on non-trivial tasks and often underperform simple linear baselines, while they can substantially aid exploration by proposing semantically meaningful candidate actions in high-dimensional spaces. This marks a boundary for LLM utility in decision pipelines and points to future work on tool-enabled exploitation and smarter, semantically guided exploration.

Abstract

We evaluate the ability of the current generation of large language models (LLMs) to help a decision-making agent facing an exploration-exploitation tradeoff. We use LLMs to explore and exploit in silos in various (contextual) bandit tasks. We find that while current LLMs often struggle to exploit, in-context mitigations may be used to substantially improve performance for small-scale tasks. However, even then, LLMs perform worse than a simple linear regression. On the other hand, we find that LLMs do help at exploring large action spaces with inherent semantics, by suggesting suitable candidates to explore.
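To make the "simple linear regression" comparison concrete, here is a minimal sketch of what an exploitation step in a contextual bandit might look like with such a baseline. It is an illustration only, not the authors' implementation: the scalar context, the per-arm line fit, and the names `fit_line`, `exploit`, and `history` are all assumptions introduced here.

```python
from collections import defaultdict


def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept on (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    if var_x == 0:
        # All observed contexts are identical: fall back to the mean reward.
        return 0.0, mean_y
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


def exploit(history, context):
    """Return the arm with the highest predicted reward for `context`.

    `history` is a list of (context, arm, reward) triples (hypothetical format).
    Arms that never appear in `history` are not considered.
    """
    by_arm = defaultdict(list)
    for x, arm, reward in history:
        by_arm[arm].append((x, reward))
    predictions = {}
    for arm, points in by_arm.items():
        slope, intercept = fit_line(points)
        predictions[arm] = slope * context + intercept
    return max(predictions, key=predictions.get)


# Toy history (made up for illustration): arm A pays 1 - x, arm B pays x.
history = [
    (0.0, "A", 1.0), (1.0, "A", 0.0),
    (0.0, "B", 0.0), (1.0, "B", 1.0),
]
```

In this toy setup, `exploit(history, 0.9)` picks arm `"B"` and `exploit(history, 0.1)` picks arm `"A"`. The exploitation task the abstract refers to is the same in spirit: given the observed history, choose the action that looks best for the current context.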
