Should You Use Your Large Language Model to Explore or Exploit?
Keegan Harris, Aleksandrs Slivkins
TL;DR
The paper asks how large language models (LLMs) can help with the explore-exploit tradeoff in decision-making, focusing on contextual bandits. It systematically evaluates multiple LLMs as exploitation and exploration oracles across multi-armed bandit (MAB) and contextual bandit (CB) puzzles and tasks with large action spaces, including QA and arXiv-based tasks, under various prompting strategies and mitigations. The findings show that current LLMs struggle to exploit on non-trivial tasks and often underperform simple linear baselines, but can substantially aid exploration by proposing semantically meaningful candidate actions in high-dimensional spaces. This marks a boundary for LLM utility in decision pipelines and points to future work on tool-enabled exploitation and smarter, semantically guided exploration.
Abstract
We evaluate the ability of the current generation of large language models (LLMs) to help a decision-making agent facing an exploration-exploitation tradeoff. We use LLMs to explore and exploit in silos in various (contextual) bandit tasks. We find that while current LLMs often struggle to exploit, in-context mitigations may be used to substantially improve performance for small-scale tasks. However, even then, LLMs perform worse than a simple linear regression. On the other hand, we find that LLMs do help at exploring large action spaces with inherent semantics, by suggesting suitable candidates to explore.
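To make the comparison with "a simple linear regression" concrete, the following is a minimal sketch of what an exploitation step in a contextual bandit could look like: fit one least-squares line per action on the observed history, then pick the action with the highest predicted reward for the new context. The scalar context, the `(context, action, reward)` history format, and the function names `fit_line` and `exploit` are assumptions made for illustration; they are not taken from the paper's experimental setup.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with a scalar feature x."""
    x_bar, y_bar = mean(xs), mean(ys)
    var = sum((x - x_bar) ** 2 for x in xs)
    if var == 0:
        return 0.0, y_bar  # all contexts identical: fall back to the mean reward
    a = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / var
    return a, y_bar - a * x_bar

def exploit(history, context, actions):
    """Return the action with the highest predicted reward for `context`.

    history: list of (context, action, reward) tuples (hypothetical format).
    Actions with no observations are skipped; returns None if none have data.
    """
    best_action, best_pred = None, float("-inf")
    for action in actions:
        xs = [c for c, a, _ in history if a == action]
        ys = [r for _, a, r in history if a == action]
        if not xs:
            continue  # pure exploitation has nothing to say about untried actions
        slope, intercept = fit_line(xs, ys)
        pred = slope * context + intercept
        if pred > best_pred:
            best_action, best_pred = action, pred
    return best_action

# Toy history: action "A" pays off for high contexts, "B" for low ones.
history = [(0.0, "A", 0.0), (1.0, "A", 1.0), (0.0, "B", 1.0), (1.0, "B", 0.0)]
choice_high = exploit(history, 0.9, ["A", "B"])
choice_low = exploit(history, 0.1, ["A", "B"])
```

In this sketch, untried actions are never selected, which is exactly the gap that an exploration component (for example, an LLM suggesting candidate actions) would have to fill.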
