DeLLMa: Decision Making Under Uncertainty with Large Language Models
Ollie Liu, Deqing Fu, Dani Yogatama, Willie Neiswanger
TL;DR
DeLLMa introduces a structured, interpretable approach to decision making under uncertainty by embedding classical decision theory into an inference-time reasoning pipeline for LLMs. It separates the process into four steps: state enumeration, state forecasting, utility elicitation, and expected utility maximization. LLMs are used to forecast latent states and to derive a utility function, while the optimization itself is performed offline. Across agriculture and finance tasks, DeLLMa yields accuracy gains of up to 40% over zero-shot, self-consistency, and chain-of-thought baselines, with robust performance across multiple LLMs and human evaluation of its intermediate reasoning. The framework emphasizes transparency and trust by exposing intermediate components (state forecasts, utilities), and its performance improves as test-time compute is scaled, suggesting practical use for decision support under uncertainty. Overall, DeLLMa advances interpretable, probabilistic reasoning in LLMs and offers a pathway to broader applications of decision making in uncertain domains.
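To make the last step of the pipeline concrete, here is a minimal sketch of expected utility maximization in Python. It is not the authors' implementation. The function name, the state and action labels, and all probabilities and utilities are hypothetical placeholders. In DeLLMa the state forecast and the utility function would come from the LLM-driven steps, whereas here they are hard-coded so that only the final offline optimization is illustrated.

```python
from typing import Callable, Dict, List, Tuple


def maximize_expected_utility(
    actions: List[str],
    state_probs: Dict[str, float],
    utility: Callable[[str, str], float],
) -> Tuple[str, Dict[str, float]]:
    """Return the action with the highest expected utility, plus all scores."""
    total = sum(state_probs.values())
    if total <= 0:
        raise ValueError("state probabilities must sum to a positive value")
    # Normalize so the forecast is a proper probability distribution.
    probs = {s: p / total for s, p in state_probs.items()}
    # Expected utility of each action: sum over states of P(state) * U(state, action).
    scores = {
        a: sum(p * utility(s, a) for s, p in probs.items())
        for a in actions
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical forecast over latent states (stand-in for the state forecasting step).
STATE_PROBS = {"drought": 0.25, "normal": 0.5, "wet": 0.25}

# Hypothetical utilities U(state, action) (stand-in for the utility elicitation step).
UTILITY_TABLE = {
    ("drought", "crop_a"): 1.0,
    ("normal", "crop_a"): 4.0,
    ("wet", "crop_a"): 3.0,
    ("drought", "crop_b"): 0.0,
    ("normal", "crop_b"): 5.0,
    ("wet", "crop_b"): 6.0,
}

best_action, scores = maximize_expected_utility(
    ["crop_a", "crop_b"],
    STATE_PROBS,
    lambda state, action: UTILITY_TABLE[(state, action)],
)
print(best_action, scores)
```

Because the forecast and the utilities are explicit inputs, a reader can inspect each one separately and see how it affects the chosen action, which is the kind of auditability the summary above describes.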
Abstract
The potential of large language models (LLMs) as decision support tools is increasingly being explored in fields such as business, engineering, and medicine, which often face challenging tasks of decision-making under uncertainty. In this paper, we show that directly prompting LLMs on these types of decision-making problems can yield poor results, especially as the problem complexity increases. To aid in these tasks, we propose DeLLMa (Decision-making Large Language Model assistant), a framework designed to enhance decision-making accuracy in uncertain environments. DeLLMa involves a multi-step reasoning procedure that integrates recent best practices in scaling inference-time reasoning, drawing upon principles from decision theory and utility theory, to provide an accurate and human-auditable decision-making process. We validate our procedure on multiple realistic decision-making environments, demonstrating that DeLLMa can consistently enhance the decision-making performance of leading language models, and achieve up to a 40% increase in accuracy over competing methods. Additionally, we show how performance improves when scaling compute at test time, and carry out human evaluations to benchmark components of DeLLMa.
