DeLLMa: Decision Making Under Uncertainty with Large Language Models

Ollie Liu, Deqing Fu, Dani Yogatama, Willie Neiswanger

TL;DR

DeLLMa introduces a structured, interpretable approach to decision making under uncertainty by embedding classical decision theory into an inference-time reasoning pipeline for LLMs. It separates the process into state enumeration, state forecasting, utility elicitation, and expected utility maximization, using LLMs to forecast latent states and to derive a utility function while performing the optimization offline. Across agriculture and finance tasks, DeLLMa yields up to 40% accuracy gains over zero-shot, self-consistency, and chain-of-thought baselines, with robust performance across multiple LLMs and demonstrated human auditability of intermediate reasoning. The framework emphasizes transparency and trust by exposing intermediate components (state forecasts, utilities) and shows scalable compute benefits, suggesting practical deployment for decision support under uncertainty. Overall, DeLLMa advances interpretable, probabilistic reasoning in LLMs and offers a pathway to broader applications in uncertain-domain decision making.

Abstract

The potential of large language models (LLMs) as decision support tools is increasingly being explored in fields such as business, engineering, and medicine, which often face challenging tasks of decision-making under uncertainty. In this paper, we show that directly prompting LLMs on these types of decision-making problems can yield poor results, especially as the problem complexity increases. To aid in these tasks, we propose DeLLMa (Decision-making Large Language Model assistant), a framework designed to enhance decision-making accuracy in uncertain environments. DeLLMa involves a multi-step reasoning procedure that integrates recent best practices in scaling inference-time reasoning, drawing upon principles from decision theory and utility theory, to provide an accurate and human-auditable decision-making process. We validate our procedure on multiple realistic decision-making environments, demonstrating that DeLLMa can consistently enhance the decision-making performance of leading language models, and achieve up to a 40% increase in accuracy over competing methods. Additionally, we show how performance improves when scaling compute at test time, and carry out human evaluations to benchmark components of DeLLMa.
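
The final step named in the TL;DR, expected utility maximization, can be illustrated with a small sketch. This is not the authors' implementation: it assumes a finite set of states with forecast probabilities and a scalar utility for each state-action pair, and the names `expected_utilities` and `best_action`, as well as all numbers in the toy example, are hypothetical. In DeLLMa the forecasts and utilities would come from the LLM; here they are hard-coded so that only the offline optimization is shown.

```python
import math
from typing import Callable, Dict, Hashable, Iterable, Mapping, Tuple

State = Hashable
Action = Hashable


def expected_utilities(
    actions: Iterable[Action],
    state_probs: Mapping[State, float],
    utility: Callable[[State, Action], float],
) -> Dict[Action, float]:
    """Return the probability-weighted utility of each action."""
    total = sum(state_probs.values())
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"state probabilities sum to {total}, expected 1.0")
    return {
        a: sum(p * utility(s, a) for s, p in state_probs.items())
        for a in actions
    }


def best_action(
    actions: Iterable[Action],
    state_probs: Mapping[State, float],
    utility: Callable[[State, Action], float],
) -> Tuple[Action, Dict[Action, float]]:
    """Pick the action with the highest expected utility."""
    scores = expected_utilities(actions, state_probs, utility)
    if not scores:
        raise ValueError("no actions supplied")
    return max(scores, key=scores.get), scores


# Toy example (all values are illustrative assumptions, not from the paper).
# Forecast over a single latent state variable: next season's price level.
toy_state_probs = {"high_price": 0.75, "low_price": 0.25}

# Hypothetical utilities for each (state, action) pair.
toy_utility_table = {
    ("high_price", "apples"): 10.0,
    ("low_price", "apples"): 4.0,
    ("high_price", "avocados"): 14.0,
    ("low_price", "avocados"): 1.0,
}


def toy_utility(state: State, action: Action) -> float:
    return toy_utility_table[(state, action)]


choice, scores = best_action(["apples", "avocados"], toy_state_probs, toy_utility)
print(choice, scores)
```

Because the forecast and the utility table are kept as separate inputs, each can be inspected or replaced on its own, which mirrors the auditability of intermediate components that the TL;DR emphasizes.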

Paper Structure (66 sections, 3 equations, 25 figures, 5 tables, 2 algorithms)

Figures (25)

  • Figure 1: Given a decision problem and contextual information as a prompt, DeLLMa (decision-making LLM assistant) maximizes an expected utility to select an available action. We illustrate the key steps of DeLLMa on decision-making tasks in agriculture planning (top) and finance (bottom).
  • Figure 2: Results on the Agriculture environment. Left: DeLLMa variants outperform baseline methods for varying numbers of actions. Right: We see that DeLLMa yields a consistent improvement in decision-making accuracy across three families of leading LLMs.
  • Figure 2: Performance comparison across variations of our state forecasting procedure.
  • Figure 3: Left: Study on sample size and overlap percentage used by DeLLMa. Scaling the compute at test time produces better average accuracy. When ablating overlap percentage, we fix sample size at 16; when ablating sample size, we fix overlap percentage at 25%. Right: Illustration of the DeLLMa decision tree for the Agriculture dataset, showing two of the actions, and two of the sampled states per action. Each weight $w$ denotes the posterior probability $\pi(\theta_i^{(j)} \mid \mathcal{C})$.
  • Figure 4: Results on the Stocks environment. Left: We see that on average, DeLLMa-Top1 outperforms all baselines. Right: Illustration of the DeLLMa decision tree for the Stocks dataset, showing two of the actions and two of the sampled states per action.
  • ...and 20 more figures