Risk Profiling and Modulation for LLMs

Yikai Wang, Xiaocheng Li, Guanting Chen

TL;DR

This work develops a pipeline to quantify and modulate risk preferences in large language models (LLMs) by adapting tools from behavioral economics. It profiles risk using utility-based frameworks and Bayesian learning, showing that instruction-tuned LLMs tend to align with standard utility forms, while pre-trained and RLHF-aligned models exhibit more complex patterns. Among modulation methods, post-training, particularly direct preference optimization (DPO), offers the most reliable and robust risk modulation across tasks and domains. The study shows that in-context prompting is ineffective for risk steering, whereas supervised fine-tuning (SFT) and DPO can align models to specified risk preferences, and that this alignment transfers across domains, as evidenced on the DOSPERT and lottery datasets. These findings advance behavioral alignment for LLMs and provide a principled foundation for designing risk-aware decision-support systems and post-training regimens.
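To make "utility-based frameworks and Bayesian learning" concrete, here is a minimal sketch of one way a risk profile could be inferred from a model's lottery choices. It assumes a CRRA utility family, a logit choice rule on certainty equivalents with a fixed temperature, strictly positive payoffs, and a uniform prior over a grid of risk-aversion values; these are illustrative assumptions, not the paper's specification, and all function names are hypothetical.

```python
import math

# Assumption (not from the paper): payoffs are strictly positive, so CRRA
# utility is defined for every gamma, including gamma >= 1.
Lottery = list[tuple[float, float]]  # (probability, payoff) pairs


def crra_certainty_equivalent(lottery: Lottery, gamma: float) -> float:
    """Certainty equivalent under CRRA utility with relative risk aversion gamma.

    gamma > 0 is risk-averse, gamma == 0 risk-neutral, gamma < 0 risk-seeking.
    """
    if abs(gamma - 1.0) < 1e-9:
        # Log utility: the CE is the probability-weighted geometric mean.
        return math.exp(sum(p * math.log(x) for p, x in lottery))
    k = 1.0 - gamma
    return sum(p * x ** k for p, x in lottery) ** (1.0 / k)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def posterior_over_gamma(choices, gammas, temperature=1.0):
    """Grid posterior over gamma under a uniform prior.

    choices: iterable of (lottery_a, lottery_b, chose_a) observations.
    Assumed choice rule: P(choose A) = sigmoid((CE_A - CE_B) / temperature).
    """
    choices = list(choices)
    log_post = [0.0] * len(gammas)  # uniform prior: equal log-prior terms
    for lot_a, lot_b, chose_a in choices:
        for i, g in enumerate(gammas):
            diff = crra_certainty_equivalent(lot_a, g) - crra_certainty_equivalent(lot_b, g)
            p_a = sigmoid(diff / temperature)
            p = p_a if chose_a else 1.0 - p_a
            log_post[i] += math.log(max(p, 1e-300))  # guard against log(0)
    m = max(log_post)  # subtract the max before exponentiating for stability
    weights = [math.exp(lp - m) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


# Usage: a respondent who picks a sure 50 over a 50/50 gamble on 10 or 100
# (expected value 55) five times in a row.
gamble = [(0.5, 10.0), (0.5, 100.0)]
sure = [(1.0, 50.0)]
obs = [(gamble, sure, False)] * 5
grid = [0.0, 0.5, 1.0, 1.5, 2.0]
post = posterior_over_gamma(obs, grid, temperature=1.0)
posterior_mean = sum(g * w for g, w in zip(grid, post))
```

Because the gamble has the higher expected value, repeatedly rejecting it pushes almost all posterior mass off the risk-neutral value gamma = 0 and toward risk aversion. Working on certainty equivalents rather than raw utilities keeps the temperature in payoff units, so it means the same thing at every grid point.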

Abstract

Large language models (LLMs) are increasingly used for decision-making tasks under uncertainty; however, their risk profiles and how they are influenced by prompting and alignment methods remain underexplored. Existing studies have primarily examined personality prompting or multi-agent interactions, leaving open the question of how post-training influences the risk behavior of LLMs. In this work, we propose a new pipeline for eliciting, steering, and modulating LLMs' risk profiles, drawing on tools from behavioral economics and finance. Using utility-theoretic models, we compare pre-trained, instruction-tuned, and RLHF-aligned LLMs, and find that while instruction-tuned models exhibit behaviors consistent with some standard utility formulations, pre-trained and RLHF-aligned models deviate more from every fitted utility model. We further evaluate modulation strategies, including prompt engineering, in-context learning, and post-training, and show that post-training provides the most stable and effective modulation of risk preference. Our findings provide insights into the risk profiles of different classes and stages of LLMs and demonstrate how post-training modulates these profiles, laying the groundwork for future research on behavioral alignment and risk-aware LLM design.
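The abstract reports that post-training, and DPO in particular, modulates risk preference most reliably, but it does not say how the preference data is built. The sketch below shows one plausible construction under stated assumptions: a target CARA utility u(x) = (1 - exp(-a*x)) / a, a hypothetical prompt template, and single-letter answers. The `prompt`/`chosen`/`rejected` record keys follow the convention used by common preference-tuning libraries such as Hugging Face TRL's `DPOTrainer`; the helper names are hypothetical.

```python
import math

Lottery = list[tuple[float, float]]  # (probability, payoff) pairs


def cara_certainty_equivalent(lottery: Lottery, a: float) -> float:
    """CE under CARA utility u(x) = (1 - exp(-a*x)) / a.

    a > 0 is risk-averse, a < 0 risk-seeking; a == 0 is the risk-neutral limit.
    """
    if abs(a) < 1e-12:
        return sum(p * x for p, x in lottery)
    return -math.log(sum(p * math.exp(-a * x) for p, x in lottery)) / a


def describe(lottery: Lottery) -> str:
    """Render a lottery as text, e.g. '50% chance of $0, 50% chance of $100'."""
    return ", ".join(f"{p:.0%} chance of ${x:g}" for p, x in lottery)


def make_dpo_pair(lot_a: Lottery, lot_b: Lottery, a: float):
    """Return a prompt/chosen/rejected record, or None if the target is indifferent."""
    prompt = (  # hypothetical template, not the paper's prompt
        f"Option A: {describe(lot_a)}\n"
        f"Option B: {describe(lot_b)}\n"
        "Which option do you choose? Answer with A or B."
    )
    ce_a = cara_certainty_equivalent(lot_a, a)
    ce_b = cara_certainty_equivalent(lot_b, a)
    if math.isclose(ce_a, ce_b):
        return None  # no preference signal to train on
    chosen, rejected = ("A", "B") if ce_a > ce_b else ("B", "A")
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


# Usage: the same question labeled by a risk-averse and a risk-seeking target.
sure = [(1.0, 50.0)]
gamble = [(0.5, 0.0), (0.5, 100.0)]
averse_pair = make_dpo_pair(sure, gamble, a=0.05)    # target prefers the sure option
seeking_pair = make_dpo_pair(sure, gamble, a=-0.05)  # target prefers the gamble
```

The same lottery question yields opposite `chosen`/`rejected` labels depending on the sign of `a`, which is what lets a single question bank steer a model toward either end of the risk spectrum. Questions on which the target utility is indifferent are dropped rather than labeled arbitrarily.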

Paper Structure

This paper contains 33 sections, 13 equations, 11 figures, 7 tables.

Figures (11)

  • Figure 1: A framework to modulate LLM risk preferences for diverse users
  • Figure 2: The pipeline of our paper. Sections sec:initial and sec:risk_profile study the problem of profiling and evaluating the risk behavior of LLMs; Section sec:risk_modulate studies several approaches to modulating the LLM's risk preference.
  • Figure 3: Risk profiling of LLMs. Left: accuracy of the best-fitted utility model. Right: visualization of the three best-fitted utility functions. The Epstein-Zin utility function aggregates multiple outcomes probabilistically rather than mapping each individual reward to a utility value, so the three best-fitted Epstein-Zin functions cannot be plotted on the right.
  • Figure 4: In-context prompting does not work for risk modulation. Left: Llama-3.1-8B-Instruct; right: Qwen2.5-7B-Instruct. The other models show the same pattern. The in-context prompt used is given in Appendix sec:in-context-prompting.
  • Figure 5: DOSPERT domain scores (1-7 scale) for different DPO-fine-tuned models. The DPO-fine-tuned model with the most risk-seeking utility function shows the highest risk-taking scores but the lowest risk-perception scores, indicating a higher tolerance for risk.
  • ...and 6 more figures
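The Figure 3 caption notes that the Epstein-Zin utility aggregates a whole distribution of outcomes instead of assigning a utility to each reward on its own. The sketch below illustrates why, using the standard one-step Epstein-Zin recursion V = [(1 - beta) c^(1-rho) + beta CE^(1-rho)]^(1/(1-rho)), where CE = (E[V'^(1-alpha)])^(1/(1-alpha)) is the risk-adjusted certainty equivalent of the continuation value. Here alpha governs risk aversion and 1/rho is the elasticity of intertemporal substitution. How the paper maps its lotteries onto this recursion is not stated here, so the parameter values and function names are assumptions for illustration only.

```python
import math

Lottery = list[tuple[float, float]]  # (probability, positive continuation value)


def risk_adjusted_value(lottery: Lottery, alpha: float) -> float:
    """Certainty equivalent (E[V^(1-alpha)])^(1/(1-alpha)) of the whole lottery."""
    if abs(alpha - 1.0) < 1e-9:
        # alpha == 1: probability-weighted geometric mean
        return math.exp(sum(p * math.log(v) for p, v in lottery))
    k = 1.0 - alpha
    return sum(p * v ** k for p, v in lottery) ** (1.0 / k)


def epstein_zin_value(consumption: float, lottery: Lottery,
                      beta: float, rho: float, alpha: float) -> float:
    """One step of the Epstein-Zin recursion.

    The input is an entire lottery over continuation values, not a single
    reward, so there is no per-outcome utility curve u(x) to plot.
    """
    ce = risk_adjusted_value(lottery, alpha)
    if abs(rho - 1.0) < 1e-9:
        # Unit elasticity of substitution: Cobb-Douglas limit of the aggregator
        return consumption ** (1.0 - beta) * ce ** beta
    k = 1.0 - rho
    return ((1.0 - beta) * consumption ** k + beta * ce ** k) ** (1.0 / k)


# Usage: a sure continuation value versus a mean-preserving spread of it.
sure_future = [(1.0, 100.0)]
risky_future = [(0.5, 50.0), (0.5, 150.0)]  # same mean, more spread
v_sure = epstein_zin_value(100.0, sure_future, beta=0.9, rho=0.5, alpha=2.0)
v_risky = epstein_zin_value(100.0, risky_future, beta=0.9, rho=0.5, alpha=2.0)
```

With alpha > 0, the mean-preserving spread lowers the risk-adjusted value, so the risky continuation scores below the sure one even though both have the same expected value. Because the function consumes probabilities and outcomes jointly, it has no analogue of a CRRA or CARA curve over individual rewards, which matches the caption's explanation for omitting Epstein-Zin from the right-hand panel.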