Risk Profiling and Modulation for LLMs
Yikai Wang, Xiaocheng Li, Guanting Chen
TL;DR
This work develops a pipeline to quantify and modulate risk preferences in large language models (LLMs) by adapting tools from behavioral economics. It profiles risk using utility-based frameworks and Bayesian learning, and finds that instruction-tuned LLMs tend to align with standard utility forms, while pre-trained and RLHF-aligned models exhibit more complex patterns. For modulation, in-context prompting is ineffective for risk steering, whereas supervised fine-tuning and direct preference optimization (DPO) can align models to specified risk preferences; DPO offers the most robust modulation across tasks and domains, including cross-domain transfer as evidenced by the DOSPERT and lottery datasets. These findings advance behavioral alignment for LLMs and provide a principled foundation for designing risk-aware decision-support systems and post-training regimens.
Abstract
Large language models (LLMs) are increasingly used for decision-making tasks under uncertainty; however, their risk profiles and how they are influenced by prompting and alignment methods remain underexplored. Existing studies have primarily examined personality prompting or multi-agent interactions, leaving open the question of how post-training influences the risk behavior of LLMs. In this work, we propose a new pipeline for eliciting, steering, and modulating LLMs' risk profiles, drawing on tools from behavioral economics and finance. Using utility-theoretic models, we compare pre-trained, instruction-tuned, and RLHF-aligned LLMs, and find that while instruction-tuned models exhibit behaviors consistent with some standard utility formulations, pre-trained and RLHF-aligned models deviate more from all of the utility models we fit. We further evaluate modulation strategies, including prompt engineering, in-context learning, and post-training, and show that post-training provides the most stable and effective modulation of risk preference. Our findings provide insights into the risk profiles of different classes and stages of LLMs and demonstrate how post-training modulates these profiles, laying the groundwork for future research on behavioral alignment and risk-aware LLM design.
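To make "fitting a utility model" concrete, the following is a minimal sketch of estimating one risk-aversion parameter from binary lottery choices. It is an illustration under stated assumptions, not the pipeline described above: the CRRA utility form, the logit choice rule, the grid search, and every name in the code (`crra_utility`, `fit_gamma`, the toy lotteries) are hypothetical choices made here, and plain maximum likelihood stands in for the Bayesian learning mentioned in the TL;DR.

```python
import math

def crra_utility(x, gamma):
    """CRRA utility of a positive payoff x (assumed form, not from the paper).
    gamma > 0 is risk-averse, gamma = 0 risk-neutral, gamma < 0 risk-seeking."""
    if abs(gamma - 1.0) < 1e-9:
        return math.log(x)
    return (x ** (1.0 - gamma) - 1.0) / (1.0 - gamma)

def expected_utility(lottery, gamma):
    """lottery is a list of (probability, payoff) pairs."""
    return sum(p * crra_utility(x, gamma) for p, x in lottery)

def choice_log_likelihood(choices, gamma):
    """Log-likelihood of observed choices under a logit choice rule.
    choices is a list of (lottery_a, lottery_b, chose_a) triples."""
    total = 0.0
    for lottery_a, lottery_b, chose_a in choices:
        diff = expected_utility(lottery_a, gamma) - expected_utility(lottery_b, gamma)
        # Numerically stable log(sigmoid(diff)).
        if diff >= 0:
            log_p_a = -math.log1p(math.exp(-diff))
        else:
            log_p_a = diff - math.log1p(math.exp(diff))
        log_p_b = log_p_a - diff  # log(1 - sigmoid(diff))
        total += log_p_a if chose_a else log_p_b
    return total

def fit_gamma(choices, grid=None):
    """Grid-search maximum-likelihood estimate of gamma (hypothetical helper)."""
    if grid is None:
        grid = [i / 100 for i in range(-200, 301)]
    return max(grid, key=lambda g: choice_log_likelihood(choices, g))

# Toy data: a coin flip between 100 and 10 (expected value 55).
risky = [(0.5, 100.0), (0.5, 10.0)]
# An agent that takes a sure 50 over the higher-mean gamble ...
averse_choices = [([(1.0, 50.0)], risky, True)] * 5
# ... and one that takes the gamble over a sure 60.
seeking_choices = [([(1.0, 60.0)], risky, False)] * 5

gamma_averse = fit_gamma(averse_choices)
gamma_seeking = fit_gamma(seeking_choices)
print(gamma_averse, gamma_seeking)
```

In this toy setup the sign of the fitted parameter separates the two agents: always taking the sure amount over a higher-mean gamble yields a positive estimate, and always taking the gamble over a higher sure amount yields a negative one. A model whose choices are poorly described by every candidate utility form would show a low maximized log-likelihood across all of them, which is one way to read the deviation reported for pre-trained and RLHF-aligned models.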
