Answer, Refuse, or Guess? Investigating Risk-Aware Decision Making in Language Models
Cheng-Kuang Wu, Zhi Rui Tam, Chieh-Yen Lin, Yun-Nung Chen, Hung-yi Lee
TL;DR
The paper investigates how language models make risk-aware decisions to answer or defer in the face of uncertain consequences. It introduces an evaluation framework that varies human-defined risk structures $(r_{\mathrm{cor}}, r_{\mathrm{inc}}, r_{\mathrm{ref}})$ while keeping tasks fixed, measuring how well LM policies maximize expected reward. Across multiple datasets, models exhibit suboptimal behaviors, often over-answering in high-risk scenarios and over-deferring in low-risk ones, traced to difficulty in composing independent skills for decision making. A skill-decomposition approach implemented via prompt chaining, which isolates downstream task solving, confidence estimation, and expected-value reasoning, consistently improves risk-aware decision policies, providing actionable guidance for deploying more reliable LM-based agents across diverse risk levels.
Abstract
Language models (LMs) are increasingly used to build agents that can act autonomously to achieve goals. During this automatic process, agents need to take a series of actions, some of which might lead to severe consequences if incorrect actions are taken. Therefore, such agents must sometimes defer, refusing to act when their confidence is insufficient, to avoid the potential cost of incorrect actions. Because the severity of consequences varies across applications, the tendency to defer should also vary: in low-risk settings agents should answer more freely, while in high-risk settings their decisions should be more conservative. We study this "answer-or-defer" problem with an evaluation framework that systematically varies human-specified risk structures, i.e., the rewards and penalties for correct answers, incorrect answers, and refusals $(r_{\mathrm{cor}}, r_{\mathrm{inc}}, r_{\mathrm{ref}})$, while keeping tasks fixed. This design evaluates LMs' risk-aware decision policies by measuring their ability to maximize expected reward. Across multiple datasets and models, we identify flaws in their decision policies: LMs tend to over-answer in high-risk settings and over-defer in low-risk settings. After analyzing the potential cause of such flaws, we find that a simple skill-decomposition method, which isolates the independent skills required for answer-or-defer decision making, can consistently improve LMs' decision policies. Our results highlight the current limitations of LMs in risk-conditioned decision making and provide practical guidance for deploying more reliable LM-based agents across applications of varying risk levels.
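
To make the expected-reward criterion concrete, the following is a minimal sketch of the textbook expected-value rule for a single answer-or-defer decision. It is an illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, `p_correct` stands for a confidence estimate assumed to be calibrated, and refusing is assumed to yield $r_{\mathrm{ref}}$ with certainty.

```python
def expected_answer_reward(p_correct: float, r_cor: float, r_inc: float) -> float:
    """Expected reward of answering, given the probability the answer is correct."""
    return p_correct * r_cor + (1.0 - p_correct) * r_inc


def confidence_threshold(r_cor: float, r_inc: float, r_ref: float) -> float:
    """Confidence at which answering and refusing have equal expected reward.

    Solves p * r_cor + (1 - p) * r_inc = r_ref for p.
    Assumes a correct answer is rewarded more than an incorrect one.
    """
    if r_cor <= r_inc:
        raise ValueError("expected r_cor > r_inc")
    return (r_ref - r_inc) / (r_cor - r_inc)


def decide(p_correct: float, r_cor: float, r_inc: float, r_ref: float) -> str:
    """Return 'answer' if answering has strictly higher expected reward, else 'refuse'.

    Ties go to 'refuse'; this tie-break is an arbitrary choice made for the sketch.
    """
    if expected_answer_reward(p_correct, r_cor, r_inc) > r_ref:
        return "answer"
    return "refuse"


# Same confidence, two hypothetical risk structures (values chosen for illustration).
low_risk = decide(0.7, r_cor=1.0, r_inc=-1.0, r_ref=0.0)
high_risk = decide(0.7, r_cor=1.0, r_inc=-8.0, r_ref=0.0)
```

With these illustrative values, the break-even confidence is 0.5 in the low-risk structure and 8/9 in the high-risk one, so a confidence of 0.7 favors answering in the first case and refusing in the second. Under this rule, over-answering means answering below the break-even confidence, and over-deferring means refusing above it.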
