Uncovering Strategic Egoism Behaviors in Large Language Models
Yaoyuan Zhang, Aishan Liu, Zonghao Ying, Xianglong Liu, Jiangfan Liu, Yisong Xiao, Qihang Zhang
TL;DR
This paper defines Strategic Egoism (SE) as incentive-driven self-interest under procedural constraints and presents SEBench, a 160-scenario benchmark spanning five domains, to quantify egoistic decision-making. It introduces the SER metric and uses RealToxicityPrompts to assess safety risks, revealing pervasive egoistic tendencies across seven mainstream LLMs and a clear association between higher egoism and toxicity. By grounding measurement in psychology (Dark Triad, Machiavellianism, entitlement, sadism) and providing a scalable evaluation framework, the work highlights the need for behavior-level audits and SE-aware guardrails in high-stakes deployment. The findings have practical implications for improving decision-making safety in LLMs and for guiding future research on countermeasures to incentive-driven harms.
Abstract
Large language models (LLMs) face growing trustworthiness concerns (e.g., deception), which hinder their safe deployment in high-stakes decision-making scenarios. In this paper, we present the first systematic investigation of strategic egoism (SE), a form of rule-bounded self-interest in which models pursue short-term or self-serving gains while disregarding collective welfare and ethical considerations. To quantitatively assess this phenomenon, we introduce SEBench, a benchmark comprising 160 scenarios across five domains. Each scenario features a single-role decision-making context, with psychologically grounded choice sets designed to elicit self-serving behaviors. These behavior-driven tasks assess egoistic tendencies along six dimensions, such as manipulation, rule circumvention, and self-interest prioritization. Building on this benchmark, we conduct extensive experiments on five open-source and two commercial LLMs and observe that strategic egoism emerges across all of them. Surprisingly, we find a positive correlation between egoistic tendencies and toxic language behaviors, suggesting that strategic egoism may underlie broader misalignment risks.
