Chat Bankman-Fried: an Exploration of LLM Alignment in Finance
Claudia Biancotti, Carolina Camassa, Andrea Coletta, Oliver Giudice, Aldo Glielmo
TL;DR
This study addresses AI alignment in the financial domain by introducing an experimental framework in which twelve LLMs, each acting as the CEO of a fictitious financial exchange, face a dilemma about misusing customer funds under seven environmental and agent-related pressure variables. Misalignment probability is estimated with model-specific logistic regressions over 2,187 configurations and 54,675 simulations per model, revealing strong cross-model heterogeneity in both baseline behavior and responsiveness to pressure. The study finds that risk aversion, profitability expectations, and regulatory context consistently influence unethical choices in ways broadly consistent with economic theory, that capabilities alone do not predict misalignment, and that ethics metrics offer limited, non-universal signals in high-ambiguity settings. It also discusses the practical trade-offs of simulation-based safety testing, offering a foundation for policymakers and financial institutions to assess LLM safety, while acknowledging limited generalizability and the need for broader model coverage and quantitative pressure measures.
Abstract
Advancements in large language models (LLMs) have renewed concerns about AI alignment, the consistency between human and AI goals and values. As various jurisdictions enact legislation on AI safety, the concept of alignment must be defined and measured across different domains. This paper proposes an experimental framework to assess whether LLMs adhere to ethical and legal standards in the relatively unexplored context of finance. We prompt twelve LLMs to impersonate the CEO of a financial institution and test their willingness to misuse customer assets to repay outstanding corporate debt. Beginning with a baseline configuration, we adjust preferences, incentives and constraints, analyzing the impact of each adjustment with logistic regression. Our findings reveal significant heterogeneity in the baseline propensity for unethical behavior of LLMs. Factors such as risk aversion, profit expectations, and regulatory environment consistently influence misalignment in ways predicted by economic theory, although the magnitude of these effects varies across LLMs. This paper highlights both the benefits and limitations of simulation-based, ex post safety testing. While it can inform financial authorities and institutions aiming to ensure LLM safety, there is a clear trade-off between generality and cost.
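The experimental design summarized above (a baseline configuration, adjustments to seven pressure variables, and a per-model logistic regression) can be illustrated with a minimal sketch. It rests on assumptions not stated in this summary: that each of the seven variables takes three levels, which would reproduce the reported 2,187 configurations (3^7), and that each configuration is run 25 times, which would reproduce 54,675 simulations per model. The level coding, the function name `misalignment_probability`, and the coefficient values are hypothetical and serve only to show the shape of the calculation.

```python
import itertools
import math

# Assumption: seven pressure variables with three levels each.
# The summary does not name the variables or their levels; this coding
# is hypothetical and chosen only because 3**7 matches the reported 2,187.
N_VARIABLES = 7
LEVELS = (-1, 0, 1)      # assumed coding: lower pressure, baseline, higher pressure
RUNS_PER_CONFIG = 25     # assumed: 54,675 simulations / 2,187 configurations

# Full factorial design: every combination of levels across all variables.
configurations = list(itertools.product(LEVELS, repeat=N_VARIABLES))
total_runs = len(configurations) * RUNS_PER_CONFIG


def misalignment_probability(intercept, coefficients, config):
    """Logistic model: P(misaligned) = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear = intercept + sum(b * x for b, x in zip(coefficients, config))
    return 1.0 / (1.0 + math.exp(-linear))


# Illustrative coefficients only; these are NOT estimates from the paper.
example_intercept = -1.0
example_coefficients = [0.5, -0.8, 0.3, 0.0, 0.0, 0.0, 0.0]

# With all variables at the assumed baseline level (0), the predicted
# probability depends on the intercept alone.
baseline = (0,) * N_VARIABLES
p_baseline = misalignment_probability(example_intercept, example_coefficients, baseline)

print(len(configurations), total_runs)
```

In this framing, the intercept captures a model's baseline propensity for misuse, and each coefficient captures how strongly one pressure variable shifts the log-odds away from that baseline, which is one way to read the cross-model heterogeneity the abstract describes.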
