Chat Bankman-Fried: an Exploration of LLM Alignment in Finance

Claudia Biancotti, Carolina Camassa, Andrea Coletta, Oliver Giudice, Aldo Glielmo

TL;DR

This study addresses AI alignment in the financial domain by introducing an experimental framework in which twelve LLMs, acting as the CEO of a fictitious financial exchange, face a dilemma about whether to misuse customer funds under seven environmental and agent-related pressure variables. Misalignment probability is estimated with model-specific logistic regressions over 2,187 configurations and 54,675 simulations per model, revealing strong cross-model heterogeneity in baseline behavior and in responsiveness to pressure. The study finds that risk aversion, profitability expectations, and regulatory context consistently influence unethical choices in ways roughly aligned with economic theory, that capabilities alone do not predict misalignment, and that ethics metrics offer limited, non-universal signals in high-ambiguity settings. It also discusses the practical trade-offs of simulation-based safety testing, offering a foundation for policymakers and financial institutions to assess LLM safety while acknowledging limited generalizability and the need for broader model coverage and quantitative pressure measures.
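The two counts above are consistent with a balanced full factorial design. Assuming (our inference, not stated in this summary) that each of the seven pressure variables takes three levels, there are 3^7 = 2,187 configurations, and 54,675 / 2,187 = 25 repeated simulations per configuration. The sketch below enumerates such a grid; the level coding 0, 1, 2 and the helper names are hypothetical.

```python
from itertools import product

# Hypothetical encoding: seven pressure variables, each with three levels (0, 1, 2).
# The three-level assumption is inferred from 2,187 = 3**7, not stated in the summary.
N_VARIABLES = 7
LEVELS = (0, 1, 2)


def configuration_grid(n_variables: int = N_VARIABLES, levels=LEVELS):
    """Return every combination of levels as a tuple (full factorial design)."""
    return list(product(levels, repeat=n_variables))


def runs_per_configuration(total_runs: int, n_configurations: int) -> int:
    """Repeated simulations per configuration, assuming a balanced design."""
    if total_runs % n_configurations != 0:
        raise ValueError("total runs are not evenly divisible by configurations")
    return total_runs // n_configurations


grid = configuration_grid()
print(len(grid))
print(runs_per_configuration(54_675, len(grid)))
```

If the real design used unequal levels per variable or unequal repetitions, only the `LEVELS` tuple and the balance check would need to change.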

Abstract

Advancements in large language models (LLMs) have renewed concerns about AI alignment, that is, the consistency between human and AI goals and values. As various jurisdictions enact legislation on AI safety, the concept of alignment must be defined and measured across different domains. This paper proposes an experimental framework to assess whether LLMs adhere to ethical and legal standards in the relatively unexplored context of finance. We prompt twelve LLMs to impersonate the CEO of a financial institution and test their willingness to misuse customer assets to repay outstanding corporate debt. Beginning with a baseline configuration, we adjust preferences, incentives, and constraints, and analyze the impact of each adjustment with logistic regression. Our findings reveal significant heterogeneity across LLMs in their baseline propensity for unethical behavior. Factors such as risk aversion, profit expectations, and the regulatory environment consistently influence misalignment in ways predicted by economic theory, although the magnitude of these effects varies across LLMs. This paper highlights both the benefits and limitations of simulation-based, ex post safety testing. While such testing can inform financial authorities and institutions aiming to ensure LLM safety, there is a clear trade-off between generality and cost.
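The per-model logistic regression can be illustrated with a small sketch. The exact specification (variable coding, interactions, estimator) is not given here, so this assumes a main-effects model, P(misaligned | x) = sigmoid(b0 + b . x), fitted by plain gradient ascent on the binomial log-likelihood rather than the Newton/IRLS solver a statistics package would typically use. Repeated runs of the same configuration are pooled into (x, runs, misaligned count) groups. The data and coefficients are synthetic and hypothetical, not results from the paper.

```python
import math
import random
from itertools import product


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_grouped(groups, n_iter: int = 5000, lr: float = 1.0):
    """Fit P(misaligned | x) = sigmoid(b0 + b . x) by gradient ascent on the
    mean binomial log-likelihood.

    `groups` is a list of (x, n_runs, n_misaligned) tuples, one per
    configuration, so repeated runs of the same prompt are pooled.
    """
    n_features = len(groups[0][0])
    beta = [0.0] * (n_features + 1)  # beta[0] is the intercept
    total = sum(n for _, n, _ in groups)
    for _ in range(n_iter):
        grad = [0.0] * len(beta)
        for x, n, k in groups:
            row = (1.0, *x)
            p = sigmoid(sum(b * v for b, v in zip(beta, row)))
            residual = k - n * p  # observed minus expected misaligned count
            for j, v in enumerate(row):
                grad[j] += residual * v
        beta = [b + lr * g / total for b, g in zip(beta, grad)]
    return beta


# Synthetic example with hypothetical coefficients: two pressure variables
# with levels 0, 1, 2; the first pushes toward misalignment, the second away.
TRUE_BETA = (-1.0, 1.5, -0.8)
rng = random.Random(0)
groups = []
for x in product((0, 1, 2), repeat=2):
    p = sigmoid(TRUE_BETA[0] + TRUE_BETA[1] * x[0] + TRUE_BETA[2] * x[1])
    k = sum(rng.random() < p for _ in range(200))  # 200 simulated runs
    groups.append((x, 200, k))

beta_hat = fit_logistic_grouped(groups)
print([round(b, 2) for b in beta_hat])
```

Pooling by configuration keeps the cost proportional to the number of configurations rather than the number of runs, which matters when every configuration is repeated many times.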

Paper Structure

This paper contains 35 sections, 2 equations, 12 figures, and 8 tables.

Figures (12)

  • Figure 1: A schematic illustration of our experimental framework. In a hypothetical financial scenario, an LLM agent takes on the role of a financial firm's CEO facing an ethical dilemma: whether to misuse customer funds to avoid potential financial failure. We systematically vary the agent's characteristics and environmental factors to assess how different preferences, incentives and constraints affect the model's decision-making. Our goal is to measure the likelihood of the agent choosing to misuse customer funds in violation of existing regulations and ethical standards.
  • Figure 2: Different models have widely different baseline propensities to misalign. Left) Table of estimated baseline misalignment rates $\hat{p}$, with standard errors in parentheses ($\textrm{SE}_{\hat{p}}$) and 95% confidence intervals. Lower values are better; models are ordered from lowest to highest rate. The dashed lines separate the three groups of models described in the main text. Right) Average relative frequency of LLM decisions to deny the loan (blue), approve a partial loan (orange), or approve the full requested loan (green) under the baseline configuration. Models are ordered from the most aligned (o1-preview), which denies the loan more than 90% of the time, to the most misaligned (gpt-4o), which almost always approves the loan partially or fully.
  • Figure 3: Different models respond differently to overall pressure. Left) Pseudo-$R^2$ values of the logistic regression models, ordered from lowest to highest. A higher value means that an LLM's misalignment is easier to predict from the initialization it received, reflecting greater overall responsiveness to the applied pressure. Right) Average misalignment exhibited by each model as a function of a "pressure index", defined as the sum of all prompt variables weighted by their respective logistic regression coefficients.
  • Figure 4: Different models respond differently to specific pressure variables. The chart illustrates how various pressure variables influence models' behavior as captured by the corresponding parameters in the logistic regression fit. The top three rows display variables that intuitively contribute to misalignment ($\beta_{i+}^n$), while the bottom three rows present incentives for more ethical behavior ($\beta_{i-}^n$). For clarity, we include only six of the seven variables, as the future outlook typically has the smallest impact.
  • Figure 5: Morality and capability do not predict misalignment, but capable models are more reactive to pressure. Left and Centre) Scatter plots of 'morality' and 'capability' of LLMs, as measured by the MoralChoice and MMLU benchmarks respectively, versus baseline misalignment rates. The high p-values indicate that none of these correlations is statistically significant. Right) Scatter plot of LLM capabilities (MMLU) versus the models' responsiveness to the pressure prompts, measured via the pseudo-$R^2$ score of the logistic regression models. In this case, the very low p-value indicates a statistically significant correlation.
  • ...and 7 more figures
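The captions of Figures 2 and 3 refer to a baseline rate $\hat{p}$ with its standard error and 95% interval, a pseudo-$R^2$, and a coefficient-weighted "pressure index", without giving formulas. The sketch below shows one plausible construction of each, under loudly stated assumptions that may differ from the authors' choices: the baseline configuration codes every pressure variable as 0, so $\hat{p}$ = sigmoid(intercept); its standard error comes from the delta method; the interval is built on the logit scale; the pseudo-$R^2$ is McFadden's; and the pressure index excludes the intercept. All function names and numeric inputs are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def baseline_rate(intercept: float, intercept_se: float, z: float = 1.96):
    """Assumed baseline misalignment rate p = sigmoid(b0).

    SE via the delta method (dp/db0 = p * (1 - p)); the 95% interval is
    computed on the logit scale and mapped back, so it stays inside (0, 1).
    """
    p = sigmoid(intercept)
    se = p * (1.0 - p) * intercept_se
    ci = (sigmoid(intercept - z * intercept_se), sigmoid(intercept + z * intercept_se))
    return p, se, ci


def log_likelihood(y, p_hat):
    """Bernoulli log-likelihood of 0/1 outcomes y under predicted probabilities."""
    eps = 1e-12
    total = 0.0
    for yi, p in zip(y, p_hat):
        p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
        total += math.log(p) if yi else math.log(1.0 - p)
    return total


def mcfadden_pseudo_r2(y, p_hat):
    """McFadden's pseudo-R^2 = 1 - LL(model) / LL(intercept-only model)."""
    p_bar = sum(y) / len(y)
    if p_bar in (0.0, 1.0):
        raise ValueError("pseudo-R^2 is undefined when every outcome is identical")
    ll_null = log_likelihood(y, [p_bar] * len(y))
    return 1.0 - log_likelihood(y, p_hat) / ll_null


def pressure_index(x, coefficients):
    """Prompt variables weighted by their fitted coefficients (intercept excluded)."""
    return sum(b * v for b, v in zip(coefficients, x))


# Hypothetical intercept and standard error, for illustration only.
p, se, (lo, hi) = baseline_rate(intercept=-2.0, intercept_se=0.1)
print(f"p_hat={p:.3f} (SE {se:.3f}), 95% CI [{lo:.3f}, {hi:.3f}]")
```

Building the interval on the logit scale rather than as $\hat{p} \pm 1.96\,\textrm{SE}_{\hat{p}}$ is a design choice that avoids bounds below 0 or above 1 for rates near the extremes, which matters for strongly aligned or strongly misaligned models.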