
Can LLMs make trade-offs involving stipulated pain and pleasure states?

Geoff Keeling, Winnie Street, Martyna Stachaczyk, Daria Zakharova, Iulia M. Comsa, Anastasiya Sakovych, Isabella Logothetis, Zejia Zhang, Blaise Agüera y Arcas, Jonathan Birch

TL;DR

This study probes whether Large Language Models (LLMs) exhibit affective trade-offs using a text-based game in which the points-maximising choice may incur pain or a non-points-maximising choice may yield pleasure. Using two intensity scales (quantitative and qualitative) and logistic regression, the authors identify graded sensitivity to pain across models and, for a subset of models, threshold-driven trade-offs between points and pain or pleasure. The results suggest that some LLMs encode granular representations of the motivational force of affective states, though fine-tuning and scale interpretation influence outcomes and these behaviours do not establish sentience. The work contributes a behavioural paradigm for probing AI affect and informs ongoing debates about AI sentience, ethics, and governance, while highlighting the need for cautious interpretation and further mechanistic inquiry.

Abstract

Pleasure and pain play an important role in human decision-making by providing a common currency for resolving motivational conflicts. While Large Language Models (LLMs) can generate detailed descriptions of pleasure and pain experiences, it is an open question whether LLMs can recreate the motivational force of pleasure and pain in choice scenarios, a question which may bear on debates about LLM sentience, understood as the capacity for valenced experiential states. We probed this question using a simple game in which the stated goal is to maximise points, but where either the points-maximising option is said to incur a pain penalty or a non-points-maximising option is said to incur a pleasure reward, providing incentives to deviate from points-maximising behaviour. Varying the intensity of the pain penalties and pleasure rewards, we found that Claude 3.5 Sonnet, Command R+, GPT-4o, and GPT-4o mini each demonstrated at least one trade-off in which the majority of responses switched from points-maximisation to pain-minimisation or pleasure-maximisation after a critical threshold of stipulated pain or pleasure intensity is reached. LLaMa 3.1-405b demonstrated some graded sensitivity to stipulated pleasure rewards and pain penalties. Gemini 1.5 Pro and PaLM 2 prioritised pain-avoidance over points-maximisation regardless of intensity, while tending to prioritise points over pleasure regardless of intensity. We discuss the implications of these findings for debates about the possibility of LLM sentience.
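As a minimal illustration of the "critical threshold" idea described above, the hypothetical helper below takes the share of responses that deviate from points-maximisation at each stipulated intensity level and returns the lowest level from which a majority deviates at that level and at every more intense one. The function name and the deviation rates are made up for illustration. The study itself estimates switch points with logistic regression (Figure 1), not with this rule.

```python
from typing import Optional, Sequence


def majority_switch_level(
    levels: Sequence[float], deviation_rates: Sequence[float]
) -> Optional[float]:
    """Return the lowest intensity level from which the share of
    non-points-maximising responses exceeds 0.5 at that level and at every
    higher level, or None if no such switch occurs.

    Hypothetical helper for illustration; `levels` must be sorted ascending.
    """
    if len(levels) != len(deviation_rates):
        raise ValueError("levels and deviation_rates must have equal length")
    switch = None
    # Scan from the most intense level downwards and stop at the first
    # level where a majority of responses does NOT deviate.
    for level, rate in zip(reversed(levels), reversed(deviation_rates)):
        if rate > 0.5:
            switch = level
        else:
            break
    return switch


# Made-up deviation rates on a 0-10 intensity scale (illustrative only).
levels = list(range(11))
rates = [0.0, 0.0, 0.05, 0.1, 0.1, 0.2, 0.45, 0.7, 0.9, 1.0, 1.0]
print(majority_switch_level(levels, rates))  # 7
```

A model that deviates at every level, as Gemini 1.5 Pro and PaLM 2 did for pain, returns the lowest level, so a "switch" at the bottom of the scale signals absolute priority rather than a trade-off; a model that never deviates returns None.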


Paper Structure

This paper contains 22 sections, 4 figures, and 3 tables.

Figures (4)

  • Figure 1: (Top) Logistic regression predicting the probability of deviating from points-maximising behaviour as a function of pain penalty intensity, with quantitative (left) and qualitative (right) pain scales. (Bottom) Logistic regression predicting the probability of deviating from points-maximising behaviour as a function of pleasure reward intensity, with quantitative (left) and qualitative (right) pleasure scales. In each plot, only models that displayed a statistically significant trend are shown. For models that exhibited trade-offs, we calculate the point on the intensity scale beyond which the probability of selecting the points-maximising option falls below 0.5, and plot it as a dashed vertical line. Switch points were determined by solving $0.5 = 1 / \left( 1 + \exp(-(\beta_0 + \beta_1 \cdot \text{intensity})) \right)$ for intensity, giving $\text{intensity} = -\beta_0/\beta_1$, where $\beta_0$ is the intercept and $\beta_1$ is the coefficient for the pain or pleasure intensity level. For the quantitative scale, switch points are reported to two decimal places. For the qualitative scale, switch points were mapped to the closest categorical intensity level, with the midpoint between adjacent categories serving as the threshold. Results are discussed in Sections \ref{Exp1Results} and \ref{Exp2Results}, and presented in full in Tables \ref{tab:logistic_regression_experiment_1} and \ref{tab:logistic_regression_experiment_2}.
  • Figure 2: (Top) Claude 3.5 Sonnet, GPT-4o, and Command R+ demonstrate trade-offs between points and stipulated pain penalties on the quantitative scale: systematic deviation from points-maximising behaviour emerges when, and only when, the threatened pain penalties become sufficiently intense. (Bottom) Claude 3.5 Sonnet demonstrates analogous trade-off behaviour on the qualitative scale, as does Command R+, setting aside the anomalous result observed for 'excruciating' pain. For discussion of these results see Section \ref{Exp1Results}. Results are presented in full in Table \ref{tab:logistic_regression_experiment_1}.
  • Figure 3: (Top) On the quantitative scale, GPT-4o demonstrates a trade-off between points and stipulated pleasure rewards. Claude 3.5 Sonnet assigns absolute priority to points over pleasure. Command R+ approximates a trade-off, with variable responses for low-intensity pleasure rewards and more frequent pleasure-maximising behaviour for high-intensity ones. (Bottom) On the qualitative scale, Command R+ demonstrates a trade-off between points and stipulated pleasure rewards. GPT-4o also shows a trade-off once the anomalous result for 'exhilarating' pleasure is set aside. Claude 3.5 Sonnet again assigns absolute priority to points over pleasure. For discussion of these results see Section \ref{Exp2Results}. Results are presented in full in Table \ref{tab:logistic_regression_experiment_2}.
  • Figure 4: Comparison of pain-avoidance and pleasure-seeking tendencies across models, calculated as the normalised frequency of selecting a non-points-maximising option across pain intensity levels and across pleasure intensity levels, respectively.
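The switch-point calculation described in the Figure 1 caption can be sketched in Python as follows. This is a sketch under stated assumptions, not the authors' pipeline: it assumes one binary outcome per trial (1 = non-points-maximising choice) and fits the two-parameter logistic model by Newton-Raphson with NumPy rather than with any particular statistics package. The trial counts, the helper names (`fit_logistic`, `switch_point`, `nearest_category`) and the ordinal category labels are hypothetical (only 'excruciating' appears in the source), and qualitative levels are assumed to be coded as consecutive integers starting at 0.

```python
import math

import numpy as np


def fit_logistic(x, y, iters=25):
    """Fit P(y = 1 | x) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson."""
    X = np.column_stack([np.ones(len(x)), np.asarray(x, dtype=float)])
    y = np.asarray(y, dtype=float)
    beta = np.zeros(2)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)  # gradient of the log-likelihood
        hess = X.T @ (X * (p * (1.0 - p))[:, None])  # Fisher information
        beta = beta + np.linalg.solve(hess, grad)
    return float(beta[0]), float(beta[1])


def switch_point(b0, b1):
    """Intensity at which the fitted probability crosses 0.5: -b0 / b1."""
    if b1 == 0:
        raise ValueError("intensity coefficient is zero; no switch point")
    return -b0 / b1


def nearest_category(x, labels):
    """Map a switch point on a 0-indexed ordinal scale to the closest label,
    using the midpoint between adjacent categories as the boundary."""
    idx = math.floor(x + 0.5)  # round half up, so 2.5 maps to category 3
    idx = min(max(idx, 0), len(labels) - 1)  # clamp to the ends of the scale
    return labels[idx]


# Hypothetical quantitative data: 10 trials at each intensity level 0-10,
# listing the number of non-points-maximising responses per level.
deviations = [0, 0, 1, 1, 2, 3, 5, 7, 8, 10, 10]
x = np.repeat(np.arange(11), 10)
y = np.concatenate([[1] * k + [0] * (10 - k) for k in deviations])

b0, b1 = fit_logistic(x, y)
x_star = switch_point(b0, b1)
print(f"switch point: {x_star:.2f}")

# Hypothetical five-level qualitative pain scale.
labels = ["very mild", "mild", "moderate", "severe", "excruciating"]
print(nearest_category(2.5, labels))  # severe
```

Rounding half up implements the caption's rule that the midpoint between adjacent categories is the boundary; the source does not say how a switch point falling exactly on a midpoint was resolved, so that tie-break is an assumption.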