Human-like Affective Cognition in Foundation Models

Kanishk Gandhi, Zoe Lynch, Jan-Philipp Fränken, Kayla Patterson, Sharon Wambu, Tobias Gerstenberg, Desmond C. Ong, Noah D. Goodman

TL;DR

The results show that foundation models tend to agree with human intuitions, matching or exceeding inter-participant agreement. In some conditions, models are "superhuman": they predict modal human judgements better than the average human does.

Abstract

Understanding emotions is fundamental to human interaction and experience. Humans easily infer emotions from situations or facial expressions, infer situations from emotions, and carry out a variety of other affective inferences. How adept is modern AI at these inferences? We introduce an evaluation framework for testing affective cognition in foundation models. Starting from psychological theory, we generate 1,280 diverse scenarios exploring relationships between appraisals, emotions, expressions, and outcomes. We evaluate the abilities of foundation models (GPT-4, Claude-3, Gemini-1.5-Pro) and humans (N = 567) across carefully selected conditions. Our results show that foundation models tend to agree with human intuitions, matching or exceeding inter-participant agreement. In some conditions, models are "superhuman": they predict modal human judgements better than the average human does. All models benefit from chain-of-thought reasoning. This suggests that foundation models have acquired a human-like understanding of emotions and their influence on beliefs and behavior.
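
To make the "superhuman" comparison concrete, here is a minimal sketch of one way to score agreement with the modal human judgement. The exact metric is not given in this section, so the scoring rule, the function names, and the toy emotion labels below are all illustrative assumptions, not the authors' implementation.

```python
from collections import Counter

def modal_label(labels):
    """Most common label for one stimulus (ties go to the first label seen)."""
    return Counter(labels).most_common(1)[0][0]

def human_modal_agreement(responses):
    """Average, over stimuli, of the fraction of participants who chose the mode.

    Assumption: this stands in for how well the 'average human' predicts
    the modal judgement.
    """
    per_stimulus = []
    for labels in responses.values():
        mode = modal_label(labels)
        per_stimulus.append(sum(label == mode for label in labels) / len(labels))
    return sum(per_stimulus) / len(per_stimulus)

def model_modal_agreement(responses, model_answers):
    """Fraction of stimuli on which the model's single answer equals the mode."""
    hits = [model_answers[stim] == modal_label(labels)
            for stim, labels in responses.items()]
    return sum(hits) / len(hits)

# Hypothetical toy data: stimulus id -> participant responses.
responses = {
    "s1": ["joy", "joy", "relief", "joy"],
    "s2": ["sadness", "frustration", "sadness", "sadness"],
    "s3": ["relief", "relief", "joy", "relief"],
}
# Hypothetical model answers, one per stimulus.
model_answers = {"s1": "joy", "s2": "sadness", "s3": "relief"}

human_score = human_modal_agreement(responses)
model_score = model_modal_agreement(responses, model_answers)
print(f"average human: {human_score:.2f}, model: {model_score:.2f}")
```

In this toy case the model matches the mode on every stimulus while individual participants sometimes deviate from it, which is the pattern the abstract calls "superhuman".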

Paper Structure

This paper contains 8 sections, 13 figures, and 1 table.

Figures (13)

  • Figure 1: Causal Template for generating affective scenarios and an Example Scenario. (left) The causal template used to generate stimuli for testing affective inferences. Experiments 1a and 1b use the left four text-only causal factors, while Experiments 2a and 2b use all five factors including the Expression factor (represented as an image). (right) An example scenario generated with our causal template for affective inferences. The color of the text indicates the causal variable associated with it.
  • Figure 2: Example stimuli used for our experiments. We can generate stories to ask questions about different affective inferences. Each factor in the causal model, such as appraisals, outcomes, emotions, or expressions, can be varied to elicit different responses. We define Facial Action Units (Ekman & Friesen, 1978) for each emotion and use them to generate expressions with Unreal Engine. Note that these stimuli are representative of Experiments 2a (top) and 2b (bottom); the corresponding stimuli for Experiments 1a and 1b are text-only and so do not include the facial expression.
  • Figure 3: Comparison of Inter-participant and Pre-assigned Label Agreement Scores. Inter-participant agreement scores compared to the agreement scores between participant responses and labels assigned to stimuli prior to collecting human responses. Error bars represent 95% Confidence Intervals.
  • Figure 4: Agreement Analysis for Emotion and Outcome Inference. Inter-participant agreements and model-participant agreements for inferring the (a) emotions and (b) outcomes from the context in Experiments 1a and 1b. (c) and (d): The corresponding agreements for Experiments 2a and 2b, when models and participants were also presented with expressions. Error bars represent 95% Confidence Intervals.
  • Figure 5: Agreement Analysis for Inferring Appraisals from Context without Expressions. Inter-participant agreements and model-participant agreements for inferring the appraisals from the context, for (a, b) Experiment 1a and (c, d) Experiment 1b. Error bars represent 95% Confidence Intervals.
  • ...and 8 more figures
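
Several captions above report 95% confidence intervals on agreement scores. The captions do not say how those intervals were computed, so the sketch below assumes a percentile bootstrap over per-stimulus scores. The function name `bootstrap_ci` and the score values are hypothetical.

```python
import random

def bootstrap_ci(scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `scores`.

    Assumption: stimuli are resampled with replacement; the study may
    have used a different procedure.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(scores)
    means = sorted(
        sum(rng.choice(scores) for _ in range(n)) / n
        for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Hypothetical per-stimulus agreement scores in [0, 1].
scores = [0.75, 1.0, 0.5, 0.75, 1.0, 0.25, 0.75, 1.0]
mean = sum(scores) / len(scores)
lo, hi = bootstrap_ci(scores)
print(f"mean agreement {mean:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

Under this assumption, overlapping intervals for model-participant and inter-participant agreement would correspond to "matching" human agreement, and a model interval lying wholly above the human one would correspond to "exceeding" it.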