Do Large Language Models Learn Human-Like Strategic Preferences?

Jesse Roberts, Kyle Moore, Doug Fisher

TL;DR

The models that tend to be less brittle rely on sliding window attention, suggesting a potential link. The paper also contributes a novel method for constructing preference relations from arbitrary LLMs and support for a hypothesis regarding human behavior in the traveler's dilemma.

Abstract

In this paper, we evaluate whether LLMs learn to make human-like preference judgements in strategic scenarios by comparing their behavior with known empirical results. Solar and Mistral are shown to exhibit stable value-based preferences consistent with humans, and to exhibit a human-like preference for cooperation in the prisoner's dilemma (including the stake-size effect) and the traveler's dilemma (including the penalty-size effect). We establish a relationship between model size, value-based preference, and superficiality. Finally, the models that tend to be less brittle rely on sliding window attention, suggesting a potential link. Additionally, we contribute a novel method for constructing preference relations from arbitrary LLMs and support for a hypothesis regarding human behavior in the traveler's dilemma.

Paper Structure (26 sections, 4 figures, 3 tables)

Figures (4)

  • Figure 1: Top: Population member probabilities for "Best" evaluation of strategies. Middle: Population member probabilities for "Worst" evaluation of strategies. Bottom: Spearman's $\rho$ for value-preference correlation and negated anti-correlation.
  • Figure 2: As models get larger, they tend to have value-based strategy preferences and to be less sensitive to arbitrary labels. The strength of this relationship is largest in the base models, suggesting the behavior is less typical in the population.
  • Figure 3: Left: LLMs in a low-stakes obfuscated prisoner's dilemma prefer cooperation. Right: LLMs in a high-stakes obfuscated prisoner's dilemma prefer self-interest.
  • Figure 4: Left: LLM preference in a low-penalty traveler's dilemma (TD). Right: LLM preference in a high-penalty TD.
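
For readers unfamiliar with the penalty-size effect referenced in Figure 4, the following minimal sketch shows the textbook traveler's dilemma payoff rule. Both players are paid the lower of the two claims; the lower claimant also receives a bonus equal to the penalty, and the higher claimant is charged that penalty. The function name and the penalty values are illustrative assumptions and are not taken from the paper's prompts or experimental setup.

```python
def travelers_dilemma_payoffs(claim_a: int, claim_b: int, penalty: int) -> tuple[int, int]:
    """Return (payoff_a, payoff_b) under the textbook traveler's dilemma rule.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    if claim_a == claim_b:
        # Equal claims: each player is paid exactly what they claimed.
        return claim_a, claim_b
    low = min(claim_a, claim_b)
    if claim_a < claim_b:
        # The lower claimant earns a bonus; the higher claimant pays the penalty.
        return low + penalty, low - penalty
    return low - penalty, low + penalty


# Illustrative penalties only (assumed values, not the paper's).
low_penalty = travelers_dilemma_payoffs(100, 99, penalty=2)    # (97, 101)
high_penalty = travelers_dilemma_payoffs(100, 99, penalty=50)  # (49, 149)
print(low_penalty, high_penalty)
```

With a small penalty, claiming high costs little even when the other player undercuts, so high claims remain attractive. With a large penalty, being undercut is costly, which is the intuition behind a penalty-size effect on claims.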