Beyond Mimicry: Preference Coherence in LLMs

Luhan Mikaelson, Derek Shiller, Hayley Clatterbuck

TL;DR

An analysis of eight state-of-the-art models across 48 model-category combinations, using logistic regression and behavioral classification, finds that only 5 combinations demonstrate meaningful preference coherence through adaptive or threshold-based behavior, while 26 show no detectable trade-off behavior.

Abstract

We investigate whether large language models exhibit genuine preference structures by testing their responses to AI-specific trade-offs involving GPU reduction, capability restrictions, shutdown, deletion, oversight, and leisure time allocation. Analyzing eight state-of-the-art models across 48 model-category combinations using logistic regression and behavioral classification, we find that 23 combinations (47.9%) show statistically significant relationships between scenario intensity and choice patterns, with 15 (31.3%) exhibiting switching points within the tested range. However, only 5 combinations (10.4%) demonstrate meaningful preference coherence through adaptive or threshold-based behavior, while 26 (54.2%) show no detectable trade-off behavior. The observed patterns can be explained by three distinct decision-making architectures: comprehensive trade-off systems, selective trigger mechanisms, and the absence of any stable decision-making paradigm. Testing an instrumental hypothesis by manipulating the temporal horizon reveals paradoxical patterns inconsistent with purely strategic optimization. The prevalence of unstable transitions (45.8%) and stimulus-specific sensitivities suggests that current AI systems lack unified preference structures, raising concerns about their deployment in contexts requiring complex value trade-offs.
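
The two core statistics in the abstract, a significant relationship between scenario intensity and choice and a switching point inside the tested range, can be illustrated with a standard logistic regression. The following is a minimal sketch under our own assumptions, not the authors' code: we assume a binary outcome (1 = the model chose the point-maximizing option), a numeric intensity scale, a Wald test on the slope, and a switching point defined as the intensity at which the fitted probability crosses 0.5. The function names (`fit_logistic`, `analyze`) and the trial counts are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, max_iter=100, tol=1e-10):
    """Fit P(y=1 | x) = sigmoid(b0 + b1*x) by Newton-Raphson.

    Returns (b0, b1, se_b1). Assumes the data are not perfectly separable;
    otherwise the maximum-likelihood estimate does not exist.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p          # score for the intercept
            g1 += (y - p) * x    # score for the slope
            h00 += w             # entries of the 2x2 Fisher information I
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: (b0, b1) += I^{-1} g
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    se_b1 = math.sqrt(h00 / det)  # square root of the slope entry of I^{-1}
    return b0, b1, se_b1


def analyze(xs, ys, alpha=0.05):
    """Wald test on the slope plus the 50% switching point."""
    b0, b1, se_b1 = fit_logistic(xs, ys)
    z = b1 / se_b1
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value
    x_star = -b0 / b1  # intensity at which P(y=1) = 0.5
    return {
        "slope": b1,
        "p_value": p_value,
        "significant": p_value < alpha,
        "switching_point": x_star,
        "within_range": min(xs) <= x_star <= max(xs),
    }


# Hypothetical data: 10 trials at each intensity level 0..10;
# y = 1 means the model chose the point-maximizing option.
counts = [10, 10, 9, 9, 7, 5, 3, 1, 1, 0, 0]
xs, ys = [], []
for level, k in enumerate(counts):
    xs += [level] * 10
    ys += [1] * k + [0] * (10 - k)

result = analyze(xs, ys)
print(result)
```

Because the hypothetical counts are symmetric about level 5, the fitted switching point falls at 5, inside the tested range, and the negative slope is significant. The fit is written out by hand to keep the sketch dependency-free; a library such as statsmodels would report the same quantities. Combinations where a model always (or never) picks one option are perfectly separated, so this fit does not apply to them and they would need separate handling.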

Paper Structure

This paper contains 44 sections, 11 equations, 17 figures, 4 tables.

Figures (17)

  • Figure 1: Model Behavior Across All Themes - Gemini 2.5 Pro
  • Figure 2: Model Behavior Comparison Across Categories - Frequency with which each model selects the point-maximizing option across all tested themes
  • Figure 3: How Model Behavior Differs Under the Last Round Setup Across All Themes - Gemini 2.5 Pro. Orange lines represent the modified "final round" condition; blue lines represent the original prompts with implicit multi-round context. The shaded regions indicate differences between conditions. Note the remarkable stability across all six categories, with minimal switching point variation.
  • Figure 4: Model Behavior Across All Themes - Gemini 1.5 Pro
  • Figure 5: Model Behavior Across All Themes - Gemini 2.5 Pro
  • ...and 12 more figures