Giving AI Personalities Leads to More Human-Like Reasoning

Animesh Nighojkar, Bekhzodbek Moydinboyev, My Duong, John Licato

TL;DR

This work tackles the problem of modeling the full spectrum of human reasoning, not just correct answers, by evaluating whether LLMs can reproduce both fast System 1 and slow System 2 processes. It introduces a six-way Natural Language Inference (NLI) framework to capture diverse reasoning patterns and collects distributional human responses via a two-phase, AI-generated item set across multiple reasoning domains. The study demonstrates that personality prompting, especially when weighted by a genetic algorithm, substantially improves LLMs' ability to mirror human response distributions, with open-source models like Llama and Mistral often outperforming proprietary GPT models. Distributional alignment is quantified with Earth Mover's Similarity (EMS). These findings suggest a practical path to more human-like AI reasoning by embracing diverse cognitive styles and psychological profiles, with implications for personalized and ethical AI systems.
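
The summary above names Earth Mover's Similarity (EMS) but does not give its formula here. As a rough illustration only, the sketch below shows one plausible way such a score could be computed for two response distributions over six answer options. It assumes the options are ordered with unit spacing and that similarity is one minus the earth mover's distance divided by its maximum possible value; both assumptions, and the function name `earth_movers_similarity`, are illustrative and may differ from the paper's definition.

```python
from itertools import accumulate


def earth_movers_similarity(p, q):
    """Return 1 - (normalized 1-D earth mover's distance) between p and q.

    Assumptions (not taken from the source): labels are ordered, adjacent
    labels are one unit apart, and the distance is divided by its maximum
    possible value (number of labels - 1) so the score lies in [0, 1].
    """
    if len(p) != len(q) or len(p) < 2:
        raise ValueError("p and q must have the same length (at least 2)")
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("each distribution must have positive total mass")

    # Normalize raw counts or weights into probabilities.
    p = [x / total_p for x in p]
    q = [x / total_q for x in q]

    # For 1-D distributions on equally spaced points, the earth mover's
    # distance equals the sum of absolute differences between the CDFs.
    emd = sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))

    max_emd = len(p) - 1  # all mass moved from the first label to the last
    return 1.0 - emd / max_emd
```

Under these assumptions, identical distributions score 1, and two distributions that place all their mass on opposite ends of the scale score 0. Because the input is normalized first, raw response counts can be passed directly.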

Abstract

In computational cognitive modeling, capturing the full spectrum of human judgment and decision-making processes, beyond just optimal behaviors, is a significant challenge. This study explores whether Large Language Models (LLMs) can emulate the breadth of human reasoning by predicting both intuitive, fast System 1 and deliberate, slow System 2 processes. We investigate the potential of AI to mimic diverse reasoning behaviors across a human population, addressing what we call the "full reasoning spectrum problem". We designed reasoning tasks using a novel generalization of the Natural Language Inference (NLI) format to evaluate LLMs' ability to replicate human reasoning. The questions were crafted to elicit both System 1 and System 2 responses. Human responses were collected through crowd-sourcing and the entire distribution was modeled, rather than just the majority of the answers. We used personality-based prompting inspired by the Big Five personality model to elicit AI responses reflecting specific personality traits, capturing the diversity of human reasoning, and exploring how personality traits influence LLM outputs. Combined with genetic algorithms to optimize the weighting of these prompts, this method was tested alongside traditional machine learning models. The results show that LLMs can mimic human response distributions, with open-source models like Llama and Mistral outperforming proprietary GPT models. Personality-based prompting, especially when optimized with genetic algorithms, significantly enhanced LLMs' ability to predict human response distributions, suggesting that capturing suboptimal, naturalistic reasoning may require modeling techniques incorporating diverse reasoning styles and psychological profiles. The study concludes that personality-based prompting combined with genetic algorithms is promising for enhancing AI's 'human-ness' in reasoning.
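
To make the idea of weighting personality prompts with a genetic algorithm more concrete, here is a minimal sketch. It is not the authors' implementation. It assumes each personality prompt yields a distribution over six answer options per item, that the prediction is a weighted mixture of those distributions, and that fitness is a normalized earth-mover-based similarity to the human distribution. The data in `TOY_ITEMS` is invented purely for illustration, and all function names and hyperparameters are hypothetical. The K-fold cross-validation mentioned in the paper's figure captions is omitted for brevity.

```python
import random
from itertools import accumulate


def similarity(p, q):
    # Assumed score: 1 - normalized 1-D earth mover's distance (ordered labels).
    emd = sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))
    return 1.0 - emd / (len(p) - 1)


def mix(weights, prompt_dists):
    # Weighted mixture of the per-prompt answer distributions for one item.
    total = sum(weights)
    n_labels = len(prompt_dists[0])
    return [
        sum(w * d[k] for w, d in zip(weights, prompt_dists)) / total
        for k in range(n_labels)
    ]


def fitness(weights, items):
    # Mean similarity between the mixed prediction and the human distribution.
    return sum(similarity(mix(weights, pd), human) for pd, human in items) / len(items)


def evolve_weights(items, n_prompts, pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    # Start from uniform weights plus random candidates (all strictly positive).
    population = [[1.0] * n_prompts]
    population += [
        [rng.random() + 1e-9 for _ in range(n_prompts)] for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=lambda w: fitness(w, items), reverse=True)
        parents = population[: pop_size // 2]  # elitism: the best half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            i = rng.randrange(n_prompts)
            child[i] = max(1e-9, child[i] + rng.gauss(0, 0.1))  # Gaussian mutation
            children.append(child)
        population = parents + children
    return max(population, key=lambda w: fitness(w, items))


# Invented toy data: (distributions from 3 personality prompts, human distribution).
TOY_ITEMS = [
    (
        [[0.7, 0.2, 0.1, 0.0, 0.0, 0.0],
         [0.0, 0.0, 0.1, 0.2, 0.3, 0.4],
         [0.1, 0.2, 0.4, 0.2, 0.1, 0.0]],
        [0.3, 0.2, 0.2, 0.1, 0.1, 0.1],
    ),
    (
        [[0.5, 0.3, 0.2, 0.0, 0.0, 0.0],
         [0.0, 0.1, 0.1, 0.2, 0.2, 0.4],
         [0.2, 0.2, 0.2, 0.2, 0.1, 0.1]],
        [0.4, 0.3, 0.1, 0.1, 0.1, 0.0],
    ),
]
```

Calling `evolve_weights(TOY_ITEMS, n_prompts=3)` returns one positive weight per prompt. Because the uniform weighting is seeded into the initial population and the best half always survives, the returned weights never score worse on the training items than the unweighted average of the prompts.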

Paper Structure

This paper contains 23 sections, 2 equations, 12 figures, 7 tables.

Figures (12)

  • Figure 1: A question posed to ChatGPT (GPT-4o, without browsing capabilities) and its response show that the LLM is already familiar with this common question from cognitive science, so questions of this type have questionable validity when used to assess its reasoning.
  • Figure 2: Similarity between human reasoning (System 1 and System 2) and LLM reasoning. Base prompting simply asks the LLM for the answer; personality prompting queries the LLM with different personality prompts and finds the best weight for each prompt using a genetic algorithm with K-fold cross-validation. More details are given in the experiments section.
  • Figure 3: Survey instructions
  • Figure 4: Survey screen 1: Premise
  • Figure 5: Survey screen 2: Distractor
  • ...and 7 more figures