Giving AI Personalities Leads to More Human-Like Reasoning
Animesh Nighojkar, Bekhzodbek Moydinboyev, My Duong, John Licato
TL;DR
This work tackles the problem of modeling the full spectrum of human reasoning, not just correct answers, by evaluating whether LLMs can reproduce both fast System 1 and slow System 2 processes. It introduces a six-way Natural Language Inference (NLI) framework to capture diverse reasoning patterns and collects distributional human responses via a two-phase, AI-generated item set across multiple reasoning domains. The study shows that personality prompting, especially when the prompts are weighted by a genetic algorithm, substantially improves LLMs' ability to mirror human response distributions, with open-source models like Llama and Mistral often outperforming proprietary GPT models. Distributional alignment is quantified with Earth Mover's Similarity (EMS). These findings suggest a practical path to more human-like AI reasoning by embracing diverse cognitive styles and psychological profiles, with implications for personalized and ethical AI systems.
Abstract
In computational cognitive modeling, capturing the full spectrum of human judgment and decision-making processes, beyond just optimal behaviors, is a significant challenge. This study explores whether Large Language Models (LLMs) can emulate the breadth of human reasoning by predicting both intuitive, fast System 1 and deliberate, slow System 2 processes. We investigate the potential of AI to mimic diverse reasoning behaviors across a human population, addressing what we call the "full reasoning spectrum problem". We designed reasoning tasks using a novel generalization of the Natural Language Inference (NLI) format to evaluate LLMs' ability to replicate human reasoning. The questions were crafted to elicit both System 1 and System 2 responses. Human responses were collected through crowd-sourcing, and the entire response distribution was modeled rather than just the majority answer. We used personality-based prompting inspired by the Big Five personality model to elicit AI responses reflecting specific personality traits, capturing the diversity of human reasoning and exploring how personality traits influence LLM outputs. This method, combined with genetic algorithms that optimize the weighting of the prompts, was tested alongside traditional machine learning models. The results show that LLMs can mimic human response distributions, with open-source models like Llama and Mistral outperforming proprietary GPT models. Personality-based prompting, especially when optimized with genetic algorithms, significantly enhanced LLMs' ability to predict human response distributions, suggesting that capturing suboptimal, naturalistic reasoning may require modeling techniques incorporating diverse reasoning styles and psychological profiles. The study concludes that personality-based prompting combined with genetic algorithms is promising for enhancing AI's "human-ness" in reasoning.
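To make the idea of weighting personality prompts against a human response distribution concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the six NLI labels can be treated as ordered categories, that a similarity score can be formed as one minus a normalized one-dimensional earth mover's distance (the paper's exact EMS definition may differ), and that each personality prompt yields a label distribution per item. The function names, the three personas, and all numbers in `ITEMS` are hypothetical illustrations.

```python
import random


def ems(p, q):
    """Assumed similarity in [0, 1]: 1 minus the normalized 1-D earth mover's distance."""
    cum_p = cum_q = dist = 0.0
    for pi, qi in zip(p, q):
        cum_p += pi
        cum_q += qi
        dist += abs(cum_p - cum_q)  # mass that still has to move past this label
    return 1.0 - dist / (len(p) - 1)  # len(p) - 1 is the largest possible distance


def normalize(weights):
    """Rescale non-negative weights to sum to 1 (uniform if they are all zero)."""
    total = sum(weights)
    if total <= 0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


def mix(weights, persona_dists):
    """Weighted average of the per-persona label distributions for one item."""
    k = len(persona_dists[0])
    return [sum(w * d[j] for w, d in zip(weights, persona_dists)) for j in range(k)]


def fitness(weights, items):
    """Mean similarity between the mixed prediction and the human distribution."""
    return sum(ems(mix(weights, pd), human) for pd, human in items) / len(items)


def evolve(items, n_personas, pop_size=30, generations=60, sigma=0.1, seed=0):
    """Toy genetic algorithm: elitist selection, uniform crossover, Gaussian mutation."""
    rng = random.Random(seed)
    uniform = [1.0 / n_personas] * n_personas
    pop = [uniform] + [
        normalize([rng.random() for _ in range(n_personas)])
        for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        pop.sort(key=lambda w: fitness(w, items), reverse=True)
        parents = pop[: pop_size // 2]  # the better half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]
            child = [max(0.0, x + rng.gauss(0.0, sigma)) for x in child]
            children.append(normalize(child))
        pop = parents + children
    return max(pop, key=lambda w: fitness(w, items))


# Hypothetical data: two items, three personas, six ordered labels each.
# Each item is (list of per-persona distributions, human distribution).
ITEMS = [
    (
        [
            [0.7, 0.2, 0.1, 0.0, 0.0, 0.0],
            [0.0, 0.1, 0.2, 0.4, 0.2, 0.1],
            [0.0, 0.0, 0.0, 0.1, 0.3, 0.6],
        ],
        [0.4, 0.2, 0.1, 0.2, 0.1, 0.0],
    ),
    (
        [
            [0.1, 0.6, 0.3, 0.0, 0.0, 0.0],
            [0.0, 0.0, 0.5, 0.5, 0.0, 0.0],
            [0.0, 0.0, 0.0, 0.2, 0.5, 0.3],
        ],
        [0.05, 0.35, 0.35, 0.2, 0.05, 0.0],
    ),
]

if __name__ == "__main__":
    best = evolve(ITEMS, n_personas=3)
    print("weights:", best, "mean similarity:", fitness(best, ITEMS))
```

Because the better half of each generation is carried over unchanged and the uniform weighting is seeded into the initial population, the returned weights never score worse than an unweighted average of the personas under this toy fitness function.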
