A Comparison of Large Language Model and Human Performance on Random Number Generation Tasks

Rachel M. Harrison

TL;DR

This paper investigates whether a large language model (LLM) trained on human text reproduces human biases in random number generation (RNG). It adapts a standard random number generation task (RNGT) for an LLM and evaluates ChatGPT-3.5 on 10,000 sequences, each with a target length drawn from $N(269, 325^2)$. The analysis covers repeat frequency, adjacent increases and decreases, and digit frequencies, and compares the results with human data and a uniformly random baseline. Results indicate that ChatGPT avoids repeats and adjacent patterns more effectively than humans and aligns more closely with pseudorandom expectations on increases and decreases, though its output still departs from ideal randomness. The work highlights how LLM training data and prompting shape RNG behavior, offers methodological insight for AI-assisted cognitive research, and outlines limitations and avenues for future work across more models and metrics.

Abstract

Random Number Generation Tasks (RNGTs) are used in psychology for examining how humans generate sequences devoid of predictable patterns. By adapting an existing human RNGT for an LLM-compatible environment, this preliminary study tests whether ChatGPT-3.5, a large language model (LLM) trained on human-generated text, exhibits human-like cognitive biases when generating random number sequences. Initial findings indicate that ChatGPT-3.5 more effectively avoids repetitive and sequential patterns compared to humans, with notably lower repeat frequencies and adjacent number frequencies. Continued research into different models, parameters, and prompting methodologies will deepen our understanding of how LLMs can more closely mimic human random generation behaviors, while also broadening their applications in cognitive and behavioral science research.
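To make the pattern metrics concrete, the following is a minimal Python sketch of how repeat frequency, adjacent increases and decreases, and digit frequencies could be computed for a single sequence. It is an illustration, not the study's analysis code. The definitions are assumptions: a repeat is a consecutive pair of equal values, an adjacent increase or decrease is a consecutive pair that differs by exactly +1 or -1 with no wraparound, and the digit alphabet is taken to be 0 to 9. The function names `pattern_frequencies` and `digit_frequencies` are hypothetical.

```python
from collections import Counter


def pattern_frequencies(seq):
    """Share of consecutive pairs that are repeats, +1 steps, or -1 steps.

    Assumed definitions (not confirmed by the source): pairs are taken over
    neighbouring positions, and steps do not wrap around the alphabet.
    """
    pairs = list(zip(seq, seq[1:]))
    if not pairs:
        # A sequence with fewer than two items has no pairs to classify.
        return {"repeat": 0.0, "adjacent_increase": 0.0, "adjacent_decrease": 0.0}
    n = len(pairs)
    return {
        "repeat": sum(b == a for a, b in pairs) / n,
        "adjacent_increase": sum(b - a == 1 for a, b in pairs) / n,
        "adjacent_decrease": sum(a - b == 1 for a, b in pairs) / n,
    }


def digit_frequencies(seq, digits=range(10)):
    """Relative frequency of each digit; the 0-9 alphabet is an assumption."""
    counts = Counter(seq)
    total = len(seq)
    return {d: (counts[d] / total if total else 0.0) for d in digits}


# Hypothetical example sequence, not data from the study.
example = [3, 3, 4, 7, 6, 1, 2, 2, 9]
freqs = pattern_frequencies(example)
digit_freqs = digit_frequencies(example)
```

Under these assumed definitions, a uniformly random source over ten digits would give an expected repeat frequency of 1/10 and an expected adjacent-increase frequency of 9/100, which is the kind of baseline the LLM and human frequencies are compared against.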

Paper Structure (4 sections, 5 equations, 2 figures)

This paper contains 4 sections, 5 equations, 2 figures.

Introduction
Methods
Results
Discussion

Figures (2)

Figure 1: Comparison of pattern frequencies between ChatGPT, humans, and uniformly random distribution.
Figure 2: Distribution of individual digit frequencies across 10,000 sequences generated by ChatGPT, compared to the uniformly random distribution.
