Recovering Mental Representations from Large Language Models with Markov Chain Monte Carlo

Jian-Qiao Zhu; Haijiang Yan; Thomas L. Griffiths

TL;DR

This work investigates whether adaptive sampling methods can extract human-like mental representations from large language models (LLMs) by treating the model as a participant in a behavioral inference task. By framing color representations as $p(x|c)$ over the HSL color space and applying Direct Prompting, Direct Sampling, MCMC, and Gibbs Sampling with GPT-4, the authors compare the recovered distributions to human data. Adaptive methods (MCMC and Gibbs) outperform static prompting, with MCMC achieving the closest alignment to human representations in both distribution and mode, suggesting that LLMs can participate in Bayesian-like inference when integrated into sampling procedures. The findings point to a general and efficient framework for eliciting and comparing internal representations in LLMs, with potential applications in Bayesian reasoning and AI interpretability beyond color domains.

Abstract

Simulating sampling algorithms with people has proven a useful method for efficiently probing and understanding their mental representations. We propose that the same methods can be used to study the representations of Large Language Models (LLMs). While one can always directly prompt either humans or LLMs to disclose their mental representations introspectively, we show that increased efficiency can be achieved by using LLMs as elements of a sampling algorithm. We explore the extent to which we recover human-like representations when LLMs are interrogated with Direct Sampling and Markov chain Monte Carlo (MCMC). We found a significant increase in efficiency and performance using adaptive sampling algorithms based on MCMC. We also highlight the potential of our method to yield a more general method of conducting Bayesian inference with LLMs.
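To make the idea of using a decision-maker as an element of a sampling algorithm concrete, here is a minimal sketch of an MCMC chain in which each accept/reject step is delegated to a chooser. Everything in it is an illustrative assumption rather than the authors' implementation: the chooser is a simulated stand-in (`simulated_choice`) instead of a GPT-4 call, the target `target_density` is a hypothetical von Mises-shaped density over hue only (not full HSL), and the chooser is assumed to follow a ratio (Barker-style) rule, which this summary does not state.

```python
import math
import random


def target_density(hue, mu=0.0, kappa=4.0):
    """Hypothetical stand-in for p(x|c): unnormalized density over hue (degrees)."""
    return math.exp(kappa * math.cos(math.radians(hue - mu)))


def simulated_choice(current, proposal, rng):
    """Stand-in for the chooser (assumption: not a real LLM call).

    Picks the proposal with probability p(proposal) / (p(proposal) + p(current)),
    i.e. a Barker-style acceptance rule.
    """
    p_cur = target_density(current)
    p_prop = target_density(proposal)
    return rng.random() < p_prop / (p_prop + p_cur)


def run_chain(n_steps, start, step_sd=30.0, seed=0):
    """Run one chain; the chooser decides whether to keep or replace the state."""
    rng = random.Random(seed)
    state = start
    samples = []
    for _ in range(n_steps):
        # Symmetric Gaussian proposal, wrapped onto the hue circle [0, 360).
        proposal = (state + rng.gauss(0.0, step_sd)) % 360.0
        if simulated_choice(state, proposal, rng):
            state = proposal
        samples.append(state)
    return samples


def circular_mean(degrees):
    """Mean direction of angles given in degrees, returned in [0, 360)."""
    s = sum(math.sin(math.radians(d)) for d in degrees)
    c = sum(math.cos(math.radians(d)) for d in degrees)
    return math.degrees(math.atan2(s, c)) % 360.0


samples = run_chain(5000, start=180.0)
print(round(circular_mean(samples[500:]), 1))
```

With a symmetric proposal and a ratio-rule chooser, the chain's stationary distribution is the chooser's implicit density, so the retained samples approximate it after burn-in. Replacing `simulated_choice` with a real two-alternative query would leave the rest of the loop unchanged.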

Paper Structure (17 sections, 4 equations, 5 figures, 1 table)

Introduction
Background
Probing Large Language Models
From Behaviors to Representations
Direct Prompting with GPT-4
Direct Sampling with GPT-4
Markov Chain Monte Carlo with GPT-4
Gibbs Sampling with GPT-4
Recovering Color Representations from GPT-4
Stimuli
Procedure
Results
Convergence Diagnostic for Markov Chains
Representational Alignment of Humans and GPT-4
Discussion
...and 2 more sections

Figures (5)

Figure 1: Illustrations of the four behavioral methods used to recover mental representations for GPT-4. (A) Direct Prompting with GPT-4: GPT-4 is directly prompted to generate an HSL color code corresponding to a specified object. (B) Direct Sampling with GPT-4: In this iterative process, a random HSL color code is sampled and presented to GPT-4, which then evaluates the extent to which this color matches the target object. (C) Markov chain Monte Carlo (MCMC) with GPT-4: Each iteration involves proposing a new color, derived from the previously selected color, and then deciding whether to accept this new color or retain the old one. (D) Gibbs Sampling with GPT-4: In each step, GPT-4 is tasked with deducing and filling in a missing dimension of the HSL color code to better match the target object. In all panels, HSL color codes are colorized for easier comparison.
Figure 2: The evolution of the mean color representation across successive iterations. Each row within a color patch represents a single chain. Human data were adapted from harrison_gibbs_2020.
Figure 3: Samples in the color space produced by humans and those generated by GPT-4 using the four behavioral methods (displayed as columns). The overlaid contours are kernel density estimates computed with a Gaussian kernel with a bandwidth of 1.
Figure 4: Cumulative $\hat{R}$ of Gibbs Sampling with People plus aggregation (left), MCMC with GPT-4 (middle), and Gibbs Sampling with GPT-4 (right). A value of $\hat{R}$ falling below the threshold of $1.1$ suggests convergence of the Markov chains.
Figure 5: Comparing representations in humans and GPT-4. (left) Hellinger distance between the color representations derived from GPT-4 and those from humans. (right) Euclidean distances between the modes of representations from GPT-4 and humans. In both measures, lower numerical values are indicative of a stronger correspondence. Shaded areas indicate $\pm$SEM.
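
The two diagnostics named in the captions for Figures 4 and 5 can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: `hellinger` assumes both representations have already been binned into histograms over the same cells, and `gelman_rubin` uses the classic Gelman-Rubin form of $\hat{R}$ on a scalar quantity, since this summary does not say which variant was used or how circular dimensions such as hue were handled. Both function names are hypothetical.

```python
import math
from statistics import mean, variance


def hellinger(p, q):
    """Hellinger distance between two histograms over the same bins.

    Inputs are non-negative weights; each is normalized to sum to 1.
    The result lies in [0, 1], with 0 meaning identical distributions.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    sp, sq = sum(p), sum(q)
    total = sum(
        (math.sqrt(a / sp) - math.sqrt(b / sq)) ** 2 for a, b in zip(p, q)
    )
    return math.sqrt(total / 2.0)


def gelman_rubin(chains):
    """Classic Gelman-Rubin R-hat for equal-length chains of scalar samples."""
    m = len(chains)
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    grand_mean = mean(chain_means)
    # W: average within-chain sample variance.
    w = mean([variance(c) for c in chains])
    # B: between-chain variance, scaled by chain length.
    b = n / (m - 1) * sum((cm - grand_mean) ** 2 for cm in chain_means)
    var_hat = (n - 1) / n * w + b / n
    return math.sqrt(var_hat / w)


# Chains exploring the same region give R-hat near or below 1;
# chains stuck in different regions give R-hat well above 1.1.
mixed = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]]
stuck = [[0.0, 1.0, 0.0, 1.0], [10.0, 11.0, 10.0, 11.0]]
print(gelman_rubin(mixed) < 1.1, gelman_rubin(stuck) > 1.1)
```

Under these definitions, a lower Hellinger distance indicates closer agreement between two binned representations, and comparing $\hat{R}$ against $1.1$ gives the convergence check described for Figure 4.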
