Language Models are Bounded Pragmatic Speakers: Understanding RLHF from a Bayesian Cognitive Modeling Perspective

Khanh Nguyen

TL;DR

This paper proposes the bounded pragmatic speaker (BPS), a Bayesian cognitive model for analyzing large language models and their RLHF-based alignment. It shows that LLMs can be viewed as modular BPS instances, each with a base speaker and a theory-of-mind listener derived from the model itself, and it frames RLHF as variational inference within this architecture. The author argues that RLHF captures only a rudimentary slow-thinking system, highlights its limitations in counterfactual and long-term reasoning, and advocates world models and richer feedback to improve knowledge transfer to the fast-thinking component. The paper outlines directions toward a fuller dual model of thought, including advanced world modeling, richer communication, and more efficient inference algorithms, with the aim of bridging cognitive science and reinforcement learning for more capable, interpretable AI. The work emphasizes the potential of Bayesian cognitive modeling to guide the development and interpretation of future LLMs and RLHF-based systems, with practical implications for safety, control, and scalability.
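
The claim that RLHF is variational inference can be made concrete with a short derivation. The notation here is an assumption chosen for illustration and may differ from the paper's own: $c$ is the context, $z$ the target intention, $u$ an utterance, $S_{\text{base}}$ the base speaker, $L_{\text{ToM}}$ the theory-of-mind listener, $r$ a learned reward, and $\beta > 0$ the KL coefficient. The sketch uses the standard KL-regularized RLHF objective and assumes the listener is defined through the reward as $L_{\text{ToM}}(z \mid u, c) \propto \exp(r(u, c)/\beta)$.

```latex
% Bounded pragmatic speaker: base speaker reweighted by the ToM listener.
S_{\text{bps}}(u \mid z, c)
  = \frac{S_{\text{base}}(u \mid c)\, L_{\text{ToM}}(z \mid u, c)}{Z(z, c)},
\qquad
Z(z, c) = \sum_{u'} S_{\text{base}}(u' \mid c)\, L_{\text{ToM}}(z \mid u', c).

% Variational inference: fit a policy \pi_\theta to S_bps by minimizing a KL divergence.
\mathrm{KL}\!\left(\pi_\theta(\cdot \mid c) \,\middle\|\, S_{\text{bps}}(\cdot \mid z, c)\right)
  = \mathrm{KL}\!\left(\pi_\theta(\cdot \mid c) \,\middle\|\, S_{\text{base}}(\cdot \mid c)\right)
  - \mathbb{E}_{u \sim \pi_\theta}\!\left[\log L_{\text{ToM}}(z \mid u, c)\right]
  + \log Z(z, c).

% Substituting \log L_ToM = r / \beta + const and multiplying by \beta:
\arg\min_\theta \mathrm{KL}\!\left(\pi_\theta \,\middle\|\, S_{\text{bps}}\right)
  = \arg\max_\theta \;
    \mathbb{E}_{u \sim \pi_\theta}\!\left[r(u, c)\right]
    - \beta\, \mathrm{KL}\!\left(\pi_\theta(\cdot \mid c) \,\middle\|\, S_{\text{base}}(\cdot \mid c)\right).
```

Under these assumptions the right-hand side is the familiar RLHF objective, with the pretrained model playing the role of the base speaker and the reward model standing in for the listener.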

Abstract

How do language models "think"? This paper formulates a probabilistic cognitive model called the bounded pragmatic speaker, which can characterize the operation of different variations of language models. Specifically, we demonstrate that large language models fine-tuned with reinforcement learning from human feedback (Ouyang et al., 2022) embody a model of thought that conceptually resembles a fast-and-slow model (Kahneman, 2011), which psychologists have attributed to humans. We discuss the limitations of reinforcement learning from human feedback as a fast-and-slow model of thought and propose avenues for expanding this framework. In essence, our research highlights the value of adopting a cognitive probabilistic modeling approach to gain insights into the comprehension, evaluation, and advancement of language models.

Paper Structure (12 sections, 8 equations, 2 figures)

Figures (2)

  • Figure 1: An overview of our proposed framework. (a) a summarization task is illustrated as a communication game, where a speaker generates an utterance (the summary) to convey an intention (generating a good summary) given a context (the text to be summarized). The game is considered solved when the speaker presents an utterance that causes the listener to infer exactly the speaker's target intention. (b) a bounded pragmatic speaker efficiently finds a good utterance to output by implementing a base speaker to effectively restrict the search space, and a theory-of-mind listener to anticipate the intention inferred by the (real) listener.
  • Figure 2: RLHF-tuned LLMs are instances of models that implement a dual model of thought (a), which consists of a deliberate, methodical thinking system for rigorous reasoning (the slow-thinking system) and a quick, intuitive system for rapid decision-making (the fast-thinking system). The efficacy of the fast-thinking system can be continually enhanced by learning from the slow-thinking system. However, we argue that RLHF-tuned LLMs are still a rudimentary dual model of thought (b). The reward function fails to capture the complete reasoning capabilities of the listener, and the slow-thinking system communicates knowledge through a limited-capacity channel. We advocate for the development of a more comprehensive dual model of thought, wherein the slow-thinking system possesses extensive knowledge and profound comprehension of the physical and social world. This system would employ effective reasoning algorithms (LLMs, search algorithms, probabilistic programs, etc.) to leverage such knowledge and understanding, while facilitating efficient distillation of knowledge and capabilities into the fast-thinking system.
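
The mechanism described in the Figure 1 caption can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the paper's implementation: the function names `bps_posterior` and `best_of_n`, the candidate utterances, and all probabilities are hypothetical. The first function computes the exact reweighted distribution over a small candidate set. The second shows one possible bounded approximation: sample a few candidates from the base speaker and keep the one the imagined listener scores highest.

```python
import random


def bps_posterior(base_probs, listener_probs):
    """Exact pragmatic-speaker distribution over a small candidate set.

    base_probs[u]     stands for S_base(u | c): how likely the base speaker
                      is to produce utterance u in the current context.
    listener_probs[u] stands for L_ToM(z | u, c): how likely the imagined
                      listener is to infer the target intention z from u.
    """
    unnormalized = {u: base_probs[u] * listener_probs[u] for u in base_probs}
    total = sum(unnormalized.values())
    return {u: weight / total for u, weight in unnormalized.items()}


def best_of_n(base_probs, listener_probs, n, rng):
    """Bounded approximation: sample n candidates from the base speaker,
    then return the candidate with the highest listener score."""
    utterances = list(base_probs)
    weights = [base_probs[u] for u in utterances]
    candidates = rng.choices(utterances, weights=weights, k=n)
    return max(candidates, key=lambda u: listener_probs[u])


# Toy numbers, invented for illustration only.
base_probs = {
    "short summary": 0.6,
    "accurate summary": 0.3,
    "off-topic text": 0.1,
}
listener_probs = {
    "short summary": 0.2,
    "accurate summary": 0.9,
    "off-topic text": 0.05,
}

posterior = bps_posterior(base_probs, listener_probs)
best_exact = max(posterior, key=posterior.get)
best_sampled = best_of_n(base_probs, listener_probs, n=8, rng=random.Random(0))

print(best_exact)  # prints "accurate summary"
```

The base speaker alone would favor "short summary", but weighting by the listener shifts the mass to the utterance most likely to convey the intention. The sampled variant only ever scores utterances the base speaker proposes, which is the sense in which the base speaker restricts the search space.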