
Reverse-Engineering the Reader

Samuel Kiegeland, Ethan Gotlieb Wilcox, Afra Amini, David Robert Reich, Ryan Cotterell

TL;DR

A novel alignment technique is introduced in which a language model is fine-tuned to implicitly optimize the parameters of a linear regressor that directly predicts humans' reading times of in-context linguistic units, e.g., phonemes, morphemes, or words, using surprisal estimates derived from the language model.
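Surprisal is the negative log-probability of a unit given its preceding context. Language models usually score subword tokens rather than words, so a word's surprisal is commonly obtained by summing the surprisals of its tokens, which follows from the chain rule of probability. The minimal sketch below assumes per-token probabilities and a list mapping each token to its word index; the names `token_probs`, `word_ids`, and `word_surprisals` are hypothetical, and the sketch ignores known complications with tokenizers that mark word boundaries on the following token.

```python
import math
from typing import List, Sequence


def token_surprisals(token_probs: Sequence[float]) -> List[float]:
    """Surprisal in nats for each token: -log p(token | context)."""
    return [-math.log(p) for p in token_probs]


def word_surprisals(token_probs: Sequence[float], word_ids: Sequence[int]) -> List[float]:
    """Sum token surprisals over each word (hypothetical helper).

    Assumes the tokens of a word are contiguous and that word_ids[i]
    gives the index of the word that token i belongs to.
    """
    out: List[float] = []
    prev = None
    for s, w in zip(token_surprisals(token_probs), word_ids):
        if w != prev:  # first token of a new word
            out.append(0.0)
            prev = w
        out[-1] += s  # chain rule: log-probs of a word's tokens add up
    return out
```

For example, a word split into two tokens with probabilities 0.5 and 0.25 receives a surprisal of ln 2 + ln 4 = 3 ln 2 nats.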

Abstract

Numerous previous studies have sought to determine to what extent language models, pretrained on natural language text, can serve as useful models of human cognition. In this paper, we are interested in the opposite question: whether we can directly optimize a language model to be a useful cognitive model by aligning it to human psychometric data. To achieve this, we introduce a novel alignment technique in which we fine-tune a language model to implicitly optimize the parameters of a linear regressor that directly predicts humans' reading times of in-context linguistic units, e.g., phonemes, morphemes, or words, using surprisal estimates derived from the language model. Using words as a test case, we evaluate our technique across multiple model sizes and datasets and find that it improves language models' psychometric predictive power. However, we find an inverse relationship between psychometric predictive power and a model's performance, both on downstream NLP tasks and on language modeling of held-out test data (measured by perplexity). While this latter trend has been observed before (Oh et al., 2022; Shain et al., 2024), we are the first to induce it by manipulating a model's alignment to psychometric data.
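One way to read "implicitly optimize the parameters of a linear regressor" is that the regressor's weights are never trained separately: for any fixed set of model surprisals, the least-squares weights have a closed form, so the regression error at those optimal weights becomes a function of the language model alone. The sketch below evaluates such an objective in NumPy, assuming an ordinary-least-squares regressor over an intercept, surprisal, and baseline predictors, plus a KL penalty with coefficient `lam` toward a reference model (mirroring the KL coefficient λ, ${p_{\boldsymbol{{\theta}}}}$ and ${p_{\text{ref}}}$ in the figure captions). All function names and the exact form of the objective are hypothetical; an actual fine-tuning run would compute this in an autodiff framework so gradients flow through the closed-form solve.

```python
import numpy as np


def design_matrix(surprisal, baseline_features):
    """Intercept column, surprisal column, then baseline predictors (e.g. word length)."""
    surprisal = np.asarray(surprisal, dtype=float)
    baseline = np.asarray(baseline_features, dtype=float)
    return np.column_stack([np.ones(len(surprisal)), surprisal, baseline])


def closed_form_mse(X, y):
    """Least-squares weights beta* = (X^T X)^{-1} X^T y and the MSE they attain."""
    y = np.asarray(y, dtype=float)
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    residuals = y - X @ beta
    return float(np.mean(residuals ** 2)), beta


def mean_kl(p_theta, p_ref):
    """Mean over positions of KL(p_theta || p_ref); rows are next-token distributions.

    Assumes strictly positive probabilities, as produced by a softmax.
    """
    p = np.asarray(p_theta, dtype=float)
    q = np.asarray(p_ref, dtype=float)
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=-1)))


def objective(surprisal, baseline_features, reading_times, p_theta, p_ref, lam):
    """Hypothetical regularized objective: MSE at the optimal weights + lam * KL."""
    X = design_matrix(surprisal, baseline_features)
    mse, _ = closed_form_mse(X, reading_times)
    return mse + lam * mean_kl(p_theta, p_ref)
```

Larger values of `lam` trade regression fit for staying close to the reference model, which is the qualitative effect the KL-coefficient sweep in the figures describes.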

Paper Structure

This paper contains 46 sections, 24 equations, 13 figures, and 11 tables.

Figures (13)

  • Figure 1: Learning curves for the MSE (top) and $\Delta_{\text{llh}}$ ($10^{-2} \text{ nats}$, bottom) on the test datasets throughout fine-tuning. Bands show the standard error across random seeds. MSE tends to decrease, while $\Delta_{\text{llh}}$ increases, showing better prediction of reading times.
  • Figure 2: Perplexity vs. $\Delta_{\text{llh}}$. We compare each model's perplexity at the start of fine-tuning to its perplexity at the point where it achieves the highest mean $\Delta_{\text{llh}}$. Optimizing models using ${\mathcal{J}}$ increases perplexity. See the appendix for all data splits.
  • Figure 3: Mean coefficients of unit-level features over fine-tuning. Smoothed values (window size 5) are shown, with unsmoothed values in a paler shade of the same color. Coefficients corresponding to surprisal tend to increase over the course of fine-tuning.
  • Figure 4: Trajectories of $\Delta_{\text{llh}}$, KL divergence, and log perplexity for KL coefficients ${\lambda} \in \{0,5,50,500\}$. Higher coefficients lead to smaller increases in both perplexity and $\Delta_{\text{llh}}$, showing that the KL regularization keeps ${p_{\boldsymbol{{\theta}}}}$ from diverging too far from ${p_{\text{ref}}}$.
  • Figure 5: Results for BLiMP. Non-fine-tuned models are shown with hatching. Error bars are standard errors across random seeds. Fine-tuning leads to a decrease in accuracy. For results on all data splits, see the appendix.
  • ...and 8 more figures
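$\Delta_{\text{llh}}$, the quantity tracked in Figures 1, 2 and 4, is conventionally the gain in held-out log-likelihood per unit when surprisal is added to a baseline regression of reading times. The minimal sketch below assumes Gaussian errors, ordinary least squares, and a single train/test split; the paper's exact protocol (e.g. cross-validation folds, choice of baseline predictors, spillover terms) may differ, and every function name here is hypothetical.

```python
import numpy as np


def with_intercept(*columns):
    """Prepend a column of ones to the given predictor columns."""
    n = len(columns[0])
    return np.column_stack([np.ones(n), *columns])


def fit_ols(X, y):
    """Ordinary least squares; returns weights and the ML noise variance on the training data."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma2 = float(np.mean((y - X @ beta) ** 2))
    return beta, sigma2


def mean_gaussian_llh(X, y, beta, sigma2):
    """Average per-unit log-likelihood (nats) of y under N(X @ beta, sigma2)."""
    resid = y - X @ beta
    return float(np.mean(-0.5 * np.log(2 * np.pi * sigma2) - resid ** 2 / (2 * sigma2)))


def delta_llh(baseline_tr, surprisal_tr, y_tr, baseline_te, surprisal_te, y_te):
    """Held-out llh of the (baseline + surprisal) model minus that of the baseline-only model."""
    Xb_tr, Xb_te = with_intercept(baseline_tr), with_intercept(baseline_te)
    Xt_tr = with_intercept(baseline_tr, surprisal_tr)
    Xt_te = with_intercept(baseline_te, surprisal_te)
    llh_base = mean_gaussian_llh(Xb_te, y_te, *fit_ols(Xb_tr, y_tr))
    llh_target = mean_gaussian_llh(Xt_te, y_te, *fit_ols(Xt_tr, y_tr))
    return llh_target - llh_base
```

A positive value means surprisal improves held-out prediction of reading times; multiplying by 100 expresses it in the $10^{-2}$ nats used in Figure 1.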