On Linear Representations and Pretraining Data Frequency in Language Models

Jack Merullo; Noah A. Smith; Sarah Wiegreffe; Yanai Elazar

TL;DR

The paper investigates how pretraining data frequency shapes the linear representations of factual relations inside language models. It uses Linear Relational Embeddings (LREs) to approximate the model's relational computations and shows that the average subject–object co-occurrence frequency of a relation strongly predicts whether a linear representation emerges, often independent of when during training those co-occurrences are encountered. A regression framework demonstrates that LRE features carry signal about training data frequencies beyond what log probabilities or few-shot accuracy capture. This signal generalizes across models, enabling rough estimates of term frequencies in otherwise unknown pretraining data. The authors also release a Batch Search tool that counts exact co-occurrences in tokenized training batches. Higher co-occurrence frequencies align with better recall and linearity, which suggests that model behavior could be steered by manipulating training data frequencies.
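To make the idea of a linear relational approximation concrete, the sketch below fits an affine map `o ≈ W s + b` from subject vectors to object vectors and reports how well it reconstructs the objects. It is an illustration under stated assumptions, not the paper's procedure: it swaps in an ordinary least-squares fit, the vectors are synthetic rather than hidden states from a real model, and the names `fit_affine_map` and `relative_error` are hypothetical.

```python
import numpy as np


def fit_affine_map(subject_reps, object_reps):
    """Least-squares fit of object ≈ W @ subject + b (stand-in for an LRE)."""
    n = subject_reps.shape[0]
    # Append a column of ones so the bias is learned jointly with W.
    design = np.hstack([subject_reps, np.ones((n, 1))])
    coef, *_ = np.linalg.lstsq(design, object_reps, rcond=None)
    W = coef[:-1].T  # shape (d_obj, d_subj)
    b = coef[-1]     # shape (d_obj,)
    return W, b


def relative_error(W, b, subject_reps, object_reps):
    """Norm of the reconstruction residual relative to the norm of the targets."""
    pred = subject_reps @ W.T + b
    return float(np.linalg.norm(pred - object_reps) / np.linalg.norm(object_reps))


# Synthetic data (assumption): objects are an exact affine function of subjects.
rng = np.random.default_rng(0)
d_subj, d_obj, n_pairs = 8, 6, 50
W_true = rng.normal(size=(d_obj, d_subj))
b_true = rng.normal(size=d_obj)
subjects = rng.normal(size=(n_pairs, d_subj))
objects = subjects @ W_true.T + b_true

W, b = fit_affine_map(subjects, objects)
rel_err = relative_error(W, b, subjects, objects)
print(f"relative reconstruction error: {rel_err:.2e}")
```

A relation that a model encodes less linearly would, in this toy framing, show a larger reconstruction error for the best affine fit.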

Abstract

Pretraining data has a direct impact on the behaviors and quality of language models (LMs), but we only understand the most basic principles of this relationship. While most work focuses on pretraining data's effect on downstream task behavior, we investigate its relationship to LM representations. Previous work has discovered that, in language models, some concepts are encoded "linearly" in the representations, but what factors cause these representations to form? We study the connection between pretraining data frequency and models' linear representations of factual relations. We find evidence that the formation of linear representations is strongly connected to pretraining term frequencies; specifically for subject-relation-object fact triplets, both subject-object co-occurrence frequency and in-context learning accuracy for the relation are highly correlated with linear representations. This is the case across all phases of pretraining. In OLMo-7B and GPT-J, we discover that a linear representation consistently (but not exclusively) forms when the subjects and objects within a relation co-occur at least 1k and 2k times, respectively, regardless of when these occurrences happen during pretraining. Finally, we train a regression model on measurements of linear representation quality in fully-trained LMs that can predict how often a term was seen in pretraining. Our model achieves low error even on inputs from a different model with a different pretraining dataset, providing a new method for estimating properties of the otherwise-unknown training data of closed-data models. We conclude that the strength of linear representations in LMs contains signal about the models' pretraining corpora that may provide new avenues for controlling and improving model behavior: particularly, manipulating the models' training data to meet specific frequency thresholds.
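The frequency-prediction step described above can be pictured with a small regression sketch. Everything here is an assumption for illustration: the two per-relation features are hypothetical stand-ins for measurements of linear representation quality, the log-frequency targets are synthetic, and ordinary least squares is used only because the abstract does not specify the regressor.

```python
import numpy as np

rng = np.random.default_rng(0)
n_relations = 200

# Hypothetical per-relation quality features in [0, 1] (not the paper's data).
features = rng.uniform(0.0, 1.0, size=(n_relations, 2))
# Synthetic log10 co-occurrence counts that rise with representation quality.
log_freq = (
    2.0
    + 1.5 * features[:, 0]
    + 0.5 * features[:, 1]
    + rng.normal(0.0, 0.1, size=n_relations)
)

train_idx = slice(0, 150)
test_idx = slice(150, None)


def fit_ols(X, y):
    """Ordinary least squares with an intercept stored as the last coefficient."""
    design = np.hstack([X, np.ones((X.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict(coef, X):
    """Apply the fitted weights and intercept to new feature rows."""
    return X @ coef[:-1] + coef[-1]


coef = fit_ols(features[train_idx], log_freq[train_idx])
pred = predict(coef, features[test_idx])
mae = float(np.mean(np.abs(pred - log_freq[test_idx])))
print(f"held-out mean absolute error (log10 units): {mae:.3f}")
```

Working in log space keeps the error interpretable as orders of magnitude, which suits counts that span from a handful of co-occurrences to many thousands.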
