Disaggregation Reveals Hidden Training Dynamics: The Case of Agreement Attraction

James A. Michaelov, Catherine Arnett

TL;DR

The paper investigates how language models acquire grammatical knowledge, focusing on subject-verb agreement and attraction effects, and shows that aggregate accuracy can mask key intermediate learning dynamics. It introduces a psycholinguistic-inspired disaggregation method that analyzes performance across data subsets and training time using the PolyPythia model suite and log-probability metrics. The results reveal an $n$-gram-like progression: models begin with unigram frequency biases, then incorporate local context and attraction effects, and finally achieve broader generalization as dependencies lengthen, supporting the notion of 'hidden breakthroughs' in grammatical learning. This approach provides interpretable diagnostics for training dynamics and informs evaluation of grammatical generalization benchmarks beyond aggregate metrics.

Abstract

Language models generally produce grammatical text, but they are more likely to make errors in certain contexts. Drawing on paradigms from psycholinguistics, we carry out a fine-grained analysis of those errors in different syntactic contexts. We demonstrate that by disaggregating over the conditions of carefully constructed datasets and comparing model performance on each over the course of training, it is possible to better understand the intermediate stages of grammatical learning in language models. Specifically, we identify distinct phases of training where language model behavior aligns with specific heuristics such as word frequency and local context rather than generalized grammatical rules. We argue that taking this approach to analyzing language model behavior more generally can serve as a powerful tool for understanding the intermediate learning phases, overall training dynamics, and the specific generalizations learned by language models.
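
The disaggregation idea described above can be sketched in a few lines. This is only an illustrative sketch, not the authors' code: the record fields (`step`, `condition`, `logp_grammatical`, `logp_ungrammatical`), the condition names `match` and `mismatch`, and all numeric values are hypothetical toy inputs. It assumes an item counts as correct when the grammatical verb form receives the higher log-probability, and it takes the aggregate score to be the mean across conditions, as in the figure captions.

```python
from collections import defaultdict


def disaggregated_accuracy(records):
    """Return (per-condition accuracy, aggregate accuracy) for each checkpoint.

    Each record is a dict with hypothetical keys: 'step', 'condition',
    'logp_grammatical', and 'logp_ungrammatical'.
    """
    outcomes = defaultdict(list)
    for r in records:
        # Assumption: an item is correct if the grammatical form is more probable.
        correct = r["logp_grammatical"] > r["logp_ungrammatical"]
        outcomes[(r["step"], r["condition"])].append(correct)

    # Accuracy for every (checkpoint, condition) pair.
    by_condition = {key: sum(v) / len(v) for key, v in outcomes.items()}

    # Aggregate score: mean of the per-condition accuracies at each checkpoint.
    per_step = defaultdict(list)
    for (step, _condition), acc in by_condition.items():
        per_step[step].append(acc)
    aggregate = {step: sum(v) / len(v) for step, v in per_step.items()}
    return by_condition, aggregate


# Toy values for illustration only; they are not results from the paper.
records = [
    {"step": 1000, "condition": "match", "logp_grammatical": -2.0, "logp_ungrammatical": -3.0},
    {"step": 1000, "condition": "match", "logp_grammatical": -1.5, "logp_ungrammatical": -2.5},
    {"step": 1000, "condition": "mismatch", "logp_grammatical": -3.0, "logp_ungrammatical": -2.0},
    {"step": 1000, "condition": "mismatch", "logp_grammatical": -2.8, "logp_ungrammatical": -2.1},
    {"step": 10000, "condition": "match", "logp_grammatical": -1.0, "logp_ungrammatical": -4.0},
    {"step": 10000, "condition": "mismatch", "logp_grammatical": -1.2, "logp_ungrammatical": -3.5},
]

by_condition, aggregate = disaggregated_accuracy(records)
for key in sorted(by_condition):
    print(key, by_condition[key])
print("aggregate:", dict(sorted(aggregate.items())))
```

In this toy input the aggregate score at the earlier checkpoint sits at chance while the two conditions sit at opposite extremes, which is the kind of pattern an aggregate-only view would hide.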

Paper Structure

This paper contains 12 sections, 2 equations, 6 figures, and 1 table.

Figures (6)

  • Figure 1: PolyPythia model accuracy on subject-verb agreement stimuli with (A) the verb be, (B) all other single-token and (C) multi-token verbs. The black line represents the mean across all conditions (i.e., the aggregate score). Shading reflects 95% confidence intervals.
  • Figure 2: PolyPythia model accuracy on subject-verb agreement stimuli with (A) single-token and (B) multi-token verbs. The black line represents the mean across all conditions (i.e., the aggregate score). Shading reflects 95% confidence intervals.
  • Figure 3: PolyPythia model accuracy on subject-verb agreement stimuli for the verb be. The black line represents the mean across all conditions (i.e., the aggregate score). Shading reflects 95% confidence intervals.
  • Figure 4: PolyPythia model accuracy on subject-verb agreement stimuli for single-token verbs. The black line represents the mean across all conditions (i.e., the aggregate score). Shading reflects 95% confidence intervals.
  • Figure 5: PolyPythia model accuracy on subject-verb agreement stimuli for multi-token verbs. The black line represents the mean across all conditions (i.e., the aggregate score). Shading reflects 95% confidence intervals.
  • ...and 1 more figure