Frequency effects in Linear Discriminative Learning

Maria Heitmeier, Yu-Ying Chuang, Seth D. Axen, R. Harald Baayen

TL;DR

The results show how frequency effects in a learning model can be simulated efficiently, and raise questions about how to best account for low-frequency words in cognitive models.

Abstract

Word frequency is a strong predictor in most lexical processing tasks. Thus, any model of word recognition needs to account for how word frequency effects arise. The Discriminative Lexicon Model (DLM; Baayen et al., 2018a, 2019) models lexical processing with linear mappings between words' forms and their meanings. So far, the mappings could be obtained either incrementally via error-driven learning, a computationally expensive process able to capture frequency effects, or with an efficient but frequency-agnostic solution modelling the theoretical endstate of learning (EL), where all words are learned optimally. In this study we show how an efficient, yet frequency-informed mapping between form and meaning can be obtained (frequency-informed learning; FIL). We find that FIL approximates an incremental solution well while being computationally much cheaper. FIL shows a relatively low type accuracy but a high token accuracy, demonstrating that the model is able to process most word tokens encountered by speakers in daily life correctly. We use FIL to model reaction times in the Dutch Lexicon Project (Keuleers et al., 2010) and find that FIL predicts the S-shaped relationship between frequency and mean reaction times well but underestimates the variance of reaction times for low-frequency words. Compared to EL, FIL is also better able to account for priming effects in an auditory lexical decision task in Mandarin Chinese (Lee, 2007). Finally, we use ordered data from CHILDES (Brown, 1973; Demuth et al., 2006) to compare mappings obtained with FIL and incremental learning. The mappings are highly correlated, but with FIL some nuances based on word ordering effects are lost. Our results show how frequency effects in a learning model can be simulated efficiently, and raise questions about how to best account for low-frequency words in cognitive models.
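The three ways of obtaining a form-to-meaning mapping named in the abstract can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' implementation: `C` (form matrix, one row per word type), `S` (semantic matrix), `freqs` (integer token counts) and all function names are hypothetical. Endstate learning is taken to be the ordinary least-squares solution, FIL is assumed to be a frequency-weighted least-squares solution, and incremental learning is assumed to use the Widrow-Hoff (delta) rule over a shuffled token sequence.

```python
import numpy as np

def endstate_mapping(C, S):
    """Frequency-agnostic solution: least-squares F minimising ||C F - S||^2."""
    return np.linalg.pinv(C) @ S

def frequency_informed_mapping(C, S, freqs):
    """Assumed FIL form: weighted least squares with weights proportional to
    token frequency, i.e. F minimises sum_i freqs[i] * ||C[i] F - S[i]||^2."""
    w = np.sqrt(freqs / freqs.sum())[:, None]  # square roots of relative frequencies
    return np.linalg.pinv(w * C) @ (w * S)

def incremental_mapping(C, S, freqs, eta=0.01, seed=0):
    """Error-driven learning: one Widrow-Hoff update per token.
    Each word type i is presented freqs[i] times, in random order."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(np.repeat(np.arange(len(freqs)), freqs))
    F = np.zeros((C.shape[1], S.shape[1]))
    for i in order:
        c = C[i:i + 1]                          # 1 x n_cues row vector
        F += eta * c.T @ (S[i:i + 1] - c @ F)   # move prediction towards target
    return F
```

Under these assumptions the cost difference is visible directly: the incremental version loops once per token, whereas the two closed-form versions solve a single linear system regardless of how large the frequencies are. With uniform frequencies the weighted solution reduces to the endstate solution.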

Paper Structure (21 sections, 11 equations, 12 figures, 3 tables)

Figures (12)

  • Figure 1: Endstate learning. The green filled dots on the horizontal lines at 0 and 1 represent the correlation accuracies@1 for the individual words (counted as correct if the semantic vector most correlated with the predicted semantic vector is the target), and the light pink circles represent the correlation values of words' predicted semantic vectors with their target vectors. The dark blue dotted line shows the estimated kernel density for log frequency. There is no discernible relationship between log frequency and correlation/accuracy for endstate learning.
  • Figure 2: Relationship between accuracy and frequency for incremental learning. Left panel: Mapping trained using full frequencies. Predicted accuracy is depicted for three different learning rates ($\eta \in \{0.01, 0.001, 0.0001\}$), and the light pink circles represent target correlations for $\eta=0.01$. Center panel: Mapping trained using log-transformed frequencies. Right panel: Mapping trained using frequencies divided by a factor of 100. While there is a strong relationship between log frequency and accuracy/correlation when training on full frequencies and scaled frequencies, this relationship is attenuated when training on log-transformed frequencies.
  • Figure 3: Frequency-informed learning. The red solid line shows the predictions of a GLM when a success is defined as the predicted vector being the closest to its gold-standard target vector in terms of correlation (accuracy@1). The light blue dashed line shows model predictions when a success is defined as the target vector being among the 10 vectors most correlated with the predicted vector (accuracy@10). The dark blue dotted line visualizes the estimated density of the log-transformed frequencies. The green filled dots represent the successes and failures for accuracy@1. The light pink circles represent, for each word, the correlation of the predicted and gold-standard semantic vectors. There is a strong relationship between log frequency and correlation/accuracy, and the GLM-predicted accuracy@10 is shifted to the left, i.e. accuracy@10 rises at lower frequencies.
  • Figure 4: Accuracy@1 as a function of log frequency, using frequency-informed learning with log-transformed frequencies. When FIL is trained with log-transformed frequencies, lower-frequency words are recognized more accurately, but higher-frequency words less accurately.
  • Figure 5: Comparison of methods. GLM-predicted Accuracy@1 with frequency-informed learning is plotted as a black line: The left panel compares methods based on log-frequencies, the center panel compares methods based on scaled frequencies and the right panel compares incremental learning with different learning rates. Incremental learning with scaled frequencies or with a very low learning rate ($\eta=0.0001$) is closest to frequency-informed learning.
  • ...and 7 more figures
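The accuracy measures used in the captions above can be made concrete with a short NumPy sketch. It follows the caption definition of accuracy@1 (a word counts as correct if its target is the semantic vector most correlated with the predicted vector) and generalises it to accuracy@k. The names `S_hat`, `S`, `freqs` and the helper functions are hypothetical, and treating token accuracy as the frequency-weighted mean of per-word correctness is an assumption about the paper's definition.

```python
import numpy as np

def correlation_matrix(S_hat, S):
    """Pearson correlation of every predicted row with every gold-standard row.
    Assumes no row is constant (a constant row would have zero norm)."""
    A = S_hat - S_hat.mean(axis=1, keepdims=True)
    B = S - S.mean(axis=1, keepdims=True)
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A @ B.T

def correct_at_k(S_hat, S, k=1):
    """Boolean vector: is word i's target among the k gold vectors
    most correlated with its predicted vector?"""
    R = correlation_matrix(S_hat, S)
    target = np.diag(R)                         # correlation with own target
    # Number of competitors strictly more correlated than the target.
    rank = (R > target[:, None]).sum(axis=1)
    return rank < k

def type_accuracy(correct):
    """Every word type counts once."""
    return float(np.mean(correct))

def token_accuracy(correct, freqs):
    """Assumed definition: every word type counts freqs[i] times."""
    return float(np.average(correct, weights=freqs))
```

Because high-frequency words carry more weight in the second measure, a model that gets mostly frequent words right can combine a low type accuracy with a high token accuracy, which is the pattern the abstract reports for FIL.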