Is deeper always better? Replacing linear mappings with deep learning networks in the Discriminative Lexicon Model

Maria Heitmeier; Valeria Schmidt; Hendrik P. A. Lensch; R. Harald Baayen

TL;DR

The paper investigates whether replacing the linear mappings of the Discriminative Lexicon Model (Linear Discriminative Learning, LDL) with deep dense neural networks (Deep Discriminative Learning, DDL) improves the cognitive modelling of language. Across four languages and several tasks, DDL generally increases mapping accuracy, especially for English and Dutch, and a frequency-informed variant (FIDDL) yields the best predictions of lexical decision reaction times. However, DDL does not consistently outperform LDL when mappings are updated from trial to trial to model incremental learning. The findings show that deeper models can make mappings more precise, but that gains in mapping accuracy do not always translate into better prediction of behavioural data; the distribution of the data and the training regime (e.g., frequency-informed learning) critically shape the outcomes. The work highlights when and how deep learning can inform the analysis of morphology and cue discrimination, and offers practical guidance on when to use DDL rather than LDL in cognitive linguistic modelling.

Abstract

Recently, deep learning models have increasingly been used in cognitive modelling of language. This study asks whether deep learning can help us to better understand the learning problem that needs to be solved by speakers, above and beyond linear methods. We utilise the Discriminative Lexicon Model introduced by Baayen and colleagues, which models comprehension and production with mappings between numeric form and meaning vectors. So far, these mappings have been linear (Linear Discriminative Learning, LDL); in the present study, we replace them with deep dense neural networks (Deep Discriminative Learning, DDL). We find that DDL affords more accurate mappings for large and diverse datasets from English and Dutch, but not necessarily for Estonian and Taiwan Mandarin. DDL outperforms LDL particularly for words with pseudo-morphological structure, such as chol+er. When predicting average reaction times, we find that DDL is outperformed by frequency-informed linear mappings (FIL). However, DDL trained in a frequency-informed way ('frequency-informed' deep learning, FIDDL) substantially outperforms FIL. Finally, while linear mappings can very effectively be updated from trial to trial to model incremental lexical learning, deep mappings cannot do so as effectively. At present, both linear and deep mappings are informative for understanding language.
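The abstract describes comprehension in the Discriminative Lexicon Model as a mapping from numeric form vectors to meaning vectors. As a rough illustration only, the following NumPy sketch shows how a linear (LDL) endstate mapping, a frequency-weighted variant, a single trial-to-trial update and correlation accuracy could be computed. The cue matrix `C` (one row of form cues per word), the semantic matrix `S`, the toy random data and all function names are hypothetical. It further assumes that frequency-informed linear mappings (FIL) amount to frequency-weighted least squares and that trial-to-trial updating uses the Widrow-Hoff (delta) rule; it is not the authors' implementation.

```python
import numpy as np


def endstate_mapping(C, S):
    """Endstate-of-learning linear mapping F minimising ||C @ F - S||^2.

    np.linalg.lstsq returns the minimum-norm least-squares solution,
    which equals np.linalg.pinv(C) @ S.
    """
    F, *_ = np.linalg.lstsq(C, S, rcond=None)
    return F


def frequency_informed_mapping(C, S, freq):
    """Assumed FIL variant: minimise sum_i freq_i * ||C[i] @ F - S[i]||^2.

    Scaling each row by sqrt(freq_i) turns the weighted problem into
    ordinary least squares.
    """
    w = np.sqrt(np.asarray(freq, dtype=float))[:, None]
    F, *_ = np.linalg.lstsq(w * C, w * S, rcond=None)
    return F


def widrow_hoff_update(F, c, s, eta=0.01):
    """One learning trial with the delta rule: F <- F + eta * c^T (s - c F)."""
    error = s - c @ F                       # prediction error for this word
    return F + eta * np.outer(c, error)     # nudge weights to reduce that error


def correlation_accuracy(S_hat, S):
    """Share of words whose predicted vector correlates best with its own target."""
    Zp = (S_hat - S_hat.mean(1, keepdims=True)) / S_hat.std(1, keepdims=True)
    Zt = (S - S.mean(1, keepdims=True)) / S.std(1, keepdims=True)
    R = Zp @ Zt.T / S.shape[1]              # Pearson r: predictions x targets
    return float(np.mean(R.argmax(axis=1) == np.arange(len(S))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C = rng.integers(0, 2, size=(50, 30)).astype(float)  # toy binary cue matrix
    S = rng.normal(size=(50, 10))                         # toy semantic vectors
    freq = rng.integers(1, 100, size=50)                  # toy token frequencies

    F_el = endstate_mapping(C, S)
    F_fil = frequency_informed_mapping(C, S, freq)
    print("EL accuracy: ", correlation_accuracy(C @ F_el, S))
    print("FIL accuracy:", correlation_accuracy(C @ F_fil, S))
```

With 50 toy words but only 30 cues the system is overdetermined, so the weighted fit trades accuracy on rare words for accuracy on frequent ones; with fewer words than (linearly independent) cues, the endstate mapping reproduces `S` exactly.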

Paper Structure (9 sections, 6 equations, 6 figures, 1 table)

Introduction
Methods
Data
Models
Does DDL improve mapping accuracy?
Does DDL improve prediction of behavioural data?
Predicting average reaction times
Predicting trial-to-trial reaction times
Discussion and Conclusion

Figures (6)

Figure 1: DDL model architectures. The red boxes represent dense layers (matrix multiplication plus bias) with the output dimension printed inside, blue boxes represent non-linearities. Box widths reflect the input and output dimensions of the layers. Figure from heitmeier2024.
Figure 2: Correlation accuracy for DDL and LDL models for a British English, Dutch, Estonian and Taiwan Mandarin dataset. For DDL models, accuracies are averaged across 10 models, the error bar shows the standard deviation. DDL clearly outperforms LDL for Dutch and English, and less clearly for Estonian and Mandarin. Test@10 indicates the accuracy@10 on the test data for the best model.
Figure 3: Measures of between-word similarity predicting the difference between target correlation in DDL and LDL (target correlation DDL - target correlation LDL). For higher values of correlation difference, DDL outperforms LDL. The lower x-axes show the centered and scaled values as entered into the GAMs; the upper x-axes show the original values for reference. We find that in general, DDL outperforms LDL for words that are similar to other words (i.e. words with higher cue overlap, a dense orthographic neighbourhood, shorter words and more frequent words). Figure adapted from heitmeier2024.
Figure 4: Target correlations from EL, FIL, DDL, EDDL and FIDDL models used to predict reaction times in the British Lexicon Project (BLP) and the Dutch Lexicon Project (DLP). Figure adapted from heitmeier2024.
Figure 5: Comparison between static and dynamic simulations using DDL and LDL for predicting per-participant reaction times in the BLP. For words, the effect of dynamic simulations compared to static ones is larger for LDL, resulting in a bigger difference in AIC. Figure adapted from heitmeier2024.
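Figure 1 describes the DDL architectures as stacks of dense layers (matrix multiplication plus bias) separated by non-linearities. A minimal NumPy sketch of such a network follows, with one hidden ReLU layer trained by minibatch gradient descent on mean squared error. The class name `TinyDDL`, the layer sizes, the choice of ReLU, the learning rate and the training loop are hypothetical and do not reproduce the architectures in Figure 1. The optional `freq` argument illustrates one assumed way of making training frequency-informed in the spirit of FIDDL: sampling words in proportion to their token frequency.

```python
import numpy as np


class TinyDDL:
    """Hypothetical dense network: C -> Linear -> ReLU -> Linear -> S_hat."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, np.sqrt(2 / n_in), size=(n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0, np.sqrt(1 / n_hidden), size=(n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def forward(self, X):
        self.Z = X @ self.W1 + self.b1      # dense layer (matrix multiplication plus bias)
        self.H = np.maximum(self.Z, 0.0)    # non-linearity (ReLU)
        return self.H @ self.W2 + self.b2   # output dense layer

    def step(self, X, Y, lr=0.01):
        """One gradient step on mean squared error; returns the pre-update loss."""
        Y_hat = self.forward(X)
        G = 2 * (Y_hat - Y) / Y.size        # dLoss/dY_hat for the mean over all entries
        dW2, db2 = self.H.T @ G, G.sum(axis=0)
        dZ = (G @ self.W2.T) * (self.Z > 0) # backpropagate through the ReLU
        dW1, db1 = X.T @ dZ, dZ.sum(axis=0)
        for param, grad in ((self.W1, dW1), (self.b1, db1),
                            (self.W2, dW2), (self.b2, db2)):
            param -= lr * grad              # in-place parameter update
        return float(np.mean((Y_hat - Y) ** 2))


def train(model, C, S, epochs=200, lr=0.05, freq=None, batch=32, seed=0):
    """Minibatch training; if freq is given, words are sampled in proportion to it."""
    rng = np.random.default_rng(seed)
    p = None if freq is None else np.asarray(freq, dtype=float) / np.sum(freq)
    losses = []
    for _ in range(epochs):
        idx = rng.choice(len(C), size=batch, p=p)  # uniform, or frequency-weighted
        losses.append(model.step(C[idx], S[idx], lr))
    return losses
```

Because minibatches are drawn with replacement according to `freq`, frequent words receive more updates than rare ones; the actual FIDDL training regime in the paper may differ from this assumption.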
