Is deeper always better? Replacing linear mappings with deep learning networks in the Discriminative Lexicon Model
Maria Heitmeier, Valeria Schmidt, Hendrik P. A. Lensch, R. Harald Baayen
TL;DR
The paper investigates whether replacing the linear mappings of the Discriminative Lexicon Model (Linear Discriminative Learning, LDL) with deep dense neural networks (Deep Discriminative Learning, DDL) improves cognitive modelling of language. Across four languages and multiple tasks, DDL increases mapping accuracy for the large and diverse English and Dutch datasets, but not necessarily for Estonian and Taiwan Mandarin. A frequency-informed variant (FIDDL) yields the best predictions of lexical decision reaction times, whereas plain DDL is outperformed by frequency-informed linear mappings, and deep mappings are less effective than linear ones when updated from trial to trial to model incremental learning. The findings show that deeper models can make mappings more accurate, but that gains in mapping accuracy do not always translate into better behavioural predictions; the data distribution and the training regime (e.g., frequency-informed learning) critically shape outcomes. The work indicates when DDL and when LDL is the more suitable choice in cognitive linguistic modelling.
Abstract
Recently, deep learning models have increasingly been used in cognitive modelling of language. This study asks whether deep learning can help us to better understand the learning problem that needs to be solved by speakers, above and beyond linear methods. We utilise the Discriminative Lexicon Model introduced by Baayen and colleagues, which models comprehension and production with mappings between numeric form and meaning vectors. So far, these mappings have been linear (Linear Discriminative Learning, LDL); in the present study, we replace them with deep dense neural networks (Deep Discriminative Learning, DDL). We find that DDL affords more accurate mappings for large and diverse datasets from English and Dutch, but not necessarily for Estonian and Taiwan Mandarin. DDL outperforms LDL in particular for words with pseudo-morphological structure such as chol+er. When applied to average reaction times, DDL is outperformed by frequency-informed linear mappings (FIL). However, DDL trained in a frequency-informed way (frequency-informed deep learning, FIDDL) substantially outperforms FIL. Finally, while linear mappings can be updated very effectively from trial to trial to model incremental lexical learning, deep mappings cannot be updated as effectively. At present, both linear and deep mappings are informative for understanding language.
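
To make the contrast between a linear and a deep mapping concrete, the following is a minimal sketch in Python with NumPy, not the authors' implementation. It maps toy form vectors onto toy meaning vectors, once with a least-squares linear mapping and once with a small dense network. Everything specific here is an assumption made for illustration: the random data, the matrix sizes, the names `C`, `S`, `F`, `train_dense` and `mapping_accuracy`, the single hidden layer, and the training settings. Accuracy is scored by checking whether each predicted meaning vector correlates most strongly with its own target vector.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (assumed sizes, random values): one row per word.
n_words, n_form, n_sem = 50, 20, 10
C = rng.normal(size=(n_words, n_form))  # form vectors
S = rng.normal(size=(n_words, n_sem))   # meaning vectors

# Linear mapping: least-squares solution F of C @ F ≈ S.
F = np.linalg.pinv(C) @ S
S_hat_linear = C @ F


def train_dense(X, Y, hidden=64, lr=0.05, epochs=500, seed=0):
    """Fit a one-hidden-layer ReLU network by full-batch gradient descent on MSE."""
    r = np.random.default_rng(seed)
    W1 = r.normal(scale=np.sqrt(2.0 / X.shape[1]), size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = r.normal(scale=np.sqrt(1.0 / hidden), size=(hidden, Y.shape[1]))
    b2 = np.zeros(Y.shape[1])
    losses = []
    for _ in range(epochs):
        pre = X @ W1 + b1                # hidden pre-activations
        H = np.maximum(pre, 0.0)         # ReLU
        err = H @ W2 + b2 - Y            # prediction error
        losses.append(float(np.mean(err ** 2)))
        G = 2.0 * err / err.size         # gradient of the loss w.r.t. the predictions
        GH = (G @ W2.T) * (pre > 0)      # backpropagate through the ReLU
        W2 -= lr * (H.T @ G)
        b2 -= lr * G.sum(axis=0)
        W1 -= lr * (X.T @ GH)
        b1 -= lr * GH.sum(axis=0)

    def predict(X_new):
        return np.maximum(X_new @ W1 + b1, 0.0) @ W2 + b2

    return predict, losses


def mapping_accuracy(S_hat, S_target):
    """Share of rows whose prediction correlates most with its own target row."""
    A = S_hat - S_hat.mean(axis=1, keepdims=True)
    B = S_target - S_target.mean(axis=1, keepdims=True)
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    corr = A @ B.T                       # Pearson correlations, predictions x targets
    return float(np.mean(corr.argmax(axis=1) == np.arange(len(S_target))))


predict, losses = train_dense(C, S)
acc_linear = mapping_accuracy(S_hat_linear, S)
acc_deep = mapping_accuracy(predict(C), S)
print(f"linear mapping accuracy: {acc_linear:.2f}")
print(f"dense network accuracy:  {acc_deep:.2f}")
```

Because the data are random, the accuracies printed by this sketch say nothing about real languages; it only illustrates how the two kinds of mapping are fitted and compared on the same form and meaning matrices.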
