Analysis of Argument Structure Constructions in a Deep Recurrent Language Model

Pegah Ramezani, Achim Schilling, Patrick Krauss

TL;DR

This study explores how four Argument Structure Constructions (ASCs), namely transitive, ditransitive, caused-motion, and resultative, are represented and processed in a Long Short-Term Memory (LSTM) network. Sentence representations form distinct clusters for the four ASCs across all hidden layers, supporting the hypothesis that hierarchical linguistic structure can emerge through prediction-based learning.

Abstract

Understanding how language and linguistic constructions are processed in the brain is a fundamental question in cognitive computational neuroscience. In this study, we explore the representation and processing of Argument Structure Constructions (ASCs) in a recurrent neural language model. We trained a Long Short-Term Memory (LSTM) network on a custom-made dataset consisting of 2000 sentences, generated using GPT-4, representing four distinct ASCs: transitive, ditransitive, caused-motion, and resultative constructions. We analyzed the internal activations of the LSTM model's hidden layers using Multidimensional Scaling (MDS) and t-Distributed Stochastic Neighbor Embedding (t-SNE) to visualize the sentence representations. The Generalized Discrimination Value (GDV) was calculated to quantify the degree of clustering within these representations. Our results show that sentence representations form distinct clusters corresponding to the four ASCs across all hidden layers, with the most pronounced clustering observed in the last hidden layer before the output layer. This indicates that even a relatively simple, brain-constrained recurrent neural network can effectively differentiate between various construction types. These findings are consistent with previous studies demonstrating the emergence of word-class and syntax-rule representations in recurrent language models trained on next-word prediction tasks. In future work, we aim to validate these results using larger language models and compare them with neuroimaging data obtained during continuous speech perception. This study highlights the potential of recurrent neural language models to mirror linguistic processing in the human brain, providing valuable insights into the computational and neural mechanisms underlying language understanding.

Paper Structure (19 sections, 4 equations, 3 figures, 1 table)

Figures (3)

  • Figure 1: MDS projections of the activations from all four layers of the LSTM model. Each point represents the activation of a sentence, color-coded according to its ASC type: caused-motion (blue), ditransitive (green), transitive (red), and resultative (orange).
  • Figure 2: t-SNE projections of the activations from all four layers of the LSTM model. Each point represents the activation of a sentence, color-coded according to its ASC type: caused-motion (blue), ditransitive (green), transitive (red), and resultative (orange).
  • Figure 3: GDV scores of the hidden-layer activations. Lower GDV values indicate better-defined clusters. The GDV supports the qualitative results from the MDS and t-SNE projections, with the best clustering occurring in layer 3.
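
To make the "lower is better" reading of the GDV concrete, here is a minimal sketch of how such a score could be computed. It assumes the commonly used formulation: z-score each dimension and scale by 0.5, then take the mean intra-class distance minus the mean inter-class distance, divided by the square root of the dimensionality. The paper's own equations are not reproduced in this section, so this definition is an assumption. The function names (`zscore_scale`, `gdv`) and the toy 2-D data are hypothetical and stand in for real LSTM activations; only the Python standard library is used.

```python
import math
import random
from itertools import combinations


def zscore_scale(points):
    """Z-score each dimension, then multiply by 0.5 (assumed GDV convention)."""
    n, dims = len(points), len(points[0])
    means = [sum(p[d] for p in points) / n for d in range(dims)]
    # Fall back to 1.0 for constant dimensions to avoid division by zero.
    stds = [
        math.sqrt(sum((p[d] - means[d]) ** 2 for p in points) / n) or 1.0
        for d in range(dims)
    ]
    return [[0.5 * (p[d] - means[d]) / stds[d] for d in range(dims)] for p in points]


def gdv(points, labels):
    """Mean intra-class distance minus mean inter-class distance, over sqrt(D).

    Requires at least two classes and at least two points per class.
    More negative values mean tighter, better separated clusters.
    """
    scaled = zscore_scale(points)
    dims = len(scaled[0])
    classes = sorted(set(labels))
    groups = {c: [p for p, lab in zip(scaled, labels) if lab == c] for c in classes}

    # Mean pairwise distance inside each class.
    intra = []
    for c in classes:
        pairs = list(combinations(groups[c], 2))
        intra.append(sum(math.dist(a, b) for a, b in pairs) / len(pairs))

    # Mean pairwise distance between each pair of distinct classes.
    inter = []
    for c1, c2 in combinations(classes, 2):
        total = sum(math.dist(a, b) for a in groups[c1] for b in groups[c2])
        inter.append(total / (len(groups[c1]) * len(groups[c2])))

    return (sum(intra) / len(intra) - sum(inter) / len(inter)) / math.sqrt(dims)


# Toy stand-in for sentence activations: four well separated 2-D clusters,
# one per hypothetical construction type.
rng = random.Random(0)
centers = [(0, 0), (10, 0), (0, 10), (10, 10)]
points, labels = [], []
for k, (cx, cy) in enumerate(centers):
    for _ in range(25):
        points.append([rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)])
        labels.append(k)

# Shuffling the labels destroys the class structure and serves as a baseline.
shuffled = labels[:]
rng.shuffle(shuffled)

gdv_true = gdv(points, labels)
gdv_shuffled = gdv(points, shuffled)
print(f"GDV with true labels:     {gdv_true:.3f}")
print(f"GDV with shuffled labels: {gdv_shuffled:.3f}")
```

With the true labels the score should come out clearly negative, while the shuffled baseline should sit near zero. This matches the convention in the Figure 3 caption that lower values indicate better-defined clusters.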