Unveiling Cognitive Constraints in Language Production: Extracting and Validating the Active Ego Network of Words

Kilian Ollivier, Chiara Boldrini, Andrea Passarella, Marco Conti

TL;DR

This work defines the active ego network of words to capture cognitive effort in language production, arguing that analyses which include inactive words misestimate that effort and lose the layered structure as the dataset grows. It introduces a recursive, saturation-curve-based method to extract the active portion of an ego network, yielding robust, multi-layered structures (typically 4–5 circles) across both MediaSum interviews and extended Twitter data. The approach is robust to data size and stable over time, and it confirms the cross-domain generalizability of the structural invariants originally observed in word ego networks. The findings advance understanding of how cognitive constraints shape vocabulary usage and provide a practical tool for isolating cognitively meaningful word networks in large text corpora across domains.

Abstract

The "ego network of words" model captures structural properties in language production associated with cognitive constraints. While previous research focused on the layer-based structure and its semantic properties, this paper argues that an essential element, the concept of an active network, is missing. The active part of the ego network of words includes only the words that individuals use regularly, akin to ego networks in the social domain, where the active part includes the relationships that individuals regularly nurture and that therefore demand cognitive effort. In this work, we define a methodology for extracting the active part of the ego network of words and validate it using interview transcripts and tweets. We demonstrate that the method is robust to varying input data sizes and stable over time. We also demonstrate that without the active network concept (and a tool for properly extracting the active network from data), the "ego network of words" model cannot properly estimate the cognitive effort involved and becomes sensitive to the amount of data considered (the layered structure disappears in large datasets). Our results are well aligned with prior analyses of the ego network of words, in which the limited amount of data collected meant that, implicitly, approximately only the active part of the network was considered. Moreover, the validation on the transcripts dataset (MediaSum) highlights the generalizability of the model across diverse domains and the ingrained cognitive constraints in language usage.

Paper Structure

This paper contains 21 sections, 3 equations, 15 figures, 5 tables, 1 algorithm.

Figures (15)

  • Figure 1: A social ego network is constructed by first calculating the frequency of contact between a person (the ego) and the individuals the ego has a social relationship with (the alters). The alters are then grouped into concentric layers according to these frequencies. In the same way, we found that words could be grouped into concentric layers after studying their frequency of use by the ego.
  • Figure 2: Word occurrences per speaker.
  • Figure 3: Unique words per speaker.
  • Figure 4: Collected Twitter timelines containing at least 500 tweets. Each bar corresponds to a timeline, where the blue part refers to the number of tweets in the original dataset, and the orange part refers to the number of newly collected tweets.
  • Figure 5: Distribution of the number of circles $\tau$ when considering all the words available in $\mathcal{W}$.
  • ...and 10 more figures

Theorems & Definitions (1)

  • Definition 1