Characterizing Human Semantic Navigation in Concept Production as Trajectories in Embedding Space

Felipe D. Toro-Hernández, Jesuino Vieira Filho, Rodrigo M. Cabral-Carvalho

TL;DR

This work treats semantic retrieval as navigation through embedding space by modeling concept production as trajectories derived from cumulatively embedded item streams. It introduces five metrics—distance to next, velocity, acceleration, entropy, and distance to centroid—to quantify local dynamics, variability, and global dispersion, and tests them across four multilingual datasets with multiple embedding models, finding robust, interpretable patterns and improved sensitivity with longer trajectories. The approach differentiates clinical groups and semantic categories, offering a scalable, geometry-based framework that complements traditional semantic-control measures and holds promise for clinical stratification, cross-linguistic analyses, and artificial cognition research. Overall, the study bridges cognitive modeling with learned representations to quantify semantic representation dynamics in humans and lays groundwork for extending trajectory analysis to broader tasks and models.

Abstract

Semantic representations can be framed as a structured, dynamic knowledge space through which humans navigate to retrieve and manipulate meaning. To investigate how humans traverse this geometry, we introduce a framework that represents concept production as navigation through embedding space. Using different transformer text embedding models, we construct participant-specific semantic trajectories based on cumulative embeddings and extract geometric and dynamical metrics, including distance to next, distance to centroid, entropy, velocity, and acceleration. These measures capture both scalar and directional aspects of semantic navigation, providing a computationally grounded view of semantic representation search as movement in a geometric space. We evaluate the framework on four datasets across different languages, spanning different property generation tasks: Neurodegenerative, Swear verbal fluency, and property listing tasks in Italian and in German. Across these contexts, our approach distinguishes between clinical groups and concept types, offering a mathematical framework that requires minimal human intervention compared to typical labor-intensive linguistic pre-processing methods. Comparison with a non-cumulative approach reveals that cumulative embeddings work best for longer trajectories, whereas shorter ones may provide too little context, favoring the non-cumulative alternative. Critically, different embedding models yielded similar results, highlighting similarities between different learned representations despite different training pipelines. By framing semantic navigation as a structured trajectory through embedding space, this work bridges cognitive modeling with learned representations, establishing a pipeline for quantifying semantic representation dynamics with applications in clinical research, cross-linguistic analysis, and the assessment of artificial cognition.
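
To make the idea of a trajectory built from cumulative embeddings concrete, the following is a minimal sketch, not the authors' implementation. It assumes that step t of the trajectory is the embedding of all items produced up to step t, that velocity and acceleration are first and second finite differences of the trajectory, that distance to next is a cosine distance between consecutive points, and that distance to centroid is a Euclidean distance to the mean point. The names `toy_encoder`, `build_trajectory`, and `trajectory_metrics` are hypothetical, and `toy_encoder` is only a deterministic placeholder for a real transformer text encoder. Entropy is omitted because its definition is not given in this section.

```python
import numpy as np


def toy_encoder(text, dim=8):
    """Hypothetical stand-in for a transformer text encoder (not a real model).

    Maps a string to a deterministic pseudo-random unit vector.
    """
    seed = sum(ord(ch) * (i + 1) for i, ch in enumerate(text))
    rng = np.random.default_rng(seed)
    vec = rng.standard_normal(dim)
    return vec / np.linalg.norm(vec)


def build_trajectory(items, encode=toy_encoder, cumulative=True):
    """Return a (T, d) array with one embedding per production step.

    cumulative=True  : step t embeds items[0..t] joined into one string
                       (assumed reading of "cumulative embeddings").
    cumulative=False : step t embeds only items[t].
    """
    points = []
    for t in range(len(items)):
        text = " ".join(items[: t + 1]) if cumulative else items[t]
        points.append(encode(text))
    return np.stack(points)


def trajectory_metrics(traj):
    """Compute assumed versions of four of the five metrics."""
    velocity = np.diff(traj, n=1, axis=0)       # (T-1, d) step vectors
    acceleration = np.diff(traj, n=2, axis=0)   # (T-2, d) change in step vectors

    # Cosine distance between consecutive trajectory points.
    a, b = traj[:-1], traj[1:]
    cos_sim = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    )
    distance_to_next = 1.0 - cos_sim

    # Euclidean distance of every point to the centroid of the points given.
    centroid = traj.mean(axis=0)
    distance_to_centroid = np.linalg.norm(traj - centroid, axis=1)

    return {
        "distance_to_next": distance_to_next,
        "speed": np.linalg.norm(velocity, axis=1),
        "acceleration_magnitude": np.linalg.norm(acceleration, axis=1),
        "distance_to_centroid": distance_to_centroid,
    }


items = ["dog", "cat", "horse", "cow"]
traj = build_trajectory(items)
metrics = trajectory_metrics(traj)
for name, values in metrics.items():
    print(name, np.round(values, 3))
```

To summarize dispersion across several trials of the same participant, the trajectories could be stacked with `np.vstack` before calling `trajectory_metrics`, so that the centroid is computed over all points. Passing `cumulative=False` gives the non-cumulative variant for comparison.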

Paper Structure

This paper contains 23 sections, 4 equations, 19 figures, and 2 tables.

Figures (19)

  • Figure 1: A schematic of the semantic trajectory analysis. (A) In a single trial, a participant generates a cumulative word list. A text encoder then maps each sequential step to a vector embedding, creating a trajectory in semantic space. This path is characterized using dynamical metrics like velocity ($x'$), acceleration ($x''$), and entropy. (B) Across multiple trials for the same subject, the dispersion of the resulting cloud of embedding trajectories is summarized by measuring the distance of each point to the collective centroid.
  • Figure 2: Boxplot of Neurodegenerative metrics by category. Matrices display pairwise statistical comparisons.
  • Figure 3: Boxplot of Swear metrics by category. Matrices display pairwise statistical comparisons.
  • Figure 4: Boxplot of Italian metrics by category. Matrices display pairwise statistical comparisons.
  • Figure 5: Boxplot of German metrics by category. Matrices display pairwise statistical comparisons.
  • ...and 14 more figures