Advancing Cognitive Science with LLMs

Dirk U. Wulff, Rui Mata

TL;DR

The paper addresses fragmentation and conceptual ambiguity in cognitive science, which stem in part from the field's interdisciplinarity. It surveys how LLMs can support cross-disciplinary mapping, formalization, taxonomy consolidation, integrated architectures, and contextualized representations. It presents concrete demonstrations and frameworks, including semantic embeddings for research maps, LLM-assisted formal modeling, ontology learning, Centaur-style multitask modeling, and contextualized data approaches, while noting pitfalls. The authors argue for judicious use, with LLMs complementing rather than replacing human expertise, to enhance integrative, cumulative progress, contingent on interpretability, openness, and controls.

Abstract

Cognitive science faces ongoing challenges in knowledge synthesis and conceptual clarity, in part due to its multifaceted and interdisciplinary nature. Recent advances in artificial intelligence, particularly the development of large language models (LLMs), offer tools that may help to address these issues. This review examines how LLMs can support areas where the field has historically struggled, including establishing cross-disciplinary connections, formalizing theories, developing clear measurement taxonomies, achieving generalizability through integrated modeling frameworks, and capturing contextual and individual variation. We outline the current capabilities and limitations of LLMs in these domains, including potential pitfalls. Taken together, we conclude that LLMs can serve as tools for a more integrative and cumulative cognitive science when used judiciously to complement, rather than replace, human expertise.

Paper Structure

This paper contains 11 sections, 4 figures, 1 table.

Figures (4)

  • Figure 1: Leveraging large language models (LLMs) to address core challenges in the cognitive sciences. From left to right, the five columns correspond to research inputs or foci (Articles, Theories, Measures, Findings, Environments) and show how LLMs can process these to produce useful outputs. Research maps: LLMs embed and index research articles to produce semantic maps that synthesize topics and reveal cross-field connections. Formal models: LLMs assist in translating verbal theories into formal or executable models for clearer assumptions and testable predictions. Measurement taxonomies: Semantic embeddings from LLMs help to align measures with constructs, detect redundancy, and support principled relabeling. Integrated frameworks: LLM architectures support generalizable prediction across tasks to provide accounts of empirical findings. Contextualized representations: LLMs capture ecological, cultural, situational, and individual variation from real-world contexts to improve context-sensitive representations. Together, these applications illustrate how LLMs can foster a more systematic and integrative science of mind.
  • Figure 2: A research map of the "theory of mind" literature. The main panel on the left visualizes the semantic landscape of 15,043 articles, where proximity indicates conceptual similarity. The map was generated by creating joint semantic embeddings for titles and abstracts with an LLM [zhang2025qwen3] and projecting them into two dimensions using a dimensionality reduction algorithm [pacmap]. For more details, see [thoma2025mapping]. The resulting clusters were manually labeled based on frequent author keywords. The "Temporal development" panel on the right illustrates the field's temporal development by highlighting publications from different decades, while the "Keyword distribution" panel reveals how core concepts are shared or siloed across different research areas.
  • Figure 3: Embedding-based mapping and relabeling of psychological measures. The figure illustrates how embeddings can be used to place questionnaire items and construct labels in a shared semantic space, reveal conceptual overlap, and reduce redundancy. Left: Individual items and construct labels are encoded as high-dimensional vectors derived from LLMs. Center: These vectors are projected into a common embedding space in which proximity reflects semantic similarity; items and constructs that cluster together likely capture overlapping meaning. (The cube depicts a 3D schematic; in practice, embeddings have many more dimensions.) Right: Clustering within this space supports systematic relabeling or consolidation: Constructs with highly similar item profiles can be reassigned or eliminated, yielding a more parsimonious taxonomy of measures and associated constructs.
  • Figure 4: The figure illustrates the approach underlying Centaur [binz_foundation_2025], a foundation model of human cognition trained to predict behavior across diverse experimental tasks. Left: Task instructions, stimuli, and participants’ trial histories from different cognitive domains (search, deliberation, reasoning, decision making) are first translated into text and then tokenized to serve as model input. The corresponding input embeddings pass through a transformer neural network architecture (embedding layer, multi-head attention, and feed-forward blocks) to produce context-sensitive hidden representations of the task state. The model then uses a softmax layer to output a probability distribution over possible outputs, including those representing task actions (for example, which option a participant will choose next). This approach has been used to capture behavioral regularities across multiple tasks (e.g., digit span, two-armed bandit), and it has been shown to generalize to new task structures and domains, suggesting that it may represent a unified framework for predicting and interpreting human behavior.
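
The embedding-based comparison described in the Figure 3 caption can be illustrated with a minimal sketch. It assumes that construct labels have already been encoded as vectors and flags pairs whose cosine similarity exceeds a cutoff as candidates for consolidation. The function names, the 0.9 threshold, and the three-dimensional toy vectors are all hypothetical choices made for illustration; they are not taken from the paper, and real LLM embeddings have many more dimensions.

```python
import numpy as np


def cosine_similarity_matrix(vectors: np.ndarray) -> np.ndarray:
    """Return the pairwise cosine similarities between the rows of `vectors`."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / norms  # scale each row to unit length
    return unit @ unit.T


def redundant_pairs(labels, vectors, threshold=0.9):
    """List label pairs whose embeddings are at least `threshold` similar.

    The threshold is an illustrative assumption, not a recommended value.
    """
    sims = cosine_similarity_matrix(np.asarray(vectors, dtype=float))
    pairs = []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if sims[i, j] >= threshold:
                pairs.append((labels[i], labels[j], float(sims[i, j])))
    return pairs


# Hand-made toy vectors standing in for LLM-derived embeddings (hypothetical).
labels = ["grit", "perseverance", "anxiety"]
vectors = [
    [0.90, 0.10, 0.00],
    [0.85, 0.15, 0.05],
    [0.00, 0.20, 0.95],
]

pairs = redundant_pairs(labels, vectors, threshold=0.9)
for first, second, similarity in pairs:
    print(f"{first} ~ {second}: cosine similarity {similarity:.3f}")
```

With these toy vectors, only "grit" and "perseverance" exceed the cutoff. In practice, the flagged pairs would be candidates for expert review rather than automatic merging, in keeping with the paper's emphasis on complementing human expertise.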