How Humans and LLMs Organize Conceptual Knowledge: Exploring Subordinate Categories in Italian
Andrea Pedrotti, Giulia Rambelli, Caterina Villani, Marianna Bolognesi
TL;DR
This paper investigates how humans and LLMs organize subordinate-level conceptual knowledge in Italian. It introduces a new psycholinguistic dataset covering 187 basic-level categories and three evaluation tasks: exemplar generation, category induction, and typicality prediction. Comparing human-produced exemplars with the outputs of multiple open-source language and vision-language models reveals generally low alignment with humans, but performance varies notably by domain, with some models approximating human behavior in encyclopedic domains. The study demonstrates both the potential of AI to generate useful exemplars for large-scale semantic resources and the limitations imposed by hallucinations, polysemy, and the flat hierarchical organization of many LLMs. It also highlights methodological tools, such as exemplar availability metrics and perplexity-based induction and typicality tasks, discusses implications for education, knowledge-base population, and category-aware language generation, and outlines avenues for cross-linguistic and multi-level investigations.
Abstract
People can categorize the same entity at multiple taxonomic levels, such as basic (bear), superordinate (animal), and subordinate (grizzly bear). While prior research has focused on basic-level categories, this study is the first attempt to examine the organization of categories by analyzing exemplars produced at the subordinate level. We present a new Italian psycholinguistic dataset of human-generated exemplars for 187 concrete words. We then use these data to evaluate whether textual and vision LLMs produce meaningful exemplars that align with human category organization across three key tasks: exemplar generation, category induction, and typicality judgment. Our findings show low alignment between humans and LLMs, consistent with previous studies. However, model performance varies notably across semantic domains. Ultimately, this study highlights both the promise and the constraints of using AI-generated exemplars to support psychological and linguistic research.
