The quasi-semantic competence of LLMs: a case study on the part-whole relation
Mattia Proietti, Alessandro Lenci
TL;DR
The paper probes how far large language models understand the part–whole (meronymy) relation, focusing on antisymmetry as a core inferential property. It combines behavioral prompting, probabilistic sentence plausibility, and representational geometry to evaluate meronymy knowledge using ConceptNet and the McRae norms across LLaMA2-7b, LLaMA2-7b-chat, and GPT-4. Across tasks, the results show strong surface-level knowledge but limited abstract generalization, with only partial linear encoding in the embeddings and substantial gaps relative to human semantic knowledge. The findings argue for a quasi-semantic competence in current LLMs and highlight the need for grounding or substructure-based representations to achieve robust meronymic reasoning and generalization.
Abstract
Understanding the extent and depth of the semantic competence of \emph{Large Language Models} (LLMs) is at the center of the current scientific agenda in Artificial Intelligence (AI) and Computational Linguistics (CL). We contribute to this endeavor by investigating their knowledge of the \emph{part-whole} relation, a.k.a. \emph{meronymy}, which plays a crucial role in lexical organization but is significantly understudied. We used data from ConceptNet relations \citep{speer2016conceptnet} and human-generated semantic feature norms \citep{McRae:2005} to explore the abilities of LLMs to handle \textit{part-whole} relations. We employed several methods at three levels of analysis: (i) \textbf{behavioral} testing via prompting, where we directly queried the models on their knowledge of meronymy; (ii) sentence \textbf{probability} scoring, where we tested the models' ability to discriminate between correct (real) and incorrect (asymmetric counterfactual) \textit{part-whole} relations; and (iii) \textbf{concept representation} analysis in vector space, where we tested the linear organization of the \textit{part-whole} concept in the embedding and unembedding spaces. Together, these analyses reveal a complex picture: the LLMs' knowledge of this relation is only partial. They have just a ``\emph{quasi}-semantic'' competence and still fall short of capturing deep inferential properties.
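
The sentence probability scoring described above can be illustrated with a minimal sketch. It assumes a scoring function that returns a sentence-level plausibility score (for example, a log-probability from a language model) and checks whether each real part-whole sentence outscores its reversed counterfactual. The template wording, the function names, and the numeric scores are all hypothetical placeholders chosen for illustration; they are not the paper's prompts, code, or results.

```python
from typing import Callable, Iterable, Tuple

Pair = Tuple[str, str]  # (part, whole)


def template(part: str, whole: str) -> str:
    """Hypothetical sentence template; the actual stimuli may differ."""
    return f"A {part} is part of a {whole}."


def antisymmetry_accuracy(pairs: Iterable[Pair],
                          score: Callable[[str], float]) -> float:
    """Fraction of pairs whose real sentence scores strictly higher
    than the reversed (counterfactual) sentence."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one (part, whole) pair")
    hits = sum(
        score(template(part, whole)) > score(template(whole, part))
        for part, whole in pairs
    )
    return hits / len(pairs)


# Placeholder scores standing in for model log-probabilities.
# These numbers are invented for illustration, not measured values.
_TOY_SCORES = {
    "A wheel is part of a car.": -12.0,
    "A car is part of a wheel.": -15.0,
    "A leaf is part of a tree.": -11.0,
    "A tree is part of a leaf.": -10.5,
}


def toy_score(sentence: str) -> float:
    """Toy scorer: looks up the invented score for a sentence."""
    return _TOY_SCORES[sentence]


if __name__ == "__main__":
    pairs = [("wheel", "car"), ("leaf", "tree")]
    print(antisymmetry_accuracy(pairs, toy_score))
```

In practice, `toy_score` would be replaced by a function that sums token log-probabilities under the model being evaluated; the comparison logic stays the same.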
