
How Deep is Love in LLMs' Hearts? Exploring Semantic Size in Human-like Cognition

Yao Yao, Yifei Yang, Xinbei Ma, Dongjie Yang, Zhuosheng Zhang, Zuchao Li, Hai Zhao

TL;DR

The paper investigates semantic size as a window into human-like cognition in large language models (LLMs) along three strands: external metaphor-based understanding, internal representation probing, and real-world attention bias in a web-shopping scenario. It builds datasets from the Glasgow Norms, evaluates human participants alongside multiple text-only and multimodal LLMs, and applies linear probes to hidden representations. Key findings are that multimodal training yields closer alignment with human semantic-size reasoning and improves the internal encoding of size, but also exposes a bias toward semantically large, attention-grabbing content, with implications for AI safety and cognitive science. Overall, the work argues that grounding through multiple modalities is crucial for approaching human-like cognition in LLMs and offers insights into how embodied experiences shape conceptual understanding.

Abstract

How human cognitive abilities are formed has long captivated researchers. However, a significant challenge lies in developing meaningful methods to measure these complex processes. With the advent of large language models (LLMs), which now rival human capabilities in various domains, we are presented with a unique testbed to investigate human cognition through a new lens. Among the many facets of cognition, one particularly crucial aspect is the concept of semantic size, the perceived magnitude of both abstract and concrete words or concepts. This study seeks to investigate whether LLMs exhibit similar tendencies in understanding semantic size, thereby providing insights into the underlying mechanisms of human cognition. We begin by exploring metaphorical reasoning, comparing how LLMs and humans associate abstract words with concrete objects of varying sizes. Next, we examine LLMs' internal representations to evaluate their alignment with human cognitive processes. Our findings reveal that multi-modal training is crucial for LLMs to achieve more human-like understanding, suggesting that real-world, multi-modal experiences are similarly vital for human cognitive development. Lastly, we examine whether LLMs are influenced by attention-grabbing headlines with larger semantic sizes in a real-world web shopping scenario. The results show that multi-modal LLMs are more emotionally engaged in decision-making, but this also introduces potential biases, such as the risk of manipulation through clickbait headlines. Ultimately, this study offers a novel perspective on how LLMs interpret and internalize language, from the smallest concrete objects to the most profound abstract concepts like love. The insights gained not only improve our understanding of LLMs but also provide new avenues for exploring the cognitive abilities that define human intelligence.

Paper Structure

This paper contains 25 sections, 4 equations, 8 figures, 8 tables.

Figures (8)

  • Figure 1: The image displays the semantic size of various words, derived from the Glasgow Norms dataset (Scott et al., 2019). The area of each block represents the relative semantic size of the corresponding word, with larger blocks indicating greater semantic size.
  • Figure 2: Overview of how LLMs perceive semantic size from three perspectives. External exploration examines metaphorical associations between abstract concepts and concrete objects, comparing LLMs and human cognition. Internal exploration probes how LLMs encode and represent semantic size. Real-world exploration assesses how semantic size influences LLMs’ responses to attention-grabbing headlines in a web shopping scenario, revealing potential biases in decision-making.
  • Figure 3: Examples from the Semantic Size Metaphor dataset. Both abstract word pairs and concrete word triplets are matched across conditions for word frequency and length. SIZE-VARYING triplets: the semantic size labels of the three concrete words are big, medium, and small, respectively; SIZE-MATCH triplets: all three concrete words share similar semantic size.
  • Figure 4: Results for the Semantic Size Metaphor Study in the size-varying setting, using the extended dataset. The vertical axis lists the models, with color distinguishing the type: human (orange), text-only LLM (blue), and multi-modal LLM (MLLM, green). The horizontal axis shows the probability of selecting the corresponding large or small concrete object for a given abstract word. Each bubble's color represents models from the same family, while the bubble size indicates the model's certainty in making a selection, with larger bubbles representing greater certainty. An LLM that performs similarly to humans should show a higher probability and greater certainty when presented with big or small abstract words, reflecting accuracy and confidence in selecting size-congruent concrete objects. As shown in the figure, MLLMs tend to have larger bubbles positioned further right than text-only LLMs, indicating improved performance after multi-modal training.
  • Figure 5: Semantic size probing accuracy (average of abstract and concrete words) based on attention head activations across all layers. The multi-modal LLMs (bottom row) show deeper shading, indicating higher accuracy than the text-only LLMs (top row). Notably, the Mistral model's accuracy remains low, even though multi-modal training improves it slightly. This aligns with the findings of Study 1, where Mistral was less effective overall on the semantic size metaphor task.
  • ...and 3 more figures
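
The probing setup described for Figure 5 (one probe per attention head, per layer, yielding a grid of accuracies) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the activations are synthetic rather than taken from any model, the probe is a plain logistic regression fitted by gradient descent, the labels are a hypothetical binary small/big split, and all function names are invented. The section above does not specify the paper's actual probe configuration.

```python
import numpy as np


def train_linear_probe(X, y, lr=0.1, epochs=300):
    """Fit a logistic-regression probe (weights w, bias b) by gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(label = 1)
        grad = p - y                             # log-loss gradient w.r.t. the logits
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def probe_accuracy(X_train, y_train, X_test, y_test):
    """Train on one split and report held-out classification accuracy."""
    w, b = train_linear_probe(X_train, y_train)
    pred = (X_test @ w + b) > 0
    return float((pred == y_test.astype(bool)).mean())


def make_synthetic_activations(n_words, n_layers, n_heads, dim, seed=0):
    """Synthetic stand-in for per-head activations; NOT real model data."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=n_words)  # hypothetical: 0 = small, 1 = big
    acts = rng.normal(size=(n_layers, n_heads, n_words, dim))
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    # Assumption for illustration only: the size signal grows with layer depth.
    for layer in range(n_layers):
        strength = 2.0 * layer / max(n_layers - 1, 1)
        acts[layer] += strength * (2 * labels - 1)[None, :, None] * direction
    return acts, labels


def accuracy_grid(acts, labels, train_frac=0.8):
    """Train one probe per (layer, head) and collect held-out accuracies."""
    n_layers, n_heads, n_words, _ = acts.shape
    split = int(train_frac * n_words)
    grid = np.zeros((n_layers, n_heads))
    for layer in range(n_layers):
        for head in range(n_heads):
            X = acts[layer, head]
            grid[layer, head] = probe_accuracy(
                X[:split], labels[:split], X[split:], labels[split:]
            )
    return grid


acts, labels = make_synthetic_activations(n_words=400, n_layers=4, n_heads=3, dim=16)
grid = accuracy_grid(acts, labels)
print(np.round(grid, 2))  # rows = layers, columns = heads
```

Plotting `grid` as a heatmap would give a layer-by-head picture of the kind the Figure 5 caption describes, with darker cells marking heads from which semantic size is more linearly decodable. With real models, the synthetic generator would be replaced by activations extracted for each word, and the labels by size ratings from a norm dataset.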