Unveiling LLMs' Metaphorical Understanding: Exploring Conceptual Irrelevance, Context Leveraging and Syntactic Influence

Fengying Ye, Shanshan Wang, Lidia S. Chao, Derek F. Wong

TL;DR

This study probes LLMs' metaphor understanding from three angles: concept mapping in embedding space, the existence of a metaphor-literal repository within models, and sensitivity to syntactic structure. Using a spatial analysis with $d_p$ and $\cos\theta$, imagination overlap metrics, and syntactic disruption tests on the Fig-QA and MUNCH datasets, the authors find 15–25% concept-irrelevant interpretations, partial but limited context utilization, and notable sensitivity to syntactic irregularities. GPT-4o shows the strongest reduction in concept-irrelevance, while V3-671B shows stronger alignment in the conceptual plane, but overall the results indicate inconsistent metaphor comprehension across models. The work highlights the need for robust methods that fuse contextual reasoning with syntactic awareness to achieve deeper, concept-level metaphor understanding in LLMs. $d_p$, $\cos\theta$, and $Ad$ emerge as complementary diagnostics for evaluating conceptual alignment in generated interpretations.

Abstract

Metaphor analysis is a complex linguistic phenomenon shaped by context and external factors. While Large Language Models (LLMs) demonstrate advanced capabilities in knowledge integration, contextual reasoning, and creative generation, their mechanisms for metaphor comprehension remain insufficiently explored. This study examines LLMs' metaphor-processing abilities from three perspectives: (1) Concept Mapping: using embedding space projections to evaluate how LLMs map concepts in target domains (e.g., misinterpreting "fall in love" as "drop down from love"); (2) Metaphor-Literal Repository: analyzing metaphorical words and their literal counterparts to identify inherent metaphorical knowledge; and (3) Syntactic Sensitivity: assessing how metaphorical syntactic structures influence LLMs' performance. Our findings reveal that LLMs generate 15%-25% conceptually irrelevant interpretations, depend on metaphorical indicators in training data rather than contextual cues, and are more sensitive to syntactic irregularities than to structural comprehension. These insights underline the limitations of LLMs in metaphor analysis and call for more robust computational approaches.

Paper Structure

This paper contains 26 sections, 2 equations, 20 figures, 8 tables.

Figures (20)

  • Figure 1: Overview of the experimental framework. Spatial analysis addresses the limitations of the multiple-choice task and the multiple metaphorical mapping problem. For the "trigger word" error, metaphorical imagination investigates context leveraging and the existence of a metaphor-literal repository within LLMs. Syntactic shuffle identifies the influence of syntax in metaphor analysis.
  • Figure 2: Illustration of spatial analysis in three-dimensional space. $d_{pi}$ is the perpendicular distance from $M_i$ to the conceptual plane. $d_{oi}$ denotes the distance from $M_i$ to $R_i$.
  • Figure 3: Illustration of angle $\theta$ between the conceptual plane and the interpretation plane. The conceptual plane is defined by the representations of $R_1$, $R_2$ and $S$, while the interpretation plane is defined by $R_1$, $R_2$ and $M_i$.
  • Figure 4: Distributions of ($d_p$, $Ad$) and ($d_p$, $\cos\theta$) for V3-671B and Qwen-T. Fluctuations are attributed to the variance of non-metaphorical parts in the sentences.
  • Figure 5: Overlap ratio distributions of ML (novel metaphor) imagined by R1-671B and LM by V3-671B. The best values are in bold; the second-best are underlined.
  • ...and 15 more figures
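
The geometry described in the captions of Figures 2 and 3 can be illustrated with a small sketch. It is not the authors' implementation: the paper's two equations are not reproduced in this summary, so the sketch assumes that $d_p$ is the Euclidean distance from an interpretation embedding $M_i$ to the affine plane through $R_1$, $R_2$ and $S$, and that $\theta$ is the dihedral angle between that plane and the plane through $R_1$, $R_2$ and $M_i$, measured around their shared line $R_1R_2$. The function names `perpendicular_distance` and `cos_dihedral` are hypothetical.

```python
import numpy as np


def perpendicular_distance(m, r1, r2, s):
    """Distance from point m to the affine plane through r1, r2 and s.

    Assumption: this stands in for d_p; the paper's exact formula may differ.
    Works in any dimension, provided r1, r2 and s are not collinear.
    """
    # Orthonormal basis of the plane's two direction vectors (reduced QR).
    basis, _ = np.linalg.qr(np.stack([r2 - r1, s - r1], axis=1))
    v = m - r1
    # Remove the in-plane component; what is left is perpendicular to the plane.
    residual = v - basis @ (basis.T @ v)
    return float(np.linalg.norm(residual))


def cos_dihedral(m, r1, r2, s):
    """Cosine of the angle between plane (r1, r2, s) and plane (r1, r2, m).

    Assumption: theta is taken as the dihedral angle around the line r1-r2,
    which both planes share. Undefined if s or m lies on that line.
    """
    axis = (r2 - r1) / np.linalg.norm(r2 - r1)

    def reject(p):
        # Component of (p - r1) orthogonal to the shared line.
        v = p - r1
        return v - (v @ axis) * axis

    a, b = reject(s), reject(m)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy 3-D points standing in for sentence embeddings (illustrative values only).
r1 = np.array([0.0, 0.0, 0.0])
r2 = np.array([1.0, 0.0, 0.0])
s = np.array([0.0, 1.0, 0.0])
m = np.array([0.0, 1.0, 1.0])

d_p = perpendicular_distance(m, r1, r2, s)
cos_theta = cos_dihedral(m, r1, r2, s)
```

For these toy points the conceptual plane is the $xy$-plane, so $d_p$ is 1 and $\cos\theta$ is $1/\sqrt{2}$, up to floating-point error. Under this reading, a small $d_p$ together with $\cos\theta$ near 1 would mean the interpretation lies close to the conceptual plane.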