Towards Multimodal Metaphor Understanding: A Chinese Dataset and Model for Metaphor Mapping Identification

Dongyu Zhang, Shengcheng Yin, Jingwei Yu, Zhiyao Wu, Zhen Li, Chengpei Xu, Xiaoxia Wang, Feng Xia

TL;DR

The paper tackles the scarcity of non-English multimodal metaphor resources and the challenge of explicitly identifying source–target mappings in multimodal metaphors. It introduces CM3D, a 6,108-item Chinese text-image advertisement dataset with target and source domain annotations, and CPMMIM, a two-stage CoT prompting-based model guided by Bi-Level Optimization to extract these mappings. Experimental results show CPMMIM outperforms baselines and that prompting in both stages significantly improves performance, with detailed ablations and error analyses. This work advances metaphor understanding in NLP, provides a valuable resource for cross-linguistic multimodal research, and offers a practical framework for future exploration of metaphor mappings across languages and modalities.
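The two-stage idea can be illustrated with a minimal sketch in which the second prompt is conditioned on the answer to the first. This is not the authors' implementation: the stage order (target first, then source), the prompt wording, the use of an image caption in place of image features, and the names `MetaphorMapping`, `identify_mapping`, and `fake_generate` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MetaphorMapping:
    target: str  # domain being described (e.g. the advertised product)
    source: str  # domain it is described in terms of


def identify_mapping(ad_text: str, image_caption: str,
                     generate: Callable[[str], str]) -> MetaphorMapping:
    """Run two chained prompts; the second reuses the first answer."""
    context = f"Ad text: {ad_text}\nImage: {image_caption}\n"
    # Stage 1 (assumed order): ask for the target domain.
    target = generate(
        context + "Which concept is being described (target domain)?"
    ).strip()
    # Stage 2: feed the stage-1 answer back in as part of the reasoning chain.
    source = generate(
        context
        + f"The target domain is '{target}'. "
        + "Which concept is it compared to (source domain)?"
    ).strip()
    return MetaphorMapping(target=target, source=source)


def fake_generate(prompt: str) -> str:
    # Hypothetical stand-in for a real model call so the sketch runs offline.
    return "shield" if "compared to" in prompt else "sunscreen"


mapping = identify_mapping(
    "Protect your skin", "a bottle shaped like a shield", fake_generate
)
print(mapping)
```

Passing the model call in as `generate` keeps the chaining logic separate from any particular model, so the same function can be exercised with a stub or with a real multimodal backend.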

Abstract

Metaphors play a crucial role in human communication, yet their comprehension remains a significant challenge for natural language processing (NLP) due to the cognitive complexity involved. According to Conceptual Metaphor Theory (CMT), metaphors map a source domain onto a target domain, and understanding this mapping is essential for grasping the nature of metaphors. While existing NLP research has focused on tasks like metaphor detection and sentiment analysis of metaphorical expressions, there has been limited attention to the intricate process of identifying the mappings between source and target domains. Moreover, non-English multimodal metaphor resources remain largely neglected in the literature, hindering a deeper understanding of the key elements involved in metaphor interpretation. To address this gap, we developed a Chinese multimodal metaphor advertisement dataset (CM3D) that includes annotations of specific target and source domains. This dataset aims to foster further research into metaphor comprehension, particularly in non-English languages. Furthermore, we propose a Chain-of-Thought (CoT) Prompting-based Metaphor Mapping Identification Model (CPMMIM), which simulates the human cognitive process for identifying these mappings. Drawing inspiration from CoT reasoning and Bi-Level Optimization (BLO), we treat the task as a hierarchical identification problem, enabling more accurate and interpretable metaphor mapping. Our experimental results demonstrate the effectiveness of CPMMIM, highlighting its potential for advancing metaphor comprehension in NLP. Our dataset and code are both publicly available to encourage further advancements in this field.
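For readers unfamiliar with Bi-Level Optimization, the generic textbook form is shown below. It is not the paper's own formulation (that is the subject of Figure 5); the symbols $x$, $y$, $F$, $f$, $X$, and $Y$ are placeholders chosen here for illustration.

```latex
\begin{aligned}
\min_{x \in X} \quad & F\bigl(x, y^{*}(x)\bigr) \\
\text{s.t.} \quad & y^{*}(x) \in \arg\min_{y \in Y} f(x, y)
\end{aligned}
```

The upper-level objective $F$ is evaluated at a solution $y^{*}(x)$ of the lower-level problem, so one decision is nested inside the other. Read loosely, this nesting is what makes a hierarchical view of identification natural: one identification step is solved conditional on the outcome of the other.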

Paper Structure

This paper contains 25 sections, 9 equations, 9 figures, 5 tables, and 1 algorithm.

Figures (9)

  • Figure 1: Example of multimodal metaphor.
  • Figure 2: Annotation examples.
  • Figure 3: Dataset Construction.
  • Figure 4: UMAP Visualization of Targets (a) and Sources (b).
  • Figure 5: Bi-Level Optimization formulation for metaphor mapping identification.
  • ...and 4 more figures