
CMDAG: A Chinese Metaphor Dataset with Annotated Grounds as CoT for Boosting Metaphor Generation

Yujie Shao, Xinrong Yao, Xingwei Qu, Chenghua Lin, Shi Wang, Stephen W. Huang, Ge Zhang, Jie Fu

TL;DR

This work introduces CMDAG, a large-scale Chinese metaphor corpus annotated with GROUNDS (each expressed as an Adjective+Noun pair) to support metaphor generation. It provides rigorous annotation guidelines for TENOR, VEHICLE, and GROUNDS, and proposes using GROUNDS as a Chain-of-Thought (CoT) input to improve metaphor generation. The authors evaluate GROUNDS-based CoT across multiple open-source LLMs, showing improved generation quality and offering insights into model behavior and evaluation criteria. Overall, CMDAG offers a valuable resource and methodological framework for grounding-aware Chinese metaphor research and generation.
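To make the annotation scheme concrete, the sketch below models one annotated sentence with a tenor, a vehicle, and a ground stored as an Adjective+Noun pair. The class and field names are hypothetical and do not reflect the released CMDAG file format, and the example sentence is invented for illustration rather than taken from the corpus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ground:
    """A ground written as an Adjective+Noun pair (hypothetical representation)."""

    adjective: str
    noun: str

    def as_phrase(self) -> str:
        # Join with the attributive particle 的, e.g. 温暖 + 感觉 -> 温暖的感觉 ("warm feeling")
        return f"{self.adjective}的{self.noun}"


@dataclass(frozen=True)
class MetaphorRecord:
    """One annotated sentence; field names are illustrative, not the released schema."""

    sentence: str
    tenor: str      # what is being described
    vehicle: str    # what it is compared to
    ground: Ground  # the shared feature that licenses the comparison


# Invented example (not from CMDAG): "Her smile is as warm as sunshine."
record = MetaphorRecord(
    sentence="她的笑容像阳光一样温暖。",
    tenor="笑容",
    vehicle="阳光",
    ground=Ground(adjective="温暖", noun="感觉"),
)
print(record.ground.as_phrase())  # 温暖的感觉
```

Keeping the ground as two fields rather than one string makes it easy to analyze the adjective and noun components separately.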

Abstract

Metaphors are a prominent linguistic device in human language and literature: they add color, imagery, and emphasis that make communication more effective. This paper introduces a large-scale, high-quality annotated Chinese Metaphor Corpus of around 28K sentences drawn from a diverse range of Chinese literary sources, including poems, prose, and song lyrics. To ensure the accuracy and consistency of our annotations, we introduce a comprehensive set of guidelines. These guidelines address the various facets of metaphor annotation, from identifying tenors, vehicles, and grounds to handling the complexities of similes, personifications, juxtapositions, and hyperboles. Departing from tradition, our approach to metaphor generation emphasizes grounds and their distinct features rather than the conventional combination of tenors and vehicles. By integrating the ground as a Chain-of-Thought (CoT) input, we are able to generate metaphors that resonate more with real-world intuition. We test generative models such as Belle, Baichuan, and Chinese-alpaca-33B using our annotated corpus. When prompted with selected samples from our dataset, these models generate creative and fluent metaphorical sentences more frequently, demonstrating the value of our corpus for Chinese metaphor research. The code is available at https://github.com/JasonShao55/Chinese_Metaphor_Explanation.
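One possible reading of "ground as a CoT input" is a two-step prompt: the model first states a ground for the tenor, then writes a metaphor conditioned on both. The minimal sketch below assumes that reading. The prompt wording, the function names, and the `Generate` interface are hypothetical and are not the paper's actual prompts; a stub model stands in for Belle, Baichuan, or Chinese-alpaca-33B so the example runs offline.

```python
from typing import Callable

# Assumed interface: any text-in/text-out LLM wrapper (e.g. around Belle or Baichuan).
Generate = Callable[[str], str]


def ground_prompt(tenor: str) -> str:
    # Step 1 (the CoT step): "Tenor: ... Give one salient feature of this tenor
    # (the ground) in Adjective+Noun form."
    return f"本体：{tenor}\n请用“形容词+名词”的形式写出该本体的一个显著特征（喻底）。"


def metaphor_prompt(tenor: str, ground: str) -> str:
    # Step 2: "Tenor: ... Ground: ... Choose a fitting vehicle based on the
    # ground and write a metaphorical sentence."
    return f"本体：{tenor}\n喻底：{ground}\n请根据喻底选择合适的喻体，写一个比喻句。"


def generate_with_ground_cot(tenor: str, generate: Generate) -> tuple[str, str]:
    """Produce a ground first, then a metaphor conditioned on tenor + ground."""
    ground = generate(ground_prompt(tenor)).strip()
    sentence = generate(metaphor_prompt(tenor, ground)).strip()
    return ground, sentence


# Hypothetical stub model so the sketch runs offline; swap in a real LLM call in practice.
def stub_model(prompt: str) -> str:
    if "喻底：" in prompt:  # only the step-2 prompt contains "喻底："
        return "她的笑容像阳光一样温暖。"
    return "温暖的感觉"  # step-1 answer: an Adjective+Noun ground


print(generate_with_ground_cot("笑容", stub_model))
```

When an annotated ground is already available from the corpus, step 1 can be skipped and the ground passed straight to `metaphor_prompt`.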

Paper Structure

This paper contains 19 sections, 3 figures, and 8 tables.

Figures (3)

  • Figure 1: Sketch Map of the Metaphorical Language Writing Process.
  • Figure 2: Word Clouds of tenors, vehicles, and adjective and noun components of grounds; the corresponding English word clouds are in the lower row.
  • Figure 3: A flowchart that illustrates our experiment with an example of task 1.