
ConMeC: A Dataset for Metonymy Resolution with Common Nouns

Saptarshi Ghosh, Tianyu Jiang

TL;DR

This work introduces ConMeC, a large, human-annotated dataset of 6,000 Wikipedia sentences targeting metonymy in common nouns, addressing a gap in prior datasets focused on named entities. It proposes a two-step, chain-of-thought prompting framework with category-dependent prompts and self-consistency to detect metonymy using large language models, and compares these methods against a fine-tuned BERT baseline. Experiments across ConMeC and three other datasets show that LLMs can achieve competitive performance on certain metonymy categories, but still struggle with nuanced semantic distinctions, while BERT remains strongest overall on ConMeC. The results also reveal insights into cross-category generalization, the impact of contextual information, and the benefits and limits of majority voting in LLM-based metonymy resolution. The dataset and methodology offer a foundation for future improvements in metonymy understanding and downstream NLP tasks that rely on implicit semantic relations.
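The self-consistency step mentioned above can be illustrated with a minimal majority-voting sketch. It assumes the LLM call is wrapped in a hypothetical zero-argument function `sample_once` that returns one sampled label string. The default of five samples is also an assumption, not a setting confirmed by the paper.

```python
from collections import Counter
from typing import Callable, List


def majority_vote(answers: List[str]) -> str:
    """Return the most frequent answer; ties go to the answer sampled first."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(answers)
    best = max(counts.values())
    # Scan in sampling order so that the earliest answer reaching the top count wins ties.
    for answer in answers:
        if counts[answer] == best:
            return answer
    raise AssertionError("unreachable")


def self_consistent_label(sample_once: Callable[[], str], n_samples: int = 5) -> str:
    """Call a (hypothetical) sampled LLM query n_samples times and majority-vote the labels."""
    answers = [sample_once() for _ in range(n_samples)]
    return majority_vote(answers)


# Stub standing in for a sampled LLM call; no real model is queried here.
_fake_samples = iter(["metonymic", "literal", "metonymic", "metonymic", "literal"])
print(self_consistent_label(lambda: next(_fake_samples), n_samples=5))  # metonymic
```

With a binary label set, an odd number of samples avoids ties altogether. The explicit tie-break only matters if more than two answer strings can come back.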

Abstract

Metonymy plays an important role in our daily communication. People naturally think about things using their most salient properties or commonly related concepts. For example, by saying "The bus decided to skip our stop today," we actually mean that the bus driver made the decision, not the bus. Prior work on metonymy resolution has mainly focused on named entities. However, metonymy involving common nouns (such as desk, baby, and school) is also a frequent and challenging phenomenon. We argue that NLP systems should be capable of identifying the metonymic use of common nouns in context. We create a new metonymy dataset, ConMeC, which consists of 6,000 sentences. Each sentence is paired with a target common noun and annotated by humans to indicate whether that noun is used metonymically in that context. We also introduce a chain-of-thought-based prompting method for detecting metonymy using large language models (LLMs). We evaluate our LLM-based pipeline, as well as a supervised BERT model, on our dataset and three other metonymy datasets. Our experimental results demonstrate that LLMs can achieve performance comparable to the supervised BERT model on well-defined metonymy categories, while still struggling with instances requiring nuanced semantic understanding. Our dataset is publicly available at: https://github.com/SaptGhosh/ConMeC.
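The abstract describes each instance as a sentence paired with a target common noun and a binary human label, and the figures report F1 scores per semantic category. The sketch below shows one plausible way to represent such records and compute binary F1 for the metonymic (positive) class within each category. The `Example` field names, the `category` field, and the toy sentences and labels are hypothetical illustrations. They are not the released ConMeC schema, and the function is not the authors' evaluation script.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Example:
    sentence: str
    target: str       # the common noun being judged
    category: str     # semantic category of the target (hypothetical field)
    metonymic: bool   # gold human label


def f1_by_category(examples: List[Example], predictions: List[bool]) -> Dict[str, float]:
    """Binary F1 for the metonymic class, computed separately for each category."""
    if len(examples) != len(predictions):
        raise ValueError("examples and predictions must have the same length")
    counts: Dict[str, List[int]] = {}  # category -> [tp, fp, fn]
    for ex, pred in zip(examples, predictions):
        c = counts.setdefault(ex.category, [0, 0, 0])
        if pred and ex.metonymic:
            c[0] += 1          # true positive
        elif pred and not ex.metonymic:
            c[1] += 1          # false positive
        elif not pred and ex.metonymic:
            c[2] += 1          # false negative
    scores: Dict[str, float] = {}
    for cat, (tp, fp, fn) in counts.items():
        denom = 2 * tp + fp + fn   # F1 = 2TP / (2TP + FP + FN)
        scores[cat] = 2 * tp / denom if denom else 0.0
    return scores


# Toy, hand-made examples (not taken from ConMeC).
data = [
    Example("The kettle is boiling.", "kettle", "container", True),
    Example("She filled the kettle.", "kettle", "container", False),
    Example("The school announced a holiday.", "school", "location", True),
]
print(f1_by_category(data, [True, True, False]))
```

Using the closed form 2TP / (2TP + FP + FN) avoids dividing by zero when precision or recall is undefined. A category with no true positives then simply scores 0.0.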

Paper Structure

This paper contains 24 sections, 3 figures, 11 tables.

Figures (3)

  • Figure 1: Process of dataset creation.
  • Figure 2: The architecture of our two-step prompting method, illustrated with two examples. The LLM first determines the semantic category of the target word in the sentence, such as container or location. Then, given the category-dependent prompt, the model predicts whether the target word is used metonymically.
  • Figure 3: F1 scores across six categories for Llama (blue), GPT-4o (green), and BERT (yellow) on the metonymic sentences.
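The two-step flow described in Figure 2 can be sketched as two chained prompts. The first asks for the target's semantic category. The second uses a category-dependent instruction and asks for a step-by-step judgment. Everything concrete below is a hypothetical stand-in: the `ask_llm` callable, the prompt wording, the `CATEGORY_PROMPTS` table, and the rule of parsing the last word of the reply. The paper's actual prompts are not reproduced in this section.

```python
from typing import Callable, Dict, Tuple

# Hypothetical category-specific guidance; only "container" and "location" are named in Figure 2.
CATEGORY_PROMPTS: Dict[str, str] = {
    "container": "Does the noun refer to the container itself, or to its contents?",
    "location": "Does the noun refer to the place itself, or to people or institutions linked to it?",
}
DEFAULT_PROMPT = "Is the noun used literally, or does it stand for a closely related concept?"


def step1_prompt(sentence: str, target: str) -> str:
    """Step 1: ask the model for the semantic category of the target noun."""
    return (f'Sentence: "{sentence}"\n'
            f'What semantic category does "{target}" belong to here? Answer with one word.')


def step2_prompt(sentence: str, target: str, category: str) -> str:
    """Step 2: category-dependent, chain-of-thought style metonymy question."""
    guidance = CATEGORY_PROMPTS.get(category, DEFAULT_PROMPT)
    return (f'Sentence: "{sentence}"\nTarget noun: "{target}" (category: {category})\n'
            f"{guidance} Think step by step, then end with 'metonymic' or 'literal'.")


def detect_metonymy(sentence: str, target: str,
                    ask_llm: Callable[[str], str]) -> Tuple[str, bool]:
    """Run both steps and return (predicted category, is_metonymic)."""
    category = ask_llm(step1_prompt(sentence, target)).strip().lower()
    reply = ask_llm(step2_prompt(sentence, target, category)).strip().lower()
    words = reply.split()
    # Compare the whole last word so that "non-metonymic" is not read as "metonymic".
    is_metonymic = bool(words) and words[-1].strip(".!'\"") == "metonymic"
    return category, is_metonymic


def fake_llm(prompt: str) -> str:
    """Stub model for illustration; a real pipeline would call an LLM API here."""
    if "semantic category" in prompt:
        return "Container"
    return "The kettle stands for the water inside it, so: metonymic."


print(detect_metonymy("The kettle is boiling.", "kettle", fake_llm))  # ('container', True)
```

Keeping the category decision as a separate call makes it easy to sample the second step several times and majority-vote the verdicts, in the spirit of the self-consistency setup the paper describes.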