Coarse-Grained Sense Inventories Based on Semantic Matching between English Dictionaries

Masato Kikuchi, Masatsugu Ono, Toshioki Soga, Tetsu Tanabe, Tadachika Ozono

TL;DR

The paper addresses the overly fine-grained senses of WordNet by constructing coarse-grained sense inventories through automated semantic matching between WordNet senses and the sense definitions of two Cambridge dictionaries (CLD and CED). Using 15,885 target words and a prompt-based LLM approach, it generates two inventories containing 3,222 (CLD) and 9,457 (CED) sense groups, each annotated with CEFR levels. Experiments show that these inventories yield higher intra-group coherence than the existing Coarse Sense Inventory (CSI), and a WSD-style evaluation suggests improved practical accuracy (~85–86% with these inventories) without relying on large corpora. The resources provide scalable, education-friendly sense groupings applicable to NLP tasks and language learning, and they are publicly available for further refinement. CEFR labeling and dictionary-based grouping are the paper's key contributions, offering a reproducible, automated path to coarser-grained semantic representations.

Abstract

WordNet is one of the largest handcrafted concept dictionaries visualizing word connections through semantic relationships. It is widely used as a word sense inventory in natural language processing tasks. However, WordNet's fine-grained senses have been criticized for limiting its usability. In this paper, we semantically match sense definitions from Cambridge dictionaries and WordNet and develop new coarse-grained sense inventories. We verify the effectiveness of our inventories by comparing their semantic coherence with that of the Coarse Sense Inventory (CSI). The advantages of the proposed inventories include their low dependency on large-scale resources, better aggregation of closely related senses, CEFR-level assignments, and ease of expansion and improvement.
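
To make the grouping idea concrete, the following is a minimal sketch of how matched senses could be collected into coarse-grained groups once each WordNet sense has been paired with a dictionary sense. It is not the authors' implementation: the function name `group_senses`, the sense identifiers such as `wn:say#1` and `cld:say#1`, and the choice to skip unmatched senses are all assumptions made for illustration.

```python
from collections import defaultdict


def group_senses(matches):
    """Group WordNet sense IDs by the dictionary sense they were matched to.

    `matches` is an iterable of (wordnet_sense_id, dictionary_sense_id) pairs.
    A dictionary_sense_id of None marks a WordNet sense with no match; this
    sketch simply leaves such senses out (an assumption, not the paper's rule).
    """
    groups = defaultdict(list)
    for wn_sense, dict_sense in matches:
        if dict_sense is None:
            continue  # unmatched sense: not placed in any group here
        groups[dict_sense].append(wn_sense)
    return dict(groups)


# Hypothetical matching results for the word "say" (identifiers are made up).
matches = [
    ("wn:say#1", "cld:say#1"),
    ("wn:say#2", "cld:say#1"),
    ("wn:say#3", "cld:say#2"),
    ("wn:say#4", None),
]

coarse = group_senses(matches)
for dict_sense, wn_senses in coarse.items():
    print(dict_sense, "->", wn_senses)
```

Each dictionary sense acts as the label of one coarse-grained group, so two WordNet senses end up together exactly when they were matched to the same dictionary definition.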

Paper Structure

This paper contains 10 sections, 6 figures, and 2 tables.

Figures (6)

  • Figure 1: An example of WSD. The target word "say" for WSD is in bold. The correct sense is assigned to (1), but (2) could also be considered correct. However, the system must choose only one sense.
  • Figure 2: Prompt template used for word sense matching
  • Figure 3: Sense matching examples (top) and coarse-grained sense representations (bottom) constructed using them
  • Figure 4: Template (top) and example (bottom) of the prompt used in our experiments. In this example, WORD is "say," and the sense definitions of $s$ and $s^{\prime}$ are "utter aloud" and "express in words," respectively.
  • Figure 5: Example showing one sense definition in CLD matching two WordNet definitions with different meanings. All definitions are of the word "find."
  • ...and 1 more figure