Specifying Genericity through Inclusiveness and Abstractness Continuous Scales
Claudia Collacciani, Andrea Amelio Ravelli, Marianna Marcella Bolognesi
TL;DR
This paper tackles the challenge of modeling NP genericity beyond binary classification by introducing a two-dimensional, continuous annotation framework based on Inclusiveness (INC) and Abstractness (ABS). Grounded in linguistic theory, the approach is designed for crowd-sourcing and validated on a 324-sentence pilot dataset, showing that continuous ratings align with expert binary labels while capturing nuanced gradience. Logistic regression experiments show that INC and ABS, especially when used together, effectively predict GENERIC vs NON-GENERIC judgments, underscoring the added value of annotating both dimensions. The work provides a first openly accessible dataset and a methodology for building real-language resources for semantics research and for commonsense knowledge repositories that can enhance various NLP applications.
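To make the logistic regression comparison concrete, the sketch below fits a binary classifier on Inclusiveness (INC) and Abstractness (ABS) ratings, first with each dimension alone and then with both. It is an illustration under stated assumptions, not the authors' experimental setup: the ten rating pairs are invented toy values on an assumed 0-1 scale, the labels are hypothetical, and the hand-written gradient descent stands in for whatever software the experiments actually used.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def accuracy(w, b, X, y):
    """Share of items whose predicted class (threshold 0.5) matches the label."""
    hits = 0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
        hits += int((p >= 0.5) == bool(yi))
    return hits / len(y)

# Invented toy ratings: (INC, ABS, label), label 1 = GENERIC, 0 = NON-GENERIC.
# These values are NOT taken from the paper's dataset; the 0-1 scale is assumed.
toy = [
    (0.9, 0.8, 1), (0.8, 0.9, 1), (0.9, 0.4, 1), (0.4, 0.9, 1), (0.7, 0.7, 1),
    (0.1, 0.2, 0), (0.2, 0.1, 0), (0.5, 0.2, 0), (0.2, 0.5, 0), (0.3, 0.3, 0),
]
y = [label for _, _, label in toy]
X_inc = [[inc] for inc, _, _ in toy]             # INC only
X_abs = [[abs_] for _, abs_, _ in toy]           # ABS only
X_both = [[inc, abs_] for inc, abs_, _ in toy]   # INC and ABS together

w_inc, b_inc = fit_logistic(X_inc, y)
w_abs, b_abs = fit_logistic(X_abs, y)
w_both, b_both = fit_logistic(X_both, y)

print("INC only :", accuracy(w_inc, b_inc, X_inc, y))
print("ABS only :", accuracy(w_abs, b_abs, X_abs, y))
print("INC + ABS:", accuracy(w_both, b_both, X_both, y))
```

The toy data were chosen so that neither rating separates the two classes on its own while their combination does, which only mirrors the kind of comparison described above and says nothing about the actual results. On real annotations, a maintained library implementation and held-out evaluation would be the more usual choice than training accuracy from a hand-rolled optimizer.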
Abstract
This paper introduces a novel annotation framework for the fine-grained modeling of the genericity of Noun Phrases (NPs) in natural language. The framework is designed to be simple and intuitive, making it accessible to non-expert annotators and suitable for crowd-sourced tasks. It draws on the theoretical and cognitive literature on genericity and is grounded in established linguistic theory. Through a pilot study, we created a small but crucial annotated dataset of 324 sentences, which serves as a foundation for future research. To validate our approach, we compared our continuous annotations with existing binary annotations on the same dataset, demonstrating the framework's effectiveness in capturing nuanced aspects of genericity. Our work offers a practical resource both for linguists, providing a first annotated dataset and an annotation scheme designed to build real-language datasets for studies on the semantics of genericity, and for NLP practitioners, contributing to the development of commonsense knowledge repositories that are valuable for enhancing various NLP applications.
