Generating event descriptions under syntactic and semantic constraints

Angela Cao, Faye Holt, Jonas Chan, Stephanie Richter, Lelia Glass, Aaron Steven White

TL;DR

This work evaluates three strategies for generating event descriptions under tight syntactic and semantic constraints: manual expert crafting, corpus-based sampling, and language-model-based sampling. It systematically compares these approaches across naturalness, typicality, and distinctiveness, using meticulously constructed verb triplets and calibration sets. The results show manual generation yields the most natural, typical, and distinctive items, while corpus- and LM-based methods produce high-quality outputs that are generally usable for downstream lexical-semantic annotation, provided downstream analyses tolerate modest degradation. The findings support using automated methods to enable scalable lexical semantic research, with LM-based approaches offering the best efficiency overall, and point to future work on scaling, constraint complexity, and post-editing to close the gap with human-generated data.

Abstract

With the goal of supporting scalable lexical semantic annotation, analysis, and theorizing, we conduct a comprehensive evaluation of different methods for generating event descriptions under both syntactic constraints -- e.g. desired clause structure -- and semantic constraints -- e.g. desired verb sense. We compare three different methods -- (i) manual generation by experts; (ii) sampling from a corpus annotated for syntactic and semantic information; and (iii) sampling from a language model (LM) conditioned on syntactic and semantic information -- along three dimensions of the generated event descriptions: (a) naturalness, (b) typicality, and (c) distinctiveness. We find that all methods reliably produce natural, typical, and distinctive event descriptions, but that manual generation continues to produce event descriptions that are more natural, typical, and distinctive than the automated generation methods. We conclude that the automated methods we consider produce event descriptions of sufficient quality for use in downstream annotation and analysis insofar as the methods used for this annotation and analysis are robust to a small amount of degradation in the resulting event descriptions.

Paper Structure

This paper contains 59 sections, 3 equations, 4 figures, 16 tables.

Figures (4)

  • Figure 1: Distribution of naturalness scores for manually generated sentences.
  • Figure 2: Mean naturalness rating for each sentence produced by each generation method. Each black point shows the mean rating of a sentence and large colored points show the mean of those means for each generation method.
  • Figure 3: Mean typicality rating for each sentence from each source. Each black point shows the mean rating of a sentence and large colored points show the mean of those means for each generation method.
  • Figure 4: Mean difference rating for each sentence pair from each source. Each point shows the mean rating of a sentence pair and large colored points show the mean of those means for each relevant comparison type.
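
The captions for Figures 2 to 4 describe a two-level aggregation: a mean rating per sentence (or sentence pair), then the mean of those means per generation method. A minimal sketch of that aggregation follows. The rating values, sentence identifiers, and function names are hypothetical and chosen only for illustration; they are not data or code from the paper.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy ratings as (method, sentence_id, rating) tuples.
# These are NOT values from the paper; they only illustrate the aggregation.
ratings = [
    ("manual", "s1", 6), ("manual", "s1", 7), ("manual", "s2", 5),
    ("corpus", "s3", 4), ("corpus", "s3", 6), ("corpus", "s4", 5),
    ("lm", "s5", 5), ("lm", "s5", 5), ("lm", "s6", 3), ("lm", "s6", 4),
]


def sentence_means(rows):
    """Mean rating per (method, sentence): the small points in the figures."""
    by_sentence = defaultdict(list)
    for method, sentence_id, rating in rows:
        by_sentence[(method, sentence_id)].append(rating)
    return {key: mean(values) for key, values in by_sentence.items()}


def method_means(per_sentence):
    """Mean of the per-sentence means for each method: the large points."""
    by_method = defaultdict(list)
    for (method, _sentence_id), sentence_mean in per_sentence.items():
        by_method[method].append(sentence_mean)
    return {method: mean(values) for method, values in by_method.items()}


per_sentence = sentence_means(ratings)
per_method = method_means(per_sentence)
print(per_method)
```

Averaging per sentence first gives every sentence equal weight regardless of how many ratings it received, so the result can differ from a pooled mean over all ratings. In the toy data, the "manual" mean of means is 5.75, while the pooled mean of its three ratings is 6.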