Annotating FrameNet via Structure-Conditioned Language Generation

Xinyue Cui; Swabha Swayamdipta

TL;DR

This paper investigates the task of generating new sentences that preserve a given semantic structure under the FrameNet formalism, and proposes an overgenerate-and-filter framework for producing novel frame-semantically annotated sentences.

Abstract

Despite the remarkable generative capabilities of language models in producing naturalistic language, their effectiveness at explicitly manipulating and generating linguistic structures remains understudied. In this paper, we investigate the task of generating new sentences that preserve a given semantic structure, following the FrameNet formalism. We propose a framework to produce novel frame-semantically annotated sentences following an overgenerate-and-filter approach. Our results show that conditioning on rich, explicit semantic information tends to produce generations with high human acceptance, under both prompting and finetuning. Our generated frame-semantic structured annotations are effective as training data augmentation for frame-semantic role labeling in low-resource settings; however, we do not see benefits in higher-resource settings. Our study concludes that while generating high-quality, semantically rich data might be within reach, the downstream utility of such generations remains to be seen, highlighting the outstanding challenges in automating linguistic annotation tasks.

Paper Structure (30 sections, 2 figures, 13 tables)

This paper contains 30 sections, 2 figures, 13 tables.

Introduction
FrameNet and Extensions
Sister LU Replacement
Generating FrameNet Annotations via Frame-Semantic Conditioning
Selecting Candidate FEs for Generation
Generating Semantically Consistent Spans
Filtering Inconsistent Generations
Intrinsic Evaluation of Generations
Augmenting Data for Frame-SRL
Augmenting Under Low-Resource Setting
Related Work
Data Augmentation for FrameNet
Controlled Generation
Conclusion
FrameNet Statistics
...and 15 more sections

Figures (2)

Figure 1: Our framework to generate frame-semantically annotated data. Following Pancholy et al. (2021), we replace a sister LU with the target LU in an annotated sentence (step 0; see "Sister LU Replacement"). We select FEs appropriate for generating a new structure-annotated sentence (step 1; see "Selecting Candidate FEs for Generation"), and execute generation via fine-tuning T5 or prompting GPT-4 (step 2; see "Generating Semantically Consistent Spans"). Finally, we filter out sentences that fail to preserve LU-FE relationships under FrameNet (step 3; see "Filtering Inconsistent Generations").
Figure 2: Learning curves for our frame-SRL model and the end-to-end parser of Lin et al. (2021) show diminishing returns on adding more human-annotated training data. The triangle marker denotes the performance of the Lin et al. (2021) parser on SRL with gold frame and LU.
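
The four stages named in the Figure 1 caption (sister LU replacement, FE selection, span generation, filtering) can be illustrated with a minimal Python sketch. Everything below is a hypothetical stand-in and not the authors' implementation: the `AnnotatedSentence` type, the function names, the canned `toy_generator` (standing in for a fine-tuned T5 or prompted GPT-4), and the `toy_filter` heuristic (standing in for the paper's LU-FE consistency filter, whose details are not given on this page) are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Set


@dataclass
class AnnotatedSentence:
    text: str            # sentence text
    lu: str              # lexical unit (LU) evoking the frame, as it appears in text
    fes: Dict[str, str]  # frame element (FE) name -> span text


def replace_sister_lu(s: AnnotatedSentence, target_lu: str) -> AnnotatedSentence:
    """Step 0: swap the annotated (sister) LU for the target LU."""
    return replace(s, text=s.text.replace(s.lu, target_lu, 1), lu=target_lu)


def overgenerate_and_filter(
    s: AnnotatedSentence,
    target_lu: str,
    fes_to_regenerate: Set[str],
    generate_span: Callable[[AnnotatedSentence, str, int], List[str]],
    is_consistent: Callable[[AnnotatedSentence, str], bool],
    n_candidates: int = 3,
) -> List[AnnotatedSentence]:
    base = replace_sister_lu(s, target_lu)
    kept: List[AnnotatedSentence] = []
    # Step 1: only the selected FEs are regenerated (selection rule is assumed
    # to be supplied by the caller).
    for fe in [name for name in base.fes if name in fes_to_regenerate]:
        old_span = base.fes[fe]
        # Step 2: overgenerate several candidate spans for this FE.
        for new_span in generate_span(base, fe, n_candidates):
            candidate = replace(
                base,
                text=base.text.replace(old_span, new_span, 1),
                fes={**base.fes, fe: new_span},
            )
            # Step 3: keep only candidates that pass the consistency check.
            if is_consistent(candidate, fe):
                kept.append(candidate)
    return kept


# Hypothetical stand-ins for the generator and the filter.
def toy_generator(s: AnnotatedSentence, fe: str, n: int) -> List[str]:
    return ["a used bicycle", "quickly", "two tickets"][:n]


def toy_filter(s: AnnotatedSentence, fe: str) -> bool:
    # Crude placeholder heuristic: accept multi-word spans only.
    return len(s.fes[fe].split()) > 1


source = AnnotatedSentence(
    text="She bought a book from the store",
    lu="bought",
    fes={"Buyer": "She", "Goods": "a book", "Seller": "the store"},
)
results = overgenerate_and_filter(
    source, "purchased", {"Goods"}, toy_generator, toy_filter
)
for r in results:
    print(r.text, r.fes)
```

Because each generated sentence carries over the FE annotations of its source, every surviving candidate is already annotated; in this sketch the filter is the only place where candidates are discarded.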
