
Okay, Let's Do This! Modeling Event Coreference with Generated Rationales and Knowledge Distillation

Abhijnan Nath, Shadi Manafi, Avyakta Chelle, Nikhil Krishnaswamy

TL;DR

This work tackles event coreference resolution by leveraging abductive free-text rationales generated from an open-weight LLM as distant supervision. It introduces a two-stage training pipeline: Rationale-Oriented Event Clustering (ROEC) to align event pairs with generated rationales in a student encoder, and knowledge distillation to transfer rationale-informed cues from a teacher model into a compact model. Through experiments on ECB+, GVC, and AIDA Phase 1, the authors achieve state-of-the-art $B^3$ F1 on ECB+ and GVC and establish a new baseline on AIDA Phase 1, all without document clustering during inference. The approach demonstrates that LLM-generated rationales, when conditioned on gold labels and distilled into a smaller model, provide valuable contextual cues for coreference decisions, suggesting a practical path to more efficient, rationale-grounded ECR systems with potential applicability to other NLP tasks that benefit from explainable supervision.
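The distillation stage can be illustrated with a small loss function. The summary above does not spell out the exact objective, so what follows is a minimal sketch under stated assumptions. We assume the student's pair representation has already been projected to the teacher's hidden size and is matched to the teacher's representation with a mean-squared-error term. We also assume this term is mixed with an ordinary binary cross-entropy on the coreference label through a weight `alpha`. The names `distillation_loss`, `alpha`, and the MSE/BCE combination are hypothetical choices for illustration, not the authors' implementation.

```python
import math
from typing import Sequence


def mse(student: Sequence[float], teacher: Sequence[float]) -> float:
    """Mean squared error between a student vector and a teacher vector."""
    if len(student) != len(teacher):
        raise ValueError("student must be projected to the teacher's dimension")
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


def bce(prob: float, label: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one coreferent / not-coreferent decision."""
    p = min(max(prob, eps), 1.0 - eps)  # clamp so log() never sees 0
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def distillation_loss(student_vec, teacher_vec, student_prob, label, alpha=0.5):
    """Hypothetical objective: alpha * task loss + (1 - alpha) * latent matching.

    student_vec: student pair representation after an (assumed) projection
                 into the teacher's hidden size.
    teacher_vec: frozen teacher representation of the rationale-informed pair.
    """
    task = bce(student_prob, label)
    match = mse(student_vec, teacher_vec)
    return alpha * task + (1.0 - alpha) * match


# Example: the latent vectors match exactly, so only the task term contributes.
print(distillation_loss([0.2, -0.1], [0.2, -0.1], student_prob=0.5, label=1))
```

Setting `alpha` closer to 0 leans harder on the teacher's latent space. Setting it closer to 1 falls back to plain supervised pair scoring. At inference only the student and its pair score are needed, which matches the point that no rationale text or teacher signal is used at test time.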

Abstract

In NLP, Event Coreference Resolution (ECR) is the task of grouping event mentions that refer to the same underlying real-life event into clusters, usually via neural systems. In this work, we investigate using abductive free-text rationales (FTRs) generated by modern autoregressive LLMs as distant supervision of smaller student models for cross-document coreference resolution (CDCR) of events. We implement novel rationale-oriented event clustering and knowledge distillation methods for event coreference scoring that leverage enriched information from the FTRs for improved CDCR without additional annotation or expensive document clustering. Our model using coreference-specific knowledge distillation achieves SOTA $B^3$ F1 on the ECB+ and GVC corpora, and we establish a new baseline on the AIDA Phase 1 corpus. Our code can be found at https://github.com/csu-signal/llama_cdcr
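The headline results are reported in $B^3$ F1. This is the mention-based B-cubed metric: for each mention, precision is the overlap between its gold (key) cluster and predicted (response) cluster divided by the response cluster size. Recall is the same overlap divided by the key cluster size. Both are averaged over mentions, and F1 is their harmonic mean. The sketch below is a from-scratch illustration, not the scorer used in the paper. It assumes key and response cover exactly the same mention ids, with singletons included as one-element clusters.

```python
from typing import Dict, Hashable, List, Set

Cluster = Set[Hashable]


def _index(clusters: List[Cluster]) -> Dict[Hashable, Cluster]:
    """Map each mention id to the cluster that contains it."""
    lookup: Dict[Hashable, Cluster] = {}
    for cluster in clusters:
        for mention in cluster:
            lookup[mention] = cluster
    return lookup


def b_cubed(key: List[Cluster], response: List[Cluster]):
    """Return (precision, recall, F1) under the B^3 metric.

    Assumes key and response contain the same mentions (no twinless mentions).
    """
    key_of, resp_of = _index(key), _index(response)
    mentions = list(key_of)
    precision = recall = 0.0
    for m in mentions:
        overlap = len(key_of[m] & resp_of[m])
        precision += overlap / len(resp_of[m])
        recall += overlap / len(key_of[m])
    precision /= len(mentions)
    recall /= len(mentions)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Gold: {a, b, c} and {d, e}; system moves mention c into the second cluster.
key = [{"a", "b", "c"}, {"d", "e"}]
response = [{"a", "b"}, {"c", "d", "e"}]
p, r, f = b_cubed(key, response)  # each is 11/15, about 0.733
print(p, r, f)
```

Because every mention is weighted equally, merging errors on large clusters cost more than errors on singletons. This makes the per-cluster-size breakdown in Figure 3 a useful complement to the aggregate score.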

Paper Structure (35 sections, 10 equations, 5 figures, 7 tables)

Figures (5)

  • Figure 1: Schematic system overview: Step-by-step FTRs resembling an "inner-monologue" are generated using an LLM (teacher model) conditioned on the gold coreference label. FTRs are then clustered along with event pairs to optimize the student's latent space (ROEC). The optimized student learns further coreference-specific contextual cues in the rationales from the teacher's latent space. Arrows show the gradient flow during training from the teacher (blue) and student (green) during the ROEC (dashed) and knowledge distillation (dotted) phases, respectively. Solid black line indicates inference samples, which include no rationale text or signal from the teacher model. Letters $a$-$i$ in the ROEC block represent distinct event mentions, and the colors represent an event cluster (such that all the blue circles cluster together). “R” represents a set of rationales that justify the linking of different mentions in a single cluster.
  • Figure 2: Prompt format for inner monologue-based FTR generation conditioned on the gold label (underlined). <m> and </m> demarcate the event triggers.
  • Figure 3: Distribution of mentions correctly resolved by the indicated model vs. cluster size.
  • Figure 4: Rationale sample presented to evaluators.
  • Figure 5: Average scores for inner monologue samples generated from ECB+ and GVC.
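Figure 1 describes ROEC as clustering FTRs together with event pairs so that items belonging to the same event cluster end up close together in the student's latent space. The exact ROEC objective is not given in this summary. As a stand-in, the sketch below uses a supervised contrastive (SupCon-style) loss over a mixed batch of event-pair and rationale embeddings. Items that share a cluster id are treated as positives, and everything else in the batch is treated as a negative. The function name `cluster_contrastive_loss`, the cosine similarity, and the `temperature` value are assumptions for illustration only.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def cluster_contrastive_loss(vecs: List[Sequence[float]], cluster_ids: List[int],
                             temperature: float = 0.1) -> float:
    """SupCon-style loss over a mixed batch of event-pair and rationale embeddings.

    Items sharing a cluster id are positives for each other; every other
    item in the batch is a negative. Anchors without positives are skipped.
    """
    n = len(vecs)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and cluster_ids[j] == cluster_ids[i]]
        if not positives:
            continue
        logits = {j: cosine(vecs[i], vecs[j]) / temperature for j in range(n) if j != i}
        top = max(logits.values())  # subtract the max inside log-sum-exp for stability
        log_norm = top + math.log(sum(math.exp(z - top) for z in logits.values()))
        total += sum(log_norm - logits[j] for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Hypothetical batch: two similar vectors per cluster (e.g. an event pair and its rationale).
vecs = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
aligned = cluster_contrastive_loss(vecs, [0, 0, 1, 1])
shuffled = cluster_contrastive_loss(vecs, [0, 1, 0, 1])
print(aligned < shuffled)  # grouping similar items under one cluster id lowers the loss
```

Mixing rationale embeddings into the same batch as pair embeddings is what lets the rationales ("R" in Figure 1) shape the student's space during training. At inference, only pair embeddings are produced.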