SAGA: A Participant-specific Examination of Story Alternatives and Goal Applicability for a Deeper Understanding of Complex Events

Sai Vallurupalli, Katrin Erk, Francis Ferraro

TL;DR

SAGA introduces a participant-centric framework for goal reasoning in complex narratives and builds a crowd-sourced dataset of 6.3K annotated goal and action instances across 886 actual and 951 alternative ROCStories-derived narratives, with an average inter-annotator agreement (IAA) of 80%. The authors formulate five inference tasks, ranging from goal inference to planning and goal achievement, to benchmark several language models (GPT-4, GPT-3.5, Flan-T5, T5) in zero-shot and few-shot settings and after fine-tuning on SAGA data. Larger pretrained models generally outperform smaller ones, but smaller models fine-tuned on SAGA can match or surpass larger models on several tasks, particularly when augmented with few-shot prompts. The work shows that although large models capture some goal-based knowledge, fine-grained, participant-specific goal reasoning and prospective planning benefit substantially from curated data and targeted fine-tuning, with implications for claim verification, question answering, and narrative understanding.

Abstract

Interpreting and assessing goal-driven actions is vital to understanding and reasoning over complex events. It is important to be able to acquire the knowledge needed for this understanding, though doing so is challenging. We argue that such knowledge can be elicited through a participant achievement lens. We analyze a complex event in a narrative according to the intended achievements of the participants in that narrative, the likely future actions of the participants, and the likelihood of goal success. We collect 6.3K high-quality goal and action annotations reflecting our proposed participant achievement lens, with an average weighted Fleiss' kappa IAA of 80%. Our collection contains annotated alternate versions of each narrative. These alternate versions vary minimally from the "original" story, but can license drastically different inferences. Our findings suggest that while modern large language models can reflect some of the goal-based knowledge we study, they find it challenging to fully capture the design and intent behind concerted actions, even when model pretraining included the data from which we extracted the goal knowledge. We show that smaller models fine-tuned on our dataset can achieve performance surpassing that of larger models.
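
As an illustration of the agreement statistic mentioned above, the following is a minimal sketch of Fleiss' kappa in Python. The abstract does not say how the "average weighted" figure is computed, so the `weighted_average_kappa` helper, which weights each annotation batch by its number of items, is an assumption for illustration and not the authors' procedure. The function names and the count-matrix input layout are likewise hypothetical.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Standard (unweighted) Fleiss' kappa.

    counts[i][j] is the number of raters who assigned item i to category j.
    Every item must be rated by the same number of raters (at least 2).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    n_cats = len(counts[0])

    # Observed agreement: mean over items of the fraction of agreeing rater pairs.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items

    # Expected agreement from the overall category proportions.
    p_cats = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_cats)
    ]
    p_e = sum(p * p for p in p_cats)

    if p_e == 1.0:
        # Degenerate case (one category used throughout); returning 1.0 is a
        # convention chosen for this sketch.
        return 1.0
    return (p_bar - p_e) / (1.0 - p_e)


def weighted_average_kappa(batches: Sequence[Sequence[Sequence[int]]]) -> float:
    """Assumed aggregation: average of per-batch kappas weighted by item count."""
    total_items = sum(len(batch) for batch in batches)
    return sum(len(batch) * fleiss_kappa(batch) for batch in batches) / total_items
```

For example, three raters who fully agree on two items, `fleiss_kappa([[3, 0], [0, 3]])`, give a kappa of 1.0, while split votes on every item push the value toward or below zero.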

Paper Structure (46 sections, 5 figures, 25 tables)

Figures (5)

  • Figure 1: A participant's goal inferred from the actual story when applied to 3 alternative stories, drawn from the PASTA dataset (Ghosh et al., 2023); slightly varying actions in the stories lead to different goal achievement outcomes.
  • Figure 2: Goal reasoning inferences from our dataset, formulated as benchmarking tasks. The tasks involve both generating a participant's goal and the future actions, after the story, aimed at achieving that goal, and identifying goal applicability and achievement. Tasks 1, 3a, and 4 examine generative understanding of goals, explainable future actions, and plans. Tasks 2, 3b, 3c, and 5 examine discriminative understanding of applicability and achievement.
  • Figure 3: Our modeling of goals for "the kids." In several stages, a highlighted participant-specific story is annotated, starting with the actual story (left) and continuing with an alternate story (right), which results in a goal annotation set consisting of both free-form text and label assignments.
  • Figure 4: General HIT instructions for goal annotation in actual stories.
  • Figure 5: Screenshot of the HIT used for goal annotation in actual stories.
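
The generative/discriminative split of the benchmarking tasks described in the Figure 2 caption can be written down as a small lookup table. This is only a sketch: the task identifiers come from the caption, while the `TaskKind` enum, the `TASK_KINDS` mapping, and the `tasks_of_kind` helper are hypothetical names introduced here for illustration and are not part of the SAGA release.

```python
from enum import Enum
from typing import Dict, List


class TaskKind(Enum):
    GENERATIVE = "generative"          # model produces free-form text
    DISCRIMINATIVE = "discriminative"  # model selects or assigns a label


# Task identifiers and their grouping, as listed in the Figure 2 caption.
TASK_KINDS: Dict[str, TaskKind] = {
    "1": TaskKind.GENERATIVE,
    "2": TaskKind.DISCRIMINATIVE,
    "3a": TaskKind.GENERATIVE,
    "3b": TaskKind.DISCRIMINATIVE,
    "3c": TaskKind.DISCRIMINATIVE,
    "4": TaskKind.GENERATIVE,
    "5": TaskKind.DISCRIMINATIVE,
}


def tasks_of_kind(kind: TaskKind) -> List[str]:
    """Return the task identifiers of the given kind, in sorted order."""
    return sorted(task for task, k in TASK_KINDS.items() if k is kind)
```

A table like this makes it straightforward to route each task to the appropriate kind of evaluation, for instance text-overlap metrics for generative tasks and accuracy-style metrics for discriminative ones; which metrics the paper actually uses is not stated in this excerpt.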