FACTTRACK: Time-Aware World State Tracking in Story Outlines

Zhiheng Lyu, Kevin Yang, Lingpeng Kong, Daniel Klein

TL;DR

FactTrack introduces time-aware atomic facts and a timeline-based world state to detect and resolve factual contradictions in long-form generation. It decomposes each event into pre-facts and post-facts and updates the world state through a four-step pipeline (Decompose, Determine Validity Interval, Detect Contradictions, Update), so that every fact carries a temporal validity interval and contradictions can be corrected post hoc. On structured story outlines, FactTrack with LLaMA2-7B-Chat substantially outperforms a LLaMA2-7B-Chat baseline and performs comparably to a GPT-4 baseline; with GPT-4, FactTrack significantly outperforms the GPT-4 baseline. The approach offers a generalizable framework for maintaining factual consistency over extended contexts, with potential applications beyond storytelling such as knowledge-base updates and misinformation detection.

Abstract

While accurately detecting and correcting factual contradictions in language model outputs has become increasingly important as their capabilities improve, doing so is highly challenging. We propose a novel method, FACTTRACK, for tracking atomic facts and addressing factual contradictions. Crucially, FACTTRACK also maintains time-aware validity intervals for each fact, allowing for change over time. At a high level, FACTTRACK consists of a four-step pipeline to update a world state data structure for each new event: (1) decompose the event into directional atomic facts; (2) determine the validity interval of each atomic fact using the world state; (3) detect contradictions with existing facts in the world state; and finally (4) add new facts to the world state and update existing atomic facts. When we apply FACTTRACK to contradiction detection on structured story outlines, we find that FACTTRACK using LLaMA2-7B-Chat substantially outperforms a fair baseline using LLaMA2-7B-Chat, and achieves performance comparable to a GPT4 baseline. Moreover, when using GPT4, FACTTRACK significantly outperforms the GPT4 baseline.
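The four-step update described above can be illustrated with a small sketch. This is not the authors' implementation: the `Fact` and `WorldState` classes, the `process_event` function, the `key=value` fact strings, and the `toy_conflict` check (a stand-in for a learned contradiction detector) are all hypothetical names and simplifications introduced here for illustration.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Fact:
    text: str        # hypothetical "key=value" encoding, e.g. "door=open"
    direction: str   # "pre": holds up to the event; "post": holds from the event on
    start: float
    end: float


@dataclass
class WorldState:
    facts: List[Fact] = field(default_factory=list)


def toy_conflict(a: str, b: str) -> bool:
    """Toy stand-in for a contradiction detector: same key, different value."""
    key_a, val_a = a.split("=")
    key_b, val_b = b.split("=")
    return key_a == key_b and val_a != val_b


def overlaps(a: Fact, b: Fact) -> bool:
    return a.start <= b.end and b.start <= a.end


def process_event(
    state: WorldState,
    t_start: float,
    t_end: float,
    pre: List[str],
    post: List[str],
    conflict: Callable[[str, str], bool] = toy_conflict,
) -> List[Tuple[str, str]]:
    # (1) Decompose: the caller supplies pre-/post-fact strings directly here.
    new_facts = [Fact(p, "pre", -math.inf, t_start) for p in pre]
    new_facts += [Fact(p, "post", t_end, math.inf) for p in post]

    # (2) Determine validity intervals using same-direction facts only.
    for nf in new_facts:
        for old in state.facts:
            if old.direction != nf.direction or not conflict(old.text, nf.text):
                continue
            if nf.direction == "post" and old.start > nf.start:
                nf.end = min(nf.end, old.start)
            if nf.direction == "pre" and old.end < nf.end:
                nf.start = max(nf.start, old.end)

    # (3) Detect contradictions against opposite-direction facts only.
    found = [
        (old.text, nf.text)
        for nf in new_facts
        for old in state.facts
        if old.direction != nf.direction
        and overlaps(old, nf)
        and conflict(old.text, nf.text)
    ]
    if found:
        return found  # world state is left unchanged so the event can be rewritten

    # (4) Update: clip superseded same-direction facts, then add the new facts.
    for nf in new_facts:
        for old in state.facts:
            if old.direction == nf.direction and conflict(old.text, nf.text):
                if nf.direction == "post" and old.start <= nf.start:
                    old.end = min(old.end, nf.start)
                if nf.direction == "pre" and old.end >= nf.end:
                    old.start = max(old.start, nf.end)
    state.facts.extend(new_facts)
    return []
```

In this sketch, an event that closes a door after an earlier event opened it is accepted as a change over time (the older post-fact's interval is clipped), whereas a later event that presupposes the door is still open is returned as a contradiction.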

Paper Structure (52 sections, 4 equations, 7 figures, 17 tables, 3 algorithms)

Figures (7)

  • Figure 1: FactTrack tackles the problems of factual inconsistency and plot redundancy. These two problems come from our own observations and are separated here only to make the overall problem clearer; our pipeline and evaluation consider both together. For factual inconsistency detection, FactTrack tracks a validity interval for each fact to distinguish legitimate contradictions from facts simply changing over time. For plot redundancy detection, our method can represent the timeline in a more structured form, making detection easier.
  • Figure 2: Decomposition of an event, to verify and update the world state for the event. Moving forward on the arrow of time in this example, we first retrieve all facts corresponding to pre-facts from the world state to check for any conflicting fact pairs (Verification). We then replace any fact in the world state that contradicts a post-fact with the corresponding new post-fact (Update).
  • Figure 3: The timeline of narration. Any given event, bounded by its start time and end time, can be split recursively into sub-events. Pre-facts begin at the left boundary and point to the left; post-facts begin at the right boundary and point to the right.
  • Figure 4: Five possible situations for a pre-fact and post-fact contradicting each other on different points or intervals, depending on their respective validity intervals. In our implementation, we only flag a contradiction when a contradiction is detected on both checkpoints (the last situation) to maximize confidence in our predictions.
  • Figure 5: The general pipeline for maintaining our data structure. We begin with a new event (e.g., a plot point in a story outline), which we decompose into several pre-facts and post-facts (Decompose Events). For each fact, we determine its validity interval based on the world state (Determine Validity Interval), and then detect any contradictions with existing facts in the world state (Detect Contradictions). If the fact does not contradict any existing fact in the world state, we update the world state with the new fact (Update World State). Otherwise, we record details about the contradiction and rewrite the new event conditioned on the preexisting event and those details. Note that Determine Validity Interval and Update World State operate only between facts in the same direction, while Detect Contradictions operates only between facts in different directions.
  • ...and 2 more figures
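
The conservative rule in the Figure 4 caption (flag only when the contradiction holds on both checkpoints) can be sketched as follows. The captions do not define the checkpoints, so this sketch assumes they are the two origin points: the time at which the post-fact begins and the time at which the pre-fact begins, following the directions described for Figure 3. The names `Interval`, `covers`, and `flag_contradiction` are hypothetical.

```python
from typing import Tuple

Interval = Tuple[float, float]  # closed interval [start, end]


def covers(iv: Interval, t: float) -> bool:
    """Return True if time t lies inside the closed interval iv."""
    return iv[0] <= t <= iv[1]


def flag_contradiction(pre_iv: Interval, post_iv: Interval, texts_conflict: bool) -> bool:
    """Flag only if the facts conflict and each interval covers the other's origin.

    Assumption: a pre-fact originates at the right end of its interval and
    extends left; a post-fact originates at the left end and extends right.
    """
    pre_origin = pre_iv[1]
    post_origin = post_iv[0]
    return (
        texts_conflict
        and covers(pre_iv, post_origin)
        and covers(post_iv, pre_origin)
    )
```

Under this assumption, a pair whose intervals overlap at only one of the two origin points is not flagged, which trades some recall for higher confidence in each reported contradiction.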