FACTTRACK: Time-Aware World State Tracking in Story Outlines
Zhiheng Lyu, Kevin Yang, Lingpeng Kong, Daniel Klein
TL;DR
FACTTRACK introduces time-aware atomic facts and a timeline-based world state to detect and resolve factual contradictions in long-form generation. It decomposes each event into pre-facts (holding before the event) and post-facts (holding after it), and it updates the world state through a four-step pipeline (Decompose, Determine Validity Interval, Detect Contradictions, Update). This lets the method maintain a temporal validity interval for every fact and supports post-hoc correction. On contradiction detection in structured story outlines, FACTTRACK with LLaMA2-7B-Chat substantially outperforms a LLaMA2-7B-Chat baseline and performs comparably to a GPT-4 baseline; when run with GPT-4, it significantly outperforms the GPT-4 baseline. The approach offers a general framework for maintaining factual consistency over long contexts, with potential applications in evaluating and ensuring the reliability of long-form content beyond storytelling, including knowledge-base updates and misinformation detection.
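A timeline-based world state can be pictured as a time-sorted list of change points for each tracked property. The value set at time t stays valid until the next change to the same property, so every stored fact gets an implicit validity interval. The Python sketch below illustrates only this data-structure idea, and it makes simplifying assumptions. Facts are reduced to hypothetical (subject, attribute) keys with string values, and the class `Timeline` and its methods are invented for illustration. FACTTRACK itself works with natural-language atomic facts and language-model calls.

```python
import bisect
from typing import Dict, List, Optional, Tuple

# Hypothetical toy key for an atomic fact: (subject, attribute).
Key = Tuple[str, str]


class Timeline:
    """Toy timeline world state: each key maps to time-sorted change points.

    A value recorded at time t stays valid until the next change point for the
    same key, so every stored fact carries an implicit validity interval [t, next_t).
    """

    def __init__(self) -> None:
        self._times: Dict[Key, List[int]] = {}
        self._values: Dict[Key, List[str]] = {}

    def record(self, key: Key, t: int, value: str) -> None:
        """Record that `key` takes `value` from time t on (any insertion order)."""
        times = self._times.setdefault(key, [])
        values = self._values.setdefault(key, [])
        i = bisect.bisect_left(times, t)
        if i < len(times) and times[i] == t:
            values[i] = value  # replace an existing change at the same time step
        else:
            times.insert(i, t)
            values.insert(i, value)

    def _index(self, key: Key, t: int) -> int:
        # Index of the last change point at or before t, or -1 if none exists.
        return bisect.bisect_right(self._times.get(key, []), t) - 1

    def value_at(self, key: Key, t: int) -> Optional[str]:
        i = self._index(key, t)
        return self._values[key][i] if i >= 0 else None

    def interval(self, key: Key, t: int) -> Optional[Tuple[int, Optional[int]]]:
        """Validity interval [start, end) of the value holding at t; end None = open."""
        i = self._index(key, t)
        if i < 0:
            return None
        times = self._times[key]
        end = times[i + 1] if i + 1 < len(times) else None
        return (times[i], end)


if __name__ == "__main__":
    tl = Timeline()
    key = ("Anna", "lives_in")
    tl.record(key, 0, "Paris")
    tl.record(key, 3, "Rome")
    print(tl.value_at(key, 2))  # Paris
    print(tl.interval(key, 1))  # (0, 3)
    print(tl.interval(key, 7))  # (3, None)
```

The sketch keeps change points sorted with `bisect`, so a change can be inserted at any position rather than only appended. A lookup at time t therefore reflects every change recorded at or before t.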
Abstract
While accurately detecting and correcting factual contradictions in language model outputs has become increasingly important as their capabilities improve, doing so is highly challenging. We propose a novel method, FACTTRACK, for tracking atomic facts and addressing factual contradictions. Crucially, FACTTRACK also maintains time-aware validity intervals for each fact, allowing for change over time. At a high level, FACTTRACK consists of a four-step pipeline to update a world state data structure for each new event: (1) decompose the event into directional atomic facts; (2) determine the validity interval of each atomic fact using the world state; (3) detect contradictions with existing facts in the world state; and finally (4) add new facts to the world state and update existing atomic facts. When we apply FACTTRACK to contradiction detection on structured story outlines, we find that FACTTRACK using LLaMA2-7B-Chat substantially outperforms a fair baseline using LLaMA2-7B-Chat, and achieves performance comparable to a GPT-4 baseline. Moreover, when using GPT-4, FACTTRACK significantly outperforms the GPT-4 baseline.
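The four steps can be sketched as a single update loop over events. This is a toy illustration under explicit assumptions, not the authors' implementation:

- Decomposition is assumed to have already produced the pre-facts and post-facts.
- Facts are hypothetical (subject, attribute, value) triples instead of natural-language statements.
- The contradiction check is an exact-match rule standing in for a language-model judgment.
- Events are assumed to arrive in chronological order.

All names (`Fact`, `Event`, `WorldState`, `conflicts`) are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Toy stand-in for a natural-language atomic fact: (subject, attribute, value).
Triple = Tuple[str, str, str]


@dataclass
class Fact:
    triple: Triple
    start: int                 # first time step at which the fact holds
    end: Optional[int] = None  # exclusive end; None means "still valid"

    def valid_at(self, t: int) -> bool:
        return self.start <= t and (self.end is None or t < self.end)


@dataclass
class Event:
    time: int
    pre: List[Triple]   # directional pre-facts: hold just before the event
    post: List[Triple]  # directional post-facts: hold from the event onward


def conflicts(a: Triple, b: Triple) -> bool:
    # Toy contradiction rule (assumption): same subject and attribute, different value.
    return a[:2] == b[:2] and a[2] != b[2]


@dataclass
class WorldState:
    facts: List[Fact] = field(default_factory=list)

    def valid_at(self, t: int) -> List[Fact]:
        return [f for f in self.facts if f.valid_at(t)]

    def process(self, event: Event) -> List[Tuple[Triple, Triple]]:
        """Run the four steps for one event; return (new, existing) contradictions."""
        t = event.time
        # (1) Decompose: assumed done upstream; event.pre / event.post are given.
        # (2) Validity intervals: post-facts hold on [t, open); pre-facts are
        #     compared with whatever holds at t, before this event's update.
        current = self.valid_at(t)
        new_facts = [Fact(p, start=t) for p in event.post]
        # (3) Detect contradictions between pre-facts and the current state.
        found = [(p, f.triple) for p in event.pre for f in current if conflicts(p, f.triple)]
        # (4) Update: close facts superseded by a post-fact, then add the post-fact.
        for nf in new_facts:
            if any(f.triple == nf.triple for f in current):
                continue  # already holds at t; keep the existing interval
            for f in current:
                if conflicts(nf.triple, f.triple):
                    f.end = t  # the event explains the change, so no contradiction
            self.facts.append(nf)
        return found


if __name__ == "__main__":
    ws = WorldState()
    print(ws.process(Event(0, pre=[], post=[("Anna", "lives_in", "Paris")])))  # []
    print(ws.process(Event(3, pre=[("Anna", "lives_in", "Paris")],
                           post=[("Anna", "lives_in", "Rome")])))  # []
    # A later event that still assumes Anna lives in Paris is flagged.
    print(ws.process(Event(5, pre=[("Anna", "lives_in", "Paris")], post=[])))
    # [(('Anna', 'lives_in', 'Paris'), ('Anna', 'lives_in', 'Rome'))]
```

The sketch separates legitimate change from contradiction. A post-fact that changes a value (Anna moving to Rome) closes the old fact's interval instead of being flagged, because the event itself explains the change. A later pre-fact that still assumes the old value conflicts with what holds at that time and is reported.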
