DataScout: Automatic Data Fact Retrieval for Statement Augmentation with an LLM-Based Agent
Chuer Chen, Yuqi Liu, Danqing Shi, Shixiong Cao, Nan Cao
TL;DR
DataScout tackles the time-intensive task of locating data facts that augment data-driven narratives. It introduces an LLM-based agent that constructs a retrieval tree collaboratively with the user. The system decomposes queries, searches data via text-to-SQL, and extracts stance-aligned facts using Chain-of-Thought prompting, and it visualizes the process in a mind-map retrieval space that supports human-AI collaboration. The work is examined through a formative study, expert interviews, and three case studies using the World Development Indicators, which show that DataScout can retrieve diverse, stance-aware data facts that bolster the credibility and objectivity of a narrative. It highlights practical implications for integrating LLMs into data storytelling workflows and points to future improvements in accuracy, data sourcing, visualization diversity, and broader evaluation.
Abstract
A data story typically integrates data facts from multiple perspectives and stances to construct a comprehensive and objective narrative. However, retrieving these facts is time-consuming and demands considerable analytical skill from the creator. In this work, we introduce DataScout, an interactive system that automatically performs reasoning and stance-based data fact retrieval to augment the user's statement. Specifically, DataScout leverages an LLM-based agent to construct a retrieval tree whose expansion is jointly controlled by the user and the agent. The interface visualizes the retrieval tree as a mind map, allowing users to intuitively steer the retrieval direction and engage effectively in reasoning and analysis. We evaluate the proposed system through case studies and in-depth expert interviews. Our evaluation demonstrates that DataScout can effectively retrieve multifaceted data facts from different stances, helping users verify their statements and enhance the credibility of their stories.
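A retrieval tree with shared control over its expansion can be pictured as a small data structure. The following is a minimal sketch under stated assumptions, not DataScout's implementation. The node fields, the `origin` tag recording whether the user or the agent added a node, and the stance labels ("support", "oppose") are all hypothetical. The functions `toy_decompose` and `toy_search` are hypothetical stand-ins for the LLM-based query decomposition and text-to-SQL search described above, and the facts they return are placeholders, not real data.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class DataFact:
    text: str
    stance: str  # hypothetical stance label, e.g. "support" or "oppose"


@dataclass
class RetrievalNode:
    query: str
    origin: str  # hypothetical tag: "user" or "agent" added this node
    facts: List[DataFact] = field(default_factory=list)
    children: List["RetrievalNode"] = field(default_factory=list)

    def add_child(self, query: str, origin: str) -> "RetrievalNode":
        """Attach a new sub-query node; either the user or the agent may call this."""
        child = RetrievalNode(query, origin)
        self.children.append(child)
        return child


def expand(node: RetrievalNode,
           decompose: Callable[[str], List[str]],
           search: Callable[[str], List[DataFact]]) -> None:
    """Agent-driven expansion: split the node's query into sub-queries,
    attach each as a child, and store the facts retrieved for it."""
    for sub_query in decompose(node.query):
        child = node.add_child(sub_query, origin="agent")
        child.facts.extend(search(sub_query))


def facts_by_stance(root: RetrievalNode, stance: str) -> List[DataFact]:
    """Collect every fact with the given stance anywhere in the tree."""
    found: List[DataFact] = []
    stack = [root]
    while stack:
        node = stack.pop()
        found.extend(f for f in node.facts if f.stance == stance)
        stack.extend(node.children)
    return found


# Hypothetical stand-ins for the LLM decomposer and the text-to-SQL search.
def toy_decompose(query: str) -> List[str]:
    return [f"{query} / trend", f"{query} / comparison"]


def toy_search(query: str) -> List[DataFact]:
    if query.endswith("trend"):
        return [DataFact("placeholder fact A", "support")]
    return [DataFact("placeholder fact B", "oppose")]


root = RetrievalNode("statement S", origin="user")
expand(root, toy_decompose, toy_search)           # the agent grows the tree
root.add_child("statement S / user's own angle", origin="user")  # the user steers
```

Exposing expansion as a plain `add_child` call lets either party grow the same tree, which is one simple way to model the jointly controlled expansion the abstract describes; a mind-map view could then render each node's query and facts.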
