CrossData: Leveraging Text-Data Connections for Authoring Data Documents
Chen Zhu-Tian, Haijun Xia
TL;DR
CrossData addresses the burden of authoring data documents by introducing language-oriented text-data bindings and a Connection Engine that automatically infers, establishes, and maintains persistent bindings between narrative text and datasets. The prototype supports data retrieval, computation, and interactive exploration, keeps text, tables, and charts consistent automatically, and produces documents that remain interactive for readers. In a technical evaluation, the engine achieved 88.8% accuracy on 529 text-data connections, taking 0.3 s per candidate. In an expert evaluation, participants reported substantial reductions in manual effort and new workflows that bridge data exploration and writing. Together, these results suggest a practical path toward closer integration of data analysis and narrative text, with implications for data-driven domains and collaborative authoring.
Abstract
Data documents play a central role in recording, presenting, and disseminating data. Despite the proliferation of applications and systems designed to support the analysis, visualization, and communication of data, writing data documents remains a laborious process that requires constant back-and-forth between data processing and writing tools. Interviews with eight professionals revealed that their workflows contained numerous tedious, repetitive, and error-prone operations. The key issue that we identified is the lack of a persistent connection between text and data. Thus, we developed CrossData, a prototype that treats text-data connections as persistent, interactive, first-class objects. By automatically identifying, establishing, and leveraging text-data connections, CrossData enables rich interactions to assist in the authoring of data documents. An expert evaluation with eight users demonstrated the usefulness of CrossData, showing that it not only reduced the manual effort in writing data documents but also opened new possibilities to bridge the gap between data exploration and writing.
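To make the idea of a persistent text-data connection concrete, the following is a minimal sketch and not CrossData's implementation. It assumes a connection can be modeled as a placeholder in the text tied to a computation over a table; the names `Binding`, `render_document`, and the sample `sales` data are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List

# Assumption: a table is a list of rows, each row a dict of column -> number.
Table = List[Dict[str, float]]


@dataclass
class Binding:
    """Hypothetical text-data connection: a named placeholder in the text
    tied to a computation over the table."""
    placeholder: str
    compute: Callable[[Table], float]
    fmt: str = "{:.1f}"

    def render(self, table: Table) -> str:
        # Recompute from the current data every time the text is rendered.
        return self.fmt.format(self.compute(table))


def render_document(template: str, bindings: List[Binding], table: Table) -> str:
    """Fill each {placeholder} in the template with its bound value."""
    text = template
    for b in bindings:
        text = text.replace("{" + b.placeholder + "}", b.render(table))
    return text


# Illustrative data only; not taken from the paper.
sales: Table = [{"q": 1, "revenue": 10.0}, {"q": 2, "revenue": 14.0}]
bindings = [Binding("avg_revenue", lambda t: mean(r["revenue"] for r in t))]
template = "Average revenue was {avg_revenue} million."

before = render_document(template, bindings, sales)

# Because the binding persists, editing the data updates the sentence
# without retyping the number.
sales.append({"q": 3, "revenue": 18.0})
after = render_document(template, bindings, sales)
```

In this sketch the sentence never stores the number itself, only the binding, so the text cannot drift out of sync with the table. Inferring such bindings automatically from free-form text, as the abstract describes, is a separate problem that this example does not attempt.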
