CrossData: Leveraging Text-Data Connections for Authoring Data Documents

Chen Zhu-Tian; Haijun Xia

TL;DR

CrossData tackles the burden of authoring data documents by introducing language-oriented text-data bindings and a Connection Engine that automatically infers, establishes, and maintains persistent bindings between narrative text and datasets. The prototype supports data retrieval, computation, interactive exploration, and automatic consistency across text, tables, and charts, and the resulting documents remain interactive for readers. In a technical evaluation, the engine achieved 88.8% accuracy on 529 text-data connections, taking 0.3 s per candidate. In an expert evaluation, participants reported substantial reductions in manual effort and described new workflows that bridge data exploration and writing. Together, these results point to a practical path toward closer integration of data analysis and textual narrative, with implications for data-driven domains and collaborative authoring.

Abstract

Data documents play a central role in recording, presenting, and disseminating data. Despite the proliferation of applications and systems designed to support the analysis, visualization, and communication of data, writing data documents remains a laborious process, requiring a constant back-and-forth between data processing and writing tools. Interviews with eight professionals revealed that their workflows contained numerous tedious, repetitive, and error-prone operations. The key issue that we identified is the lack of persistent connection between text and data. Thus, we developed CrossData, a prototype that treats text-data connections as persistent, interactive, first-class objects. By automatically identifying, establishing, and leveraging text-data connections, CrossData enables rich interactions to assist in the authoring of data documents. An expert evaluation with eight users demonstrated the usefulness of CrossData, showing that it not only reduced the manual effort in writing data documents but also opened new possibilities to bridge the gap between data exploration and writing.

Paper Structure (59 sections, 10 figures)

Introduction
RELATED WORK
Authoring Data-driven Content
Linking Text to Other Visual Media
Natural Language Interfaces for Data Queries and Visualization
FORMATIVE STUDY WITH PROFESSIONALS
Participants and Procedure
Findings and Discussion
Tedious and Frequent Data Retrieval (T1)
Inefficient and Error-prone Maintenance of Data Consistency (T2)
Significant Overhead for Iteration (T3)
Summary
CrossData
The CONNECTION ENGINE for text-data connections
Connections Between Text and Data
...and 44 more sections

Figures (10)

Figure 1: The connections between text and data. a) The dataset to report. b) Data phrases that directly report the underlying data. c) A data phrase connected to the data under the constraints of other phrases. The blue text represents the keywords used to compute dependent phrases.
Figure 2: The pipeline to establish text-data connections. The Connection Engine takes a sentence as input and outputs a list of data phrase candidates. The user can select from the candidates to establish text-data connections.
Figure 3: An example detailing how the Connection Engine infers the data operations and suggests dependent data phrases. The engine first parses the sentence into a constituency tree, in which each node represents a text phrase (e.g., a noun, verb, or prepositional phrase) in the sentence. Then, the engine infers and assembles data operations in a bottom-up order (a - c). The output of the operation in the root node is returned as the suggested dependent data phrases.
Figure 4: Retrieving Data and Computing Values. a) A list of independent data phrases (highlighted by the cyan background) is retrieved and suggested to the user. b) The data mentioned in the sentence is highlighted. c) The mean score is computed and suggested as a dependent data phrase (highlighted by the orange background). Detailed information about each suggestion is provided to assist in resolving ambiguities.
Figure 5: Using placeholders. a) There is not enough information provided in the sentence to calculate the difference between Jacob’s scores in different years. b) CrossData allows the user to use a Diff placeholder to indicate the computation. c) CrossData updates the placeholder as more information is provided.
...and 5 more figures
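
The bottom-up assembly of data operations described in the Figure 3 caption, and the Diff placeholder from Figure 5, can be illustrated with a minimal sketch. This is not the authors' Connection Engine: the dataset, the hand-built phrase tree, and the names `Node`, `evaluate`, and `diff_placeholder` are all assumptions made for illustration, and a real system would obtain the tree from a constituency parser rather than construct it by hand.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Any, Callable, List

# Hypothetical dataset; the names and numbers are illustrative only.
ROWS = [
    {"name": "Jacob", "year": 2019, "score": 80},
    {"name": "Jacob", "year": 2020, "score": 90},
    {"name": "Emma", "year": 2019, "score": 70},
    {"name": "Emma", "year": 2020, "score": 60},
]


@dataclass
class Node:
    """A phrase node in a hand-built tree (stand-in for a parsed constituent)."""
    text: str
    op: Callable[..., Any]  # data operation this phrase contributes
    children: List["Node"] = field(default_factory=list)


def evaluate(node: Node) -> Any:
    """Evaluate the children first, then apply this node's operation (bottom-up)."""
    child_results = [evaluate(child) for child in node.children]
    return node.op(*child_results)


# Assumed phrase structure for "the mean score in 2020":
# leaf filters rows, middle node projects a column, root aggregates.
in_2020 = Node("in 2020", lambda: [r for r in ROWS if r["year"] == 2020])
score_in_2020 = Node("score in 2020", lambda rows: [r["score"] for r in rows], [in_2020])
root = Node("the mean score in 2020", lambda scores: mean(scores), [score_in_2020])

# The root's output plays the role of the suggested dependent data phrase.
suggestion = evaluate(root)  # mean of 90 and 60, i.e. 75


def diff_placeholder(values: List[int]):
    """Hypothetical Diff placeholder: stays unresolved (None) until two values exist."""
    if len(values) != 2:
        return None
    return values[1] - values[0]


jacob_scores = [r["score"] for r in ROWS if r["name"] == "Jacob"]
unresolved = diff_placeholder(jacob_scores[:1])  # only one year mentioned so far
resolved = diff_placeholder(jacob_scores)        # both years are now available
```

Each node only needs the results of its children, which is why the operations can be assembled from the leaves upward; the placeholder follows the same idea by deferring its value until the sentence supplies enough operands.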
