ReSpark: Leveraging Previous Data Reports as References to Generate New Reports with LLMs

Yuan Tian, Chuhan Zhang, Xiaotong Wang, Sitong Pan, Weiwei Cui, Haidong Zhang, Dazhen Deng, Yingcai Wu

TL;DR

ReSpark tackles the challenge of generating data reports by reusing analytical logic extracted from prior reports. It formalizes reports as sequences of analysis segments $S=\bigl\{s_0,s_1,\dots,s_N\bigr\}$ where each segment $s_j=(o_j,t_j,i_j)$ has an analytical objective $o_j$, a data transformation $t_j$, and an insight $i_j$, with dependencies $D$ linking segments. Using a ranking-based retrieval of reference reports, segmentation, and LLM-based adaptation, ReSpark progressively reconstructs and executes the reference workflow on a new dataset. An interactive interface supports real-time inspection, insertion, and editing of objectives, transformations, and content. Comparative and usability studies show that ReSpark improves progression logic, yields more coherent reports, and reduces user burden compared to baselines.
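The segment formalization can be pictured as a small data structure. The sketch below is illustrative only and is not taken from the ReSpark implementation: the class names `Segment` and `Workflow`, the field names, and the choice to represent the dependencies $D$ as a mapping from a segment index to the indices it depends on are all assumptions. Ordering segments topologically is likewise an assumed way to respect $D$ when reproducing a workflow step by step.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class Segment:
    """One analysis segment s_j = (o_j, t_j, i_j). Names are hypothetical."""
    objective: str        # o_j: what the segment sets out to analyze
    transformation: str   # t_j: the data transformation, kept as text here
    insight: str          # i_j: the finding reported for this segment


@dataclass
class Workflow:
    """A report as segments S plus dependencies D (assumed representation)."""
    segments: list[Segment]
    # D: maps a segment index to the set of segment indices it depends on.
    dependencies: dict[int, set[int]] = field(default_factory=dict)

    def execution_order(self) -> list[int]:
        """Return segment indices so that each one follows its dependencies."""
        graph = {
            j: self.dependencies.get(j, set())
            for j in range(len(self.segments))
        }
        return list(TopologicalSorter(graph).static_order())


# Hypothetical three-segment report in which each segment builds on the last.
workflow = Workflow(
    segments=[
        Segment("Overview of sales", "aggregate by year", "placeholder insight"),
        Segment("Break down by region", "group by region", "placeholder insight"),
        Segment("Drill into top region", "filter then group", "placeholder insight"),
    ],
    dependencies={1: {0}, 2: {1}},
)
order = workflow.execution_order()
```

Keeping the dependencies separate from the segments means a segment can be inserted or removed, as the interactive interface allows, by updating the mapping without rewriting the other segments.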

Abstract

Creating data reports is a labor-intensive task involving iterative data exploration, insight extraction, and narrative construction. A key challenge lies in composing the analysis logic, from defining objectives and transforming data to identifying and communicating insights. Manually crafting this logic can be cognitively demanding. While experienced analysts often reuse scripts from past projects, finding a perfect match for a new dataset is rare. Even when similar analyses are available online, they usually share only results or visualizations, not the underlying code, making reuse difficult. To address this, we present ReSpark, a system that leverages large language models (LLMs) to reverse-engineer analysis logic from existing reports and adapt it to new datasets. By generating draft analysis steps, ReSpark provides a warm start for users. It also supports interactive refinement, allowing users to inspect intermediate outputs, insert objectives, and revise content. We evaluate ReSpark through comparative and user studies, demonstrating its effectiveness in lowering the barrier to generating data reports without relying on existing analysis code.

Paper Structure

This paper contains 52 sections, 1 equation, 12 figures.

Figures (12)

  • Figure 1: Producing a data report (c1) involves analyzing the data (a) and summarizing the analyzed insights into a data report (b). Specifically, the data analysis workflow (a) includes a series of interdependent analysis segments (a1), each corresponding to an analytical objective, data transformations, and insights (a2). To reuse an existing report on a new dataset, we first deduce its data analysis workflow and reproduce it on the new data (c2).
  • Figure 2: The interface of ReSpark. ReSpark consists of four views: data view (b-c), dependency view (d-e), content view (f-g), and generation view (h-k). The data view displays the dataset description and field information. The dependency view displays the extracted interdependent analysis segments. The content view shows the analytical objective and content of the selected segment. The generation view shows real-time generated results.
  • Figure 3: A demonstration of the ranked reports and their summarized information.
  • Figure 4: The process of generating the second segment.
  • Figure 5: A demonstration of an analytical objective that fails to be corrected and needs to be removed.
  • ...and 7 more figures