ReSpark: Leveraging Previous Data Reports as References to Generate New Reports with LLMs

Yuan Tian; Chuhan Zhang; Xiaotong Wang; Sitong Pan; Weiwei Cui; Haidong Zhang; Dazhen Deng; Yingcai Wu

TL;DR

ReSpark tackles the challenge of generating data reports by reusing analytical logic extracted from prior reports. It formalizes reports as sequences of analysis segments $S=\bigl\{s_0,s_1,\dots,s_N\bigr\}$ where each segment $s_j=(o_j,t_j,i_j)$ has an analytical objective $o_j$, a data transformation $t_j$, and an insight $i_j$, with dependencies $D$ linking segments. Using a ranking-based retrieval of reference reports, segmentation, and LLM-based adaptation, ReSpark progressively reconstructs and executes the reference workflow on a new dataset. An interactive interface supports real-time inspection, insertion, and editing of objectives, transformations, and content. Comparative and usability studies show that ReSpark improves progression logic, yields more coherent reports, and reduces user burden compared to baselines.
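The segment formalization above can be pictured as a small data structure. The following is a minimal sketch, not ReSpark's implementation: the class names `Segment` and `Report`, the method `execution_order`, the encoding of $D$ as `(prerequisite, dependent)` index pairs, and the placeholder strings are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class Segment:
    """One analysis segment s_j = (o_j, t_j, i_j)."""
    objective: str       # o_j: what the segment sets out to analyze
    transformation: str  # t_j: the data transformation, kept as text here
    insight: str         # i_j: the finding reported for this segment


@dataclass
class Report:
    """A report as segments S plus dependencies D (hypothetical encoding)."""
    segments: list[Segment]
    # Assumption: D is a set of (prerequisite, dependent) segment indices.
    dependencies: set[tuple[int, int]] = field(default_factory=set)

    def execution_order(self) -> list[int]:
        """Return segment indices so that prerequisites come first."""
        graph = {j: set() for j in range(len(self.segments))}
        for prerequisite, dependent in self.dependencies:
            graph[dependent].add(prerequisite)
        return list(TopologicalSorter(graph).static_order())


# Placeholder content only; no real report data is implied.
report = Report(
    segments=[
        Segment("overview objective", "aggregate by year", "placeholder"),
        Segment("drill-down objective", "filter to one group", "placeholder"),
        Segment("comparison objective", "compare two groups", "placeholder"),
    ],
    dependencies={(0, 1), (1, 2)},
)
order = report.execution_order()
```

Ordering segments by their dependencies is one plausible way to read "progressively reconstructs and executes the reference workflow": a segment is adapted only after the segments it builds on have produced their results.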

Abstract

Creating data reports is a labor-intensive task involving iterative data exploration, insight extraction, and narrative construction. A key challenge lies in composing the analysis logic, from defining objectives and transforming data to identifying and communicating insights. Manually crafting this logic can be cognitively demanding. While experienced analysts often reuse scripts from past projects, finding a perfect match for a new dataset is rare. Even when similar analyses are available online, they usually share only results or visualizations, not the underlying code, making reuse difficult. To address this, we present ReSpark, a system that leverages large language models (LLMs) to reverse-engineer analysis logic from existing reports and adapt it to new datasets. By generating draft analysis steps, ReSpark provides a warm start for users. It also supports interactive refinement, allowing users to inspect intermediate outputs, insert objectives, and revise content. We evaluate ReSpark through comparative and user studies, demonstrating its effectiveness in lowering the barrier to generating data reports without relying on existing analysis code.
