NoteFlow: Recommending Charts as Sight Glasses for Tracing Data Flow in Computational Notebooks
Yuan Tian, Dazhen Deng, Sen Yang, Huawei Zheng, Bowen Shi, Kai Xiong, Xinjing Yi, Yingcai Wu
TL;DR
The paper addresses the difficulty of tracing data flow during Exploratory Data Analysis (EDA) in computational notebooks. NoteFlow is a framework that constructs a data-flow graph from notebook cells, automatically recommends charts, and keeps those charts consistent along the flow so that successive views of the data remain coherent. It comprises three modules (Flow Parsing, Chart Recommendation, and Chart Tracing) that rely on transformation inference and a rule-based chart-generation process conditioned on data facts (distribution, correlation, and trend). In a user study against LUX, NoteFlow achieved complete recall of anomalous code lines and improved participants' understanding of global data changes, indicating higher efficiency and better traceability for EDA tasks. The work shows that charts used as sight glasses for data tables can substantially reduce cognitive load in notebook-based analytics, and it outlines richer visualizations and real-time streaming support as future work.
Abstract
Exploratory Data Analysis (EDA) is a routine task for data analysts, often conducted using flexible computational notebooks. During EDA, data workers process, visualize, and interpret data tables, making decisions about subsequent analysis. However, the cell-by-cell programming approach, while flexible, can lead to disorganized code, making it difficult to trace the state of data tables across cells and increasing the cognitive load on data workers. This paper introduces NoteFlow, a notebook library that recommends charts as "sight glasses" for data tables, allowing users to monitor their dynamic updates throughout the EDA process. To ensure visual consistency and effectiveness, NoteFlow adapts chart encodings in response to data transformations, maintaining a coherent and insightful representation of the data. The proposed method was evaluated through user studies, demonstrating its ability to provide an overview of the EDA process and convey critical insights in the data tables.
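To make the idea of tracing data tables across cells concrete, the following is a minimal sketch of how a data-flow graph could be derived from notebook cell sources. It is not NoteFlow's Flow Parsing module: the function names `cell_defs_and_uses` and `build_flow_edges`, the example cells, and the file name `sales.csv` are all assumptions made for illustration. The sketch only tracks plain variable assignments with Python's standard `ast` module, so it treats in-place mutations such as `df["x"] = ...` as reads, and it does not order reads and writes within a single cell.

```python
import ast


def cell_defs_and_uses(source):
    """Return (names assigned, names read) for one cell's source code."""
    defs, uses = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defs.add(node.id)
            else:
                uses.add(node.id)
    return defs, uses


def build_flow_edges(cells):
    """Link each cell to the latest earlier cell that assigned a name it reads."""
    last_writer = {}  # variable name -> index of the last cell that assigned it
    edges = []        # (producer cell index, consumer cell index, variable name)
    for idx, source in enumerate(cells):
        defs, uses = cell_defs_and_uses(source)
        for name in sorted(uses):
            if name in last_writer:
                edges.append((last_writer[name], idx, name))
        for name in defs:
            last_writer[name] = idx
    return edges


# Hypothetical notebook: each string stands for one cell's source.
# The code is only parsed, never executed, so sales.csv need not exist.
cells = [
    "import pandas as pd\ndf = pd.read_csv('sales.csv')",
    "clean = df.dropna()",
    "summary = clean.groupby('region').sum()",
]
edges = build_flow_edges(cells)
```

Each edge names the table variable that flows from one cell to a later one. A tool in the spirit of NoteFlow could attach a chart to each such variable and update it as the flow advances, though how NoteFlow itself does this is described in the paper rather than in this sketch.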
