
NoteFlow: Recommending Charts as Sight Glasses for Tracing Data Flow in Computational Notebooks

Yuan Tian, Dazhen Deng, Sen Yang, Huawei Zheng, Bowen Shi, Kai Xiong, Xinjing Yi, Yingcai Wu

TL;DR

The paper tackles the problem of tracing data flow during Exploratory Data Analysis (EDA) in computational notebooks. It builds NoteFlow, a framework that constructs a data-flow graph from notebook cells, automatically recommends charts, and keeps chart-based tracing consistent across the flow. NoteFlow comprises three modules (Flow Parsing, Chart Recommendation, and Chart Tracing), which rely on transformation inference and a rule-based chart-generation process conditioned on data facts (distribution, correlation, and trend). In a user study against LUX, NoteFlow enabled complete recall of anomalous code lines and improved participants' understanding of global data changes, indicating higher efficiency and better traceability for EDA tasks. The work demonstrates that charts serving as visual sight glasses on data tables can substantially reduce cognitive load in notebook-based analytics, and it outlines future work on richer visualizations and real-time streaming support.
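To make the rule-based step concrete, here is a minimal sketch of how charts might be recommended from simple data facts. The mapping (distribution to histogram, correlation to scatter plot, trend to line chart), the correlation threshold, and the `recommend_charts` function are illustrative assumptions, not NoteFlow's published rules; the table is a plain dict of columns so the example needs only the standard library.

```python
from datetime import date
from math import sqrt

# Hypothetical threshold: |Pearson r| at or above this counts as a "correlation" fact.
CORR_THRESHOLD = 0.7

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column has no defined correlation; treat as none
    return cov / (sx * sy)

def is_numeric(values):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values)

def is_temporal(values):
    return all(isinstance(v, date) for v in values)

def recommend_charts(table):
    """Map data facts in a column-oriented table to chart specs (illustrative rules)."""
    numeric = [c for c, vals in table.items() if is_numeric(vals)]
    temporal = [c for c, vals in table.items() if is_temporal(vals)]
    charts = []
    # Distribution fact: every numeric column gets a histogram.
    for col in numeric:
        charts.append({"fact": "distribution", "mark": "histogram", "x": col})
    # Correlation fact: strongly correlated numeric pairs get a scatter plot.
    for i, a in enumerate(numeric):
        for b in numeric[i + 1:]:
            r = pearson(table[a], table[b])
            if abs(r) >= CORR_THRESHOLD:
                charts.append({"fact": "correlation", "mark": "scatter",
                               "x": a, "y": b, "r": round(r, 3)})
    # Trend fact: each numeric column plotted over each temporal column.
    for t in temporal:
        for col in numeric:
            charts.append({"fact": "trend", "mark": "line", "x": t, "y": col})
    return charts
```

For example, a table with one date column, two strongly correlated numeric columns, and one text column yields two histograms, one scatter plot, and two line charts; the text column is ignored because no rule in this sketch covers categorical data.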

Abstract

Exploratory Data Analysis (EDA) is a routine task for data analysts, often conducted using flexible computational notebooks. During EDA, data workers process, visualize, and interpret data tables, making decisions about subsequent analysis. However, the cell-by-cell programming approach, while flexible, can lead to disorganized code, making it difficult to trace the state of data tables across cells and increasing the cognitive load on data workers. This paper introduces NoteFlow, a notebook library that recommends charts as "sight glasses" for data tables, allowing users to monitor their dynamic updates throughout the EDA process. To ensure visual consistency and effectiveness, NoteFlow adapts chart encodings in response to data transformations, maintaining a coherent and insightful representation of the data. The proposed method was evaluated through user studies, demonstrating its ability to provide an overview of the EDA process and convey critical insights in the data tables.
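One way to picture adapting chart encodings in response to data transformations is a function that rewrites a chart specification whenever the traced table is transformed, so the same chart can be followed from cell to cell. The transformation vocabulary (`rename`, `filter`, `drop`, `groupby_agg`), the `ChartSpec` fields, and the `adapt_spec` helper below are hypothetical; the paper's actual transformation inference and adaptation rules are not reproduced here.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class ChartSpec:
    mark: str                        # e.g. "histogram", "scatter", "bar"
    x: str                           # field encoded on the x axis
    y: Optional[str] = None          # optional field encoded on the y axis
    aggregate: Optional[str] = None  # aggregation applied to y, if any

def adapt_spec(spec: ChartSpec, op: str, **params) -> Optional[ChartSpec]:
    """Return the spec adapted to one transformation, or None if it can no longer be drawn."""
    fields = {spec.x} | ({spec.y} if spec.y else set())
    if op == "rename":
        # Follow renamed columns so the encoding keeps pointing at the same data.
        mapping = params["mapping"]
        return replace(spec, x=mapping.get(spec.x, spec.x),
                       y=mapping.get(spec.y, spec.y) if spec.y else None)
    if op == "filter":
        # Row filters keep the schema, so the encoding is unchanged.
        return spec
    if op == "drop":
        # Dropping an encoded column breaks the chart; dropping others does not.
        return None if fields & set(params["columns"]) else spec
    if op == "groupby_agg":
        key, value, func = params["by"], params["value"], params["func"]
        # After aggregation only the key and aggregated value remain:
        # show the aggregated value per group as a bar chart.
        if key in fields or value in fields:
            return ChartSpec(mark="bar", x=key, y=value, aggregate=func)
        return None
    raise ValueError(f"unknown transformation: {op}")
```

For instance, a histogram of `price` survives a row filter unchanged, follows a rename to `unit_price`, and becomes a bar chart of mean `price` per `region` after a group-by; returning None signals that tracing must fall back to a different chart.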

Paper Structure

This paper contains 32 sections, 5 figures, 1 table.

Figures (5)

  • Figure 1: Motivating scenario. EDA involves a series of challenges that demand tedious programming effort during data profiling (A), data cleaning (B), data analysis and error discovery (C), and locating erroneous code (D).
  • Figure 2: The interface of NoteFlow. The chart view shows the recommended charts under each cell (A & B), which can be expanded or collapsed. Users can select a chart of interest to trace it in detail across the flow. The flow view on the right shows the relationships between data tables, together with the selected chart and column details (C). A pinned view at the top right shows the charts that have been selected. A control panel lets users switch the variable being traced and download, upload, and re-run the results.
  • Figure 3: The stepped layout after re-running some of the cells. The cells before (A) and after (B) the re-run correspond to the flows in the left (C) and right (D) columns. When NoteFlow detects a re-run, it appends a new column on the right showing the re-run cell and the cells that follow it. Links connect the nodes of different versions (E) to make the changes easier to follow.
  • Figure 4: The recall rate and completion time of T1 using LUX and NoteFlow.
  • Figure 5: Results of the Likert-scale questions in the post-study questionnaire.
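The flow view and the stepped layout in Figures 2 and 3 rest on a graph recording which cell produces and which cell consumes each table. A minimal sketch of such flow parsing, assuming each cell's source is available as a string, uses Python's `ast` module to find the names a cell reads and writes, links each reader to the most recent writer, and creates a new version of a cell's node when that cell is re-run. The `DataFlowGraph` class and its versioning scheme are illustrative assumptions; NoteFlow's own parser also infers transformation types and computes the stepped layout, which this sketch leaves aside.

```python
import ast
from collections import defaultdict

def reads_and_writes(source):
    """Return (names read, names written) by one cell; a purely syntactic approximation."""
    reads, writes = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                writes.add(node.id)
            elif isinstance(node.ctx, ast.Load):
                reads.add(node.id)
        elif isinstance(node, (ast.Subscript, ast.Attribute)) and isinstance(node.ctx, ast.Store):
            # In-place mutation such as df["x"] = ... or df.x = ... writes the base variable.
            base = node.value
            while isinstance(base, (ast.Subscript, ast.Attribute)):
                base = base.value
            if isinstance(base, ast.Name):
                writes.add(base.id)
    return reads, writes

class DataFlowGraph:
    """Nodes are (cell_id, version); an edge means a cell read a name another node wrote."""

    def __init__(self):
        self.versions = defaultdict(int)  # cell_id -> number of executions so far
        self.last_writer = {}             # variable name -> node that last wrote it
        self.edges = []                   # (producer node, consumer node, variable name)
        self.rerun_links = []             # (previous version node, new version node)

    def run_cell(self, cell_id, source):
        """Record one execution of a cell, in execution order."""
        self.versions[cell_id] += 1
        node = (cell_id, self.versions[cell_id])
        if self.versions[cell_id] > 1:
            # A re-run creates a new node linked to the cell's previous version.
            self.rerun_links.append(((cell_id, self.versions[cell_id] - 1), node))
        reads, writes = reads_and_writes(source)
        # Resolve reads before recording writes, so x = f(x) links to the older x.
        for name in sorted(reads):
            if name in self.last_writer:
                self.edges.append((self.last_writer[name], node, name))
        for name in writes:
            self.last_writer[name] = node
        return node
```

Running a loading cell, a cleaning cell, and a summary cell, then re-running the cleaning cell, produces the node `("c2", 2)`, a re-run link from `("c2", 1)`, and a `df` edge from the old cleaning step to the new one. Only parsing happens here, so no file is read.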