
Enhancing Computational Notebooks with Code+Data Space Versioning

Hanxi Fang, Supawit Chockchowwat, Hari Sundaram, Yongjoo Park

TL;DR

The paper identifies a fundamental mismatch between nonlinear data exploration and traditional notebook workflows, and introduces Kishuboard, a system implementing two-dimensional code+data space versioning. It formalizes a data model and safe checkout rules to ensure consistent code+data states while enabling execution rollbacks and code+data checkouts, complemented by a one-dimensional commit history graph for usability. An end-to-end prototype integrated with Jupyter demonstrates real-time synchronization and supports use-case driven workflows, with a human-subject study showing productivity gains, especially for compute-intensive tasks, and high user acceptance. The work outlines practical implications for interactive data science, including intuitive UI design, automatic folding, and variable-focused diff mechanisms, laying groundwork for broader adoption and future work in collaboration, semantic search, and scalable comparison across explorations.

Abstract

There is a gap between how people explore data and how Jupyter-like computational notebooks are designed. People explore data nonlinearly, using execution undos, branching, and/or complete reverts, whereas notebooks are designed for sequential exploration. Recent work such as ForkIt still does not support these multiple modes of nonlinear exploration in a unified way. In this work, we address this challenge by introducing two-dimensional code+data space versioning for computational notebooks and verifying its effectiveness with our prototype system, Kishuboard, which integrates with Jupyter. By adjusting code and data knobs, users of Kishuboard can intuitively and flexibly manage the state of computational notebooks, achieving both execution rollbacks and checkouts across complex multi-branch exploration histories. Moreover, this two-dimensional versioning mechanism can easily be presented alongside a friendly one-dimensional history. Human-subject studies indicate that Kishuboard significantly enhances user productivity in various data science tasks.
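The two knobs can be illustrated with a minimal Python sketch. It assumes that a commit pairs a code snapshot (cell sources) with a data snapshot (variable values). The names `Commit`, `NotebookState`, `execution_rollback`, and `checkout` are hypothetical and are not Kishuboard's API. Execution rollback turns only the data knob, while checkout turns both.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Commit:
    """Hypothetical snapshot pairing a code version with a data version."""
    commit_id: str
    code: list[str]           # code dimension: cell sources at this commit
    data: dict[str, object]   # data dimension: variable values at this commit


@dataclass
class NotebookState:
    """The live notebook: the code and data currently loaded."""
    code: list[str]
    data: dict[str, object]


def execution_rollback(state: NotebookState, target: Commit) -> NotebookState:
    """Turn only the data knob: restore the target's variables, keep current code."""
    return NotebookState(code=list(state.code), data=dict(target.data))


def checkout(state: NotebookState, target: Commit) -> NotebookState:
    """Turn both knobs: restore code and data exactly as they were at the target.

    `state` is unused; it is kept only so both operations share a signature.
    """
    return NotebookState(code=list(target.code), data=dict(target.data))


if __name__ == "__main__":
    c1 = Commit("c1", ["df = load()"], {"df": "raw"})
    # The user has since added a cell and executed it.
    live = NotebookState(code=["df = load()", "df = clean_v2(df)"], data={"df": "cleaned"})

    rolled = execution_rollback(live, c1)
    print(rolled.code, rolled.data)      # edited code kept, data restored to c1

    reverted = checkout(live, c1)
    print(reverted.code, reverted.data)  # both code and data restored to c1
```

The dictionary copies here are shallow. A real system would need to snapshot and restore live kernel objects, which this sketch does not attempt.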

Paper Structure

This paper contains 73 sections, 2 theorems, 16 figures, 5 tables.

Key Result

Theorem 1

All commits $V_k \in \mathcal{U}$ are consistent.

Figures (16)

  • Figure 1: The motivation for code+data space versioning. Data scientists often wish to undo (only) executions while keeping the code (i.e., execution rollback; left to middle) to test alternative methods (e.g., edits in the middle followed by executions on the right); or they wish to completely revert all of their activities, jumping back to a certain point in the past (i.e., checkout; right to left). Our goal is to enable these various types of code/data state modifications through an intuitive user interface.
  • Figure 2: By adjusting two knobs (i.e., the orange boxes for code and data), we can achieve (a) execution rollback and (b) code+data checkout. Execution rollback is equivalent to altering only the data while keeping the code the same. Checkout is equivalent to restoring both code and data to exactly the state at some point in the past. In addition to proposing this two-dimensional versioning, we formalize consistent code/data states (§sec:version), develop a working prototype (§sec:design and §sec:others), and evaluate its usefulness in data science tasks (§sec:evalmethod and §sec:eval).
  • Figure 3: Kishuboard user interface. The history graph (purple box) shows past commits with code+data tags, allowing users to quickly grasp the current state. The code and variable panes (yellow box) display information about the commit selected in the history graph. From any past commit, users can load data only (i.e., execution rollback) or load both code and data (i.e., checkout) using the navigation popup (red box). The Code and Variable tags then move accordingly.
  • Figure 4: Kishuboard's system architecture, with the user interface, storage, and instrument components highlighted.
  • Figure 5: Examples of navigation in two-dimensional and one-dimensional axis systems.
  • ...and 11 more figures
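The tag behavior described for the history graph (Figure 3) can be sketched as a commit tree with two independent pointers. This is a minimal illustration under stated assumptions: each commit records a single parent, and `add_commit` takes that parent explicitly, because how new executions attach to the tree after a rollback is not specified here. `HistoryGraph`, `rollback_execution`, `checkout`, and `lineage` are hypothetical names, not Kishuboard's API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    commit_id: str
    parent: str | None  # None marks the root commit


class HistoryGraph:
    """Hypothetical commit tree with separate Code and Variable tags."""

    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}
        self.code_tag: str | None = None      # commit whose code is loaded
        self.variable_tag: str | None = None  # commit whose data is loaded

    def _require(self, commit_id: str) -> None:
        if commit_id not in self.nodes:
            raise KeyError(f"unknown commit {commit_id!r}")

    def add_commit(self, commit_id: str, parent: str | None) -> None:
        # The parent is passed explicitly (assumption); both tags move to the new commit.
        if parent is not None:
            self._require(parent)
        self.nodes[commit_id] = Node(commit_id, parent)
        self.code_tag = self.variable_tag = commit_id

    def rollback_execution(self, commit_id: str) -> None:
        # Load data only: the Variable tag moves, the Code tag stays put.
        self._require(commit_id)
        self.variable_tag = commit_id

    def checkout(self, commit_id: str) -> None:
        # Load code and data: both tags move to the same commit.
        self._require(commit_id)
        self.code_tag = self.variable_tag = commit_id

    def lineage(self, commit_id: str) -> list[str]:
        # One-dimensional view: the path from the root down to commit_id.
        self._require(commit_id)
        path: list[str] = []
        current: str | None = commit_id
        while current is not None:
            path.append(current)
            current = self.nodes[current].parent
        return path[::-1]


if __name__ == "__main__":
    g = HistoryGraph()
    g.add_commit("c1", None)
    g.add_commit("c2", "c1")
    g.add_commit("c3", "c2")  # first branch
    g.add_commit("c4", "c2")  # second branch off c2
    g.rollback_execution("c1")
    print(g.code_tag, g.variable_tag)  # c4 c1
    print(g.lineage("c3"))             # ['c1', 'c2', 'c3']
```

Keeping the two tags as separate pointers is what lets a single tree display both operations. After an execution rollback the tags point at different commits, and after a checkout they coincide again.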

Theorems & Definitions (2)

  • Theorem 1
  • Theorem 2