Kishu: Time-Traveling for Computational Notebooks
Zhaoheng Li, Supawit Chockchowwat, Ribhav Sahu, Areet Sheth, Yongjoo Park
TL;DR
Kishu addresses time-traveling in computational notebooks with an application-level, delta-based approach that records how the session state evolves at a novel Co-variable granularity. It uses live namespace patching, a Delta Detector, and a Checkpoint Graph to capture per-cell state deltas and to perform incremental checkout at sub-second latency while preserving inter-variable dependencies. When a variable cannot be serialized, or its serialization fails, Kishu falls back to recomputation, so restoration remains robust. Kishu is compatible with 146 object classes from popular data-science libraries and reduces checkpoint size by up to 4.55x and checkout time by up to 9.02x compared with baselines. The evaluation shows that time-traveling works across diverse notebooks with low delta-detection overhead, which makes fault-tolerant undo and exploration of alternative analysis paths practical within a single kernel. In practice, this supports more efficient experimentation, debugging, and exploratory workflows in data-science notebooks.
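To illustrate why a Co-variable, rather than a single variable, is a natural unit for checkpointing, the sketch below groups variable names whose object graphs share a mutable object. It then shows that pickling two aliased variables separately silently breaks the alias, while pickling them together preserves it. This is a minimal sketch under stated assumptions, not Kishu's implementation: the helpers `reachable_ids` and `covariables` are hypothetical, the traversal only follows plain lists, dicts, sets, tuples, and instance `__dict__`s, and the grouping is a simple union-find over `id()` values.

```python
import pickle

# Immutable leaf types are never recorded: values such as small ints and
# interned strings are shared across unrelated variables.
ATOMIC = (int, float, complex, str, bytes, bool, type(None))


def reachable_ids(root):
    """Hypothetical helper: ids of mutable objects reachable from `root`."""
    ids, visited, stack = set(), set(), [root]
    while stack:
        obj = stack.pop()
        if isinstance(obj, ATOMIC) or id(obj) in visited:
            continue
        visited.add(id(obj))
        if isinstance(obj, (tuple, frozenset)):
            stack.extend(obj)  # traverse immutable containers without recording them
            continue
        ids.add(id(obj))
        if isinstance(obj, dict):
            stack.extend(obj.keys())
            stack.extend(obj.values())
        elif isinstance(obj, (list, set)):
            stack.extend(obj)
        elif hasattr(obj, "__dict__") and not isinstance(obj, type):
            stack.append(vars(obj))  # follow instance attributes
    return ids


def covariables(namespace):
    """Hypothetical helper: group names whose reachable mutable objects overlap."""
    parent = {name: name for name in namespace}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    owner = {}  # object id -> first variable name that reached it
    for name, value in namespace.items():
        for oid in reachable_ids(value):
            if oid in owner:
                parent[find(name)] = find(owner[oid])  # shared object: merge groups
            else:
                owner[oid] = name
    groups = {}
    for name in namespace:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


shared = [1, 2, 3]
ns = {"a": {"data": shared}, "b": [shared], "c": [4, 5]}
print(covariables(ns))  # [['a', 'b'], ['c']]

# Pickling a and b separately produces two independent copies of `shared` ...
a1 = pickle.loads(pickle.dumps(ns["a"]))
b1 = pickle.loads(pickle.dumps(ns["b"]))
print(a1["data"] is b1[0])  # False

# ... while pickling the co-variable {a, b} as one unit keeps a single shared list.
a2, b2 = pickle.loads(pickle.dumps((ns["a"], ns["b"])))
print(a2["data"] is b2[0])  # True
```

The second half relies on `pickle` memoizing objects only within a single `dumps` call, which is why a checkpoint unit that splits aliased variables cannot restore their shared identity.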
Abstract
Computational notebooks (e.g., Jupyter, Google Colab) are widely used by data scientists. A key feature of notebooks is the interactive computing model of iteratively executing cells (i.e., sets of statements) and observing the results (e.g., a model or a plot). Unfortunately, existing notebook systems do not offer time-traveling to past states: when the user executes a cell, the notebook session state, consisting of user-defined variables, can be irreversibly modified (e.g., the user cannot 'un-drop' a dataframe column). This is because, unlike DBMSs, existing notebook systems do not keep track of the session state. Existing techniques for checkpointing and restoring session states, such as OS-level memory snapshots or application-level session dumps, are insufficient: checkpointing can incur prohibitive storage costs and may fail, while restoration can only be performed inefficiently, from scratch, by fully loading checkpoint files. In this paper, we introduce a new notebook system, Kishu, that offers time-traveling to and from arbitrary notebook states using an efficient and fault-tolerant incremental checkpoint and checkout mechanism. Kishu creates incremental checkpoints that are small and correctly preserve complex inter-variable dependencies at a novel Co-variable granularity. Then, to return to a previous state, Kishu accurately identifies the state difference between the current and target states to perform incremental checkout at sub-second latency with minimal data loading. Kishu is compatible with 146 object classes from popular data science libraries (e.g., Ray, Spark, PyTorch), and reduces checkpoint size and checkout time by up to 4.55x and 9.02x, respectively, on a variety of notebooks.
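The idea of incremental checkpoints plus a state-difference-driven checkout can be sketched as follows. This minimal Python sketch assumes the co-variable groups are already known and passed in by the caller, and it detects deltas by hashing each co-variable's pickled bytes, which is a simpler stand-in for, not a reproduction of, Kishu's live namespace patching and Delta Detector. The class `DeltaCheckpointStore` and its `commit` and `checkout` methods are hypothetical names. Each commit stores only co-variable versions not seen before; checkout compares the current and target commits and reloads only the co-variables whose content hash differs.

```python
import hashlib
import pickle


class DeltaCheckpointStore:
    """Hypothetical store: each commit maps a co-variable (frozenset of names) to a content hash."""

    def __init__(self):
        self.blobs = {}    # content hash -> pickled bytes; each distinct version is stored once
        self.commits = []  # commit index -> {frozenset(names): content hash}

    def commit(self, namespace, groups):
        """Snapshot `namespace`; only co-variables whose bytes changed add a new blob."""
        snapshot = {}
        for group in groups:
            key = frozenset(group)
            data = pickle.dumps(tuple(namespace[n] for n in sorted(key)))
            digest = hashlib.sha256(data).hexdigest()
            self.blobs.setdefault(digest, data)  # unchanged co-variables reuse an existing blob
            snapshot[key] = digest
        self.commits.append(snapshot)
        return len(self.commits) - 1

    def checkout(self, namespace, current, target):
        """Move `namespace` from commit `current` to `target`, loading only the differences."""
        cur, tgt = self.commits[current], self.commits[target]
        loaded = []
        for key, digest in tgt.items():
            if cur.get(key) != digest:  # state difference: this co-variable must be restored
                names = sorted(key)
                namespace.update(zip(names, pickle.loads(self.blobs[digest])))
                loaded.append(names)
        # Remove variables that exist at `current` but not at `target`.
        target_names = set().union(*tgt)
        for key in cur:
            for name in key - target_names:
                namespace.pop(name, None)
        return loaded


ns = {"df": [1, 2, 3], "model": {"w": 0.5}}
groups = [("df",), ("model",)]
store = DeltaCheckpointStore()
c0 = store.commit(ns, groups)
ns["df"].pop()                     # a cell mutates df (the change we later want to undo)
c1 = store.commit(ns, groups)
print(len(store.blobs))            # 3: df at c0, model (shared by both commits), df at c1
print(store.checkout(ns, c1, c0))  # [['df']]: model is unchanged, so it is not reloaded
print(ns["df"])                    # [1, 2, 3]
```

Hashing serialized bytes means every co-variable is still serialized at every commit, which is the overhead that tracking the variables a cell actually touches is meant to avoid; the sketch trades that efficiency for brevity and has no recomputation fallback for unserializable objects.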
