When Are Reactive Notebooks Not Reactive?
Megan Zheng, Will Crichton, Akshay Narayan, Deepti Raghavan, Nikos Vasilakis
TL;DR
Computational notebooks lack consistent reactivity when edits occur during execution. The authors introduce Rex, a fine-grained micro-benchmark with formal definitions of execution order, consistency, and soundness to evaluate reactive notebook systems. Evaluating Rex on Marimo, Observable, and IPyflow (plus baselines) reveals that direct assignments are reliably handled, but reassignment, mutations, and external-state interactions frequently cause undefined or incorrect reactivity, with IPyflow performing best among live systems but still missing edge cases. The work argues for clearer guarantees and tooling, and positions Rex as a practical aid for researchers and developers to improve reactive notebook implementations and user understanding.
Abstract
Computational notebooks are convenient for programmers, but can easily become confusing and inconsistent because a program can be edited incrementally while it is running. Recent reactive notebook systems, such as IPyflow, Marimo, and Observable, strive to keep notebook state in sync with the current cell code by re-executing a minimal set of cells upon modification. However, each system defines reactivity differently. Moreover, even under each system's own definition, we find simple notebook modifications that can break it. Overall, these inconsistencies make it difficult for users to construct a mental model of their reactive notebook's implementation. This paper proposes Rex, a fine-grained test suite for discussing and assessing reactivity capabilities within reactive notebook systems. We evaluate Rex on three existing reactive notebook systems and classify their failures with the aims of (i) helping programmers understand when reactivity fails and (ii) helping notebook implementations improve.
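To make the idea of "re-executing a minimal set of cells" concrete, the following is a minimal sketch of a toy reactivity check based purely on static name analysis. It is an illustration under stated assumptions, not the implementation of IPyflow, Marimo, or Observable, and it is not part of Rex; the helper names `defs_and_uses` and `stale_cells` are hypothetical. It shows why a direct assignment is easy to track while an in-place mutation can go unnoticed by this kind of analysis.

```python
import ast


def defs_and_uses(source):
    """Return (names assigned, names read) for one cell, using static analysis only."""
    defs, uses = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defs.add(node.id)
            else:
                # Load (or Del) context: the cell depends on this name.
                uses.add(node.id)
    return defs, uses


def stale_cells(cells, edited):
    """List cells that transitively read a name assigned by the edited cell.

    `cells` maps a hypothetical cell id to its source code.
    """
    changed = set(defs_and_uses(cells[edited])[0])
    stale = []
    progress = True
    while progress:
        progress = False
        for name, src in cells.items():
            if name == edited or name in stale:
                continue
            defs, uses = defs_and_uses(src)
            if uses & changed:
                stale.append(name)
                changed |= defs  # names this cell assigns may now differ too
                progress = True
    return stale


# Direct assignment: editing cell "a" marks both dependents as stale.
linear = {"a": "x = 1", "b": "y = x + 1", "c": "print(y)"}
print(stale_cells(linear, "a"))

# Mutation: cell "b" changes xs in place but never assigns it, so this
# purely static analysis reports no stale cells even though "c" reads xs.
mutating = {"a": "xs = []", "b": "xs.append(1)", "c": "print(len(xs))"}
print(stale_cells(mutating, "b"))
```

In this sketch the first call reports cells `b` and `c`, while the second reports nothing, since `xs.append(1)` only reads the name `xs`. Real systems use richer static or dynamic analyses, so this only illustrates the category of difficulty, not any particular system's behavior.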
