NoteEx: Interactive Visual Context Manipulation for LLM-Assisted Exploratory Data Analysis in Computational Notebooks

Mohammad Hasan Payandeh, Lin-Ping Yuan, Jian Zhao

TL;DR

The paper addresses the challenge of selecting informative, task-relevant context for LLM-assisted Exploratory Data Analysis in computational notebooks, where mental models are often non-linear and difficult to capture in a 1D notebook. It introduces NoteEx, a JupyterLab extension that externalizes the analyst’s evolving mental model via a 2D Canvas View, surfaces cell metadata and data variable provenance, and provides an LLM-Assistant View for editable, context-aware prompt construction. Through a formative study and a 12-participant user study, the authors demonstrate that NoteEx improves mental-model maintenance, context selection accuracy, and perceived usefulness of LLM assistance, while increasing engagement and reducing prompt complexity relative to a Baseline. The work suggests design implications for human-in-the-loop context selection, 2D representations of notebooks, and future automation to infer and propose mental-model dependencies, with potential to reduce hallucinations and improve trust in LLM-powered data analysis tools.

Abstract

Computational notebooks have become popular for Exploratory Data Analysis (EDA), augmented by LLM-based code generation and result interpretation. Effective LLM assistance hinges on selecting informative context -- the minimal set of cells whose code, data, or outputs suffice to answer a prompt. As notebooks grow long and messy, users can lose track of the mental model of their analysis. They thus fail to curate appropriate contexts for LLM tasks, causing frustration and tedious prompt engineering. We conducted a formative study (n=6) that surfaced challenges in LLM context selection and mental model maintenance. Therefore, we introduce NoteEx, a JupyterLab extension that provides a semantic visualization of the EDA workflow, allowing analysts to externalize their mental model, specify analysis dependencies, and enable interactive selection of task-relevant contexts for LLMs. A user study (n=12) against a baseline shows that NoteEx improved mental model retention and context selection, leading to more accurate and relevant LLM responses.

Paper Structure

This paper contains 42 sections, 8 figures, 4 tables.

Figures (8)

  • Figure 1: NoteEx augments the traditional JupyterLab environment with four key elements for LLM-assisted EDA: (A) the Notebook View, (B) the Canvas View, (C, C1) the Data Information View, and (D) the LLM Assistant View.
  • Figure 2: Key steps related to the usage scenario (see the usage scenario section), demonstrating how Sarah uses NoteEx to perform her EDA tasks.
  • Figure 3: Key steps related to the usage scenario (see the usage scenario section), demonstrating how Sarah uses NoteEx to perform her EDA tasks.
  • Figure 4: Interaction logs for Task #1 (top) and Task #2 (bottom) under each interface (Baseline or NoteEx), with all participants’ interactions merged into a single timeline per interface.
  • Figure 5: Comparison of overall user engagement metrics (UES-SF) between Baseline and NoteEx.
  • ...and 3 more figures