Understanding Feedback Mechanisms in Machine Learning Jupyter Notebooks

Arumoy Shome, Luis Cruz, Diomidis Spinellis, Arie van Deursen

TL;DR

This work examines how feedback mechanisms in Jupyter notebooks influence the ML development lifecycle. By mining 297.8k Python notebooks and analyzing 2.3 million code cells, the authors identify explicit assertions and implicit print/last-cell feedback as the main signals, and provide a public dataset of 89.6k assertions, 1.4M print statements, and 1M last-cell statements. They conduct case studies that show assertions can automate validation of critical ML pipeline assumptions, while implicit feedback dominates decision-making during data prep, modeling, and visualization, revealing gaps in testing practices and potential for technical debt. The study advocates for better documentation and ML-specific testing tools to improve reproducibility and robustness of notebook-driven ML workflows. Practically, it offers concrete architectural checks (data shape, type, leakage, etc.) and highlights where tooling and education can advance reliable, maintainable ML development in notebook environments.
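
The kinds of checks named above (data shape, type, leakage) can be written as plain assertions. The following is a minimal sketch, not code from the study: the helper name `check_split`, the row layout (a dict with `id` and `features` keys), and the toy data are all assumptions made for illustration.

```python
def check_split(train_rows, test_rows, n_features):
    """Assert basic assumptions about a train/test split (hypothetical helper)."""
    for row in train_rows + test_rows:
        # Shape check: every row carries the expected number of features.
        assert len(row["features"]) == n_features, (
            f"row {row['id']} has {len(row['features'])} features, expected {n_features}"
        )
        # Type check: every feature value is a float.
        assert all(isinstance(x, float) for x in row["features"]), (
            f"row {row['id']} contains a non-float feature"
        )
    # Leakage check: no example id appears in both splits.
    overlap = {r["id"] for r in train_rows} & {r["id"] for r in test_rows}
    assert not overlap, f"train/test leakage on ids: {sorted(overlap)}"
    return True


# Toy data, invented for this example.
train = [{"id": 1, "features": [0.1, 0.2]}, {"id": 2, "features": [0.3, 0.4]}]
test = [{"id": 3, "features": [0.5, 0.6]}]
check_split(train, test, n_features=2)
```

Unlike a printed shape or a displayed dataframe, a failed assertion stops execution, so the assumption is re-checked on every run rather than inspected once by eye.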

Abstract

The machine learning development lifecycle is characterized by iterative and exploratory processes that rely on feedback mechanisms to ensure data and model integrity. Despite the critical role of feedback in machine learning engineering, no prior research has been conducted to identify and understand these mechanisms. To address this knowledge gap, we mine 297.8 thousand Jupyter notebooks and analyse 2.3 million code cells. We identify three key feedback mechanisms -- assertions, print statements and last cell statements -- and further categorize them into implicit and explicit forms of feedback. Our findings reveal extensive use of implicit feedback for critical design decisions and the relatively limited adoption of explicit feedback mechanisms. By conducting detailed case studies with selected feedback instances, we uncover the potential for automated validation of critical assumptions in ML workflows using assertions. Finally, this study underscores the need for improved documentation, and provides practical recommendations on how existing feedback mechanisms in the ML development workflow can be effectively used to mitigate technical debt and enhance reproducibility.
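
As a rough illustration of how the three mechanisms could be detected in a single code cell, the sketch below uses Python's standard `ast` module. It is not the authors' mining pipeline: the function name `feedback_in_cell`, the rule that a trailing bare expression counts as a last cell statement, and the sample cell are assumptions. It also assumes the cell is plain Python; cells containing IPython magics would need preprocessing before `ast.parse` accepts them.

```python
import ast


def _is_print_call(node):
    """True if the node is a call to the built-in name `print`."""
    return (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "print"
    )


def feedback_in_cell(source):
    """Count feedback mechanisms in one code cell (hypothetical helper)."""
    tree = ast.parse(source)
    found = {"assert": 0, "print": 0, "last_cell_statement": 0}
    for node in ast.walk(tree):
        if isinstance(node, ast.Assert):
            found["assert"] += 1  # explicit feedback
        elif _is_print_call(node):
            found["print"] += 1  # implicit feedback
    # Assumption: a trailing bare expression (other than a print call)
    # is what Jupyter echoes as the cell's output.
    if tree.body and isinstance(tree.body[-1], ast.Expr):
        if not _is_print_call(tree.body[-1].value):
            found["last_cell_statement"] = 1  # implicit feedback
    return found


# Sample cell, invented for this example. The code is parsed, never executed.
cell = (
    "df = load_data()\n"
    "assert df.shape[0] > 0\n"
    "print(df.dtypes)\n"
    "df.head()\n"
)
result = feedback_in_cell(cell)
```

Because the cell is only parsed, undefined names such as `load_data` do not matter here; the sketch inspects syntax, not runtime behaviour.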

Paper Structure (48 sections, 18 figures, 2 tables)

Figures (18)

  • Figure 1: Example Jupyter notebook with code and markdown cells, adapted from pimentel2019large-scale.
  • Figure 2: Overview of data collection methodology used in this study
  • Figure 3: Overview of various feedback mechanism groups identified in this study.
  • Figure 4: 10 most common methods used in assertions written using external testing libraries.
  • Figure 5: 10 most common AST nodes in the test attribute of assert statements.
  • ...and 13 more figures