What makes an Expert? Comparing Problem-solving Practices in Data Science Notebooks

Manuel Valle Torre; Marcus Specht; Catharine Oertel

TL;DR

This study examines how data science experts and novices differ in their problem-solving processes by applying a multi-level sequence analysis to 440 Jupyter notebooks from Code4ML. It maps 11 data science phases to five problem-solving practices and analyzes the notebooks at three levels of granularity: overall structure, phase transitions, and fine-grained actions. The main finding is that expertise is not a matter of following a different sequence but of enacting the process more efficiently and iteratively. Novices tend to pursue longer, linear workflows with exploratory action patterns, whereas experts adopt shorter, iterative cycles driven by high-impact action sequences, such as model-related actions. The work has practical implications for data science education: curricula and intelligent support tools should cultivate flexible, reflective problem-solving and use AI as a partner for iterative thinking rather than as a shortcut, thereby enhancing human-AI collaboration in learning.

Abstract

The development of data science expertise requires tacit, process-oriented skills that are difficult to teach directly. This study addresses the resulting challenge of empirically understanding how the problem-solving processes of experts and novices differ. We apply a multi-level sequence analysis to 440 Jupyter notebooks from a public dataset, mapping low-level coding actions to higher-level problem-solving practices. Our findings reveal that experts and novices do not differ fundamentally in their transitions between data science phases (e.g., Data Import, EDA, Model Training, Visualization). Instead, expertise is distinguished by the overall workflow structure, viewed from a problem-solving perspective, and by fine-grained, cell-level action patterns. Novices tend to follow long, linear processes, whereas experts employ shorter, more iterative strategies enacted through efficient, context-specific action sequences. These results provide data science educators with empirical insights for curriculum design and assessment, shifting the focus from final products toward the development of the flexible, iterative thinking that defines expertise, a priority in a field increasingly shaped by AI tools.
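To make the kind of analysis described in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It collapses a notebook's phase sequence into problem-solving practices, counts how often a notebook returns to a practice it has already left (a rough proxy for iteration), and estimates first-order transition probabilities. The practice names, the `PHASE_TO_PRACTICE` mapping, and the example notebooks are all hypothetical; only the four phase names come from the abstract.

```python
from collections import Counter, defaultdict

# Hypothetical mapping for illustration only. The paper maps 11 phases to
# five practices; that mapping is not reproduced here.
PHASE_TO_PRACTICE = {
    "Data Import": "Understand",
    "EDA": "Explore",
    "Visualization": "Explore",
    "Model Training": "Implement",
}


def to_practices(phases):
    """Map phases to practices and merge consecutive repeats."""
    practices = []
    for phase in phases:
        practice = PHASE_TO_PRACTICE[phase]
        if not practices or practices[-1] != practice:
            practices.append(practice)
    return practices


def revisits(practices):
    """Count returns to a practice that appeared earlier in the sequence."""
    seen = set()
    count = 0
    for practice in practices:
        if practice in seen:
            count += 1
        seen.add(practice)
    return count


def transition_probabilities(sequences):
    """Estimate first-order transition probabilities from observed sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1
    return {
        current: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for current, c in counts.items()
    }


# Two made-up notebooks: one linear, one iterative.
linear = to_practices(["Data Import", "EDA", "Visualization", "Model Training"])
iterative = to_practices(
    ["Data Import", "EDA", "Model Training", "EDA", "Model Training"]
)
probs = transition_probabilities([linear, iterative])
```

In this toy example, `revisits(linear)` is 0 and `revisits(iterative)` is 2, so the two notebooks differ in overall structure even when their individual phase-to-phase transitions look similar.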

Paper Structure (16 sections, 5 figures, 2 tables)

Introduction
Related Work
Analyzing the Data Science Process
A Multi-Level Framework for Data Science Process Analysis
Dataset and Methods
Dataset
Methodology
Results
Exploratory Analysis and Overall Process Structure
Phase-Level Transitions: Process Mining and Markov Models
Action Patterns Within Problem-Solving Stages
Discussion and Implications
Interpreting Strategies Across Different Granularities
Implications for Data Science Education
Limitations and Future Work
...and 1 more section

Figures (5)

Figure 1: Visualization of all Problem-solving Sequences
Figure 2: Relative Frequency visualization of Novice and Expert Sequences
Figure 3: Relative Frequency visualization of Clustered Sequences
Figure 4: Process Visualisation for Experts
Figure 6: Transition Probabilities of Sequences by Cluster
