Not Everyone Wins with LLMs: Behavioral Patterns and Pedagogical Implications for AI Literacy in Programmatic Data Science

Qianou Ma, Kenneth Koedinger, Tongshuang Wu

TL;DR

The paper investigates whether large language models (LLMs) democratize programmatic data science and finds that experience matters: under time pressure, LLMs can close performance gaps for less-experienced students, but with more time, technical background still predicts success. Using a rich, mixed-methods classroom study with logs, surveys, and think-aloud data, the authors develop an LLM-assisted log annotation codebook to characterize AI-use behaviors across episodes and four knowledge dimensions. They show that technically experienced students use AI more strategically (writing clear prompts, planning, and asking for explanations), while novices rely on AI for immediate debugging; demonstrations and longer task time improve some AI-use skills, but evaluative skills require targeted training. The work contributes both empirical insights into AI literacy as a set of transferable competencies and practical guidance for curricula and tool design to support durable, effective human–AI collaboration in data science. Overall, the study highlights that successful AI-enabled data analysis hinges on structured training that fosters metacognitive, conceptual, procedural, and dispositional AI-use skills, not merely surface familiarity with AI tools.

Abstract

LLMs promise to democratize technical work in complex domains like programmatic data analysis, but not everyone benefits equally. We study how students with varied experiences use LLMs to complete Python-based data analysis in computational notebooks in a graduate course. Drawing on homework logs, recordings, and surveys from 36 students, we ask: Which experience matters most, and how does it shape AI use? Our mixed-methods analysis shows that technical experience, not AI familiarity or communication skills, remains a significant predictor of success. Students also vary widely in how they leverage LLMs, struggling at the stages of forming intent, expressing inputs, interpreting outputs, and assessing results. We identify success and failure behaviors, such as providing context or decomposing prompts, that distinguish effective use. These findings inform AI literacy interventions, highlighting that lightweight demonstrations improve surface fluency but are insufficient; deeper training and scaffolds are needed to cultivate resilient AI-use skills.

Paper Structure

This paper contains 54 sections, 10 figures, and 7 tables.

Figures (10)

  • Figure 1: Google Colab notebook environment with the embedded Gemini assistant. Students could interact with Gemini in two main ways: code cell generation (generateCode) and a conversational side chatbot (converse). Additional buttons support error explanation, code explanation, and visualization generation.
  • Figure 2: Distribution of self-rated technical experience, LLM experience, and communication experience scores.
  • Figure 3: (A) Example raw log event format. (B) Example annotated log format. The student used AI in two actions (pressing the "get visualizations" button in Colab and entering the chat prompt "How do I clean my dataset"), which reflect the AI-usage codes ai_suggest_steps_or_plan and ai_explain_concepts and indicate the intent of the first data-cleaning episode.
  • Figure 4: Distribution of AI usage (both used and missed) aggregated by code type and episode step. Bar length represents the number of steps involving each AI use, while the percentages denote the proportion of used vs. missed opportunities: 100% means that all instances of that AI use were used (if blue) or were missed opportunities (if grey).
  • Figure 5: (A) While experienced students and novices used AI with similar frequencies, experienced students might be better able to resolve challenges without using AI. (B) Unlike novices, experienced students might strategically distribute their effort, asking AI to plan and to improve or explain.
  • ...and 5 more figures
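To make the used-versus-missed percentages described for Figure 4 concrete, here is a minimal Python sketch. It assumes a hypothetical annotated-step record with episode, code, and status fields; the paper's actual annotation schema may differ. The two codes come from the Figure 3 example, but the sample steps are invented for illustration, and for simplicity the sketch groups by code only rather than by code and episode step.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class AnnotatedStep:
    # Hypothetical record; field names are assumptions, not the paper's schema.
    episode: str  # e.g. "data cleaning"
    code: str     # AI-usage code from the codebook
    status: str   # "used" (student used AI this way) or "missed" (missed opportunity)


def summarize(steps):
    """Count steps per AI-usage code and the share of used vs. missed opportunities."""
    counts = defaultdict(Counter)
    for step in steps:
        if step.status not in ("used", "missed"):
            raise ValueError(f"unknown status: {step.status!r}")
        counts[step.code][step.status] += 1

    summary = {}
    for code, c in counts.items():
        total = c["used"] + c["missed"]  # always > 0: each code has at least one step
        summary[code] = {
            "steps": total,
            "pct_used": 100 * c["used"] / total,
            "pct_missed": 100 * c["missed"] / total,
        }
    return summary


# Invented example steps (hypothetical data, not from the study).
steps = [
    AnnotatedStep("data cleaning", "ai_suggest_steps_or_plan", "used"),
    AnnotatedStep("data cleaning", "ai_explain_concepts", "used"),
    AnnotatedStep("visualization", "ai_explain_concepts", "missed"),
    AnnotatedStep("visualization", "ai_explain_concepts", "used"),
]

for code, row in sorted(summarize(steps).items()):
    print(f"{code}: n={row['steps']}, used={row['pct_used']:.0f}%, missed={row['pct_missed']:.0f}%")
```

Here the step count per code plays the role of bar length, and the two percentages correspond to the blue (used) and grey (missed) shares.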