Do Hackers Dream of Electric Teachers?: A Large-Scale, In-Situ Evaluation of Cybersecurity Student Behaviors and Performance with AI Tutors

Michael Tompkins; Nihaarika Agarwal; Ananta Soneji; Robert Wasinger; Connor Nelson; Kevin Leach; Rakibul Hasan; Adam Doupé; Daniel Votipka; Yan Shoshitaishvili; Jaron Mink

Do Hackers Dream of Electric Teachers?: A Large-Scale, In-Situ Evaluation of Cybersecurity Student Behaviors and Performance with AI Tutors

Michael Tompkins, Nihaarika Agarwal, Ananta Soneji, Robert Wasinger, Connor Nelson, Kevin Leach, Rakibul Hasan, Adam Doupé, Daniel Votipka, Yan Shoshitaishvili, Jaron Mink

TL;DR

This study provides the first large-scale, in-situ evaluation of an AI tutor in hands-on cybersecurity education, analyzing 142,526 tutor queries from 309 students across 396 Capture-the-Flag-like challenges. Using a context-aware tutor (SENSAI) integrated into a 15-week course, the authors identify three conversation styles—Short, Reactive, and Proactive—and demonstrate that these styles differentially predict challenge completion, with Short and Proactive often outperforming Reactive as material grows more complex. The work combines qualitative coding with MHMM-based style discovery and mixed-effects models to link interaction patterns to outcomes, while also surveying student perceptions about usefulness, trust, and comparisons to human TAs. The findings offer concrete implications for deploying AI tutors as complements to human instructors, improving tutor design, and guiding student interaction strategies in cybersecurity education.

Abstract

To meet the ever-increasing demands of the cybersecurity workforce, AI tutors have been proposed for personalized, scalable education. But, while AI tutors have shown promise in introductory programming courses, no work has evaluated their use in hands-on exploration and exploitation of systems (e.g., ``capture-the-flag'') commonly used to teach cybersecurity. Thus, despite growing interest and need, no work has evaluated how students use AI tutors or whether they benefit from their presence in real, large-scale cybersecurity courses. To answer this, we conducted a semester-long observational study on the use of an embedded AI tutor with 309 students in an upper-division introductory cybersecurity course. By analyzing 142,526 student queries sent to the AI tutor across 396 cybersecurity challenges spanning 9 core cybersecurity topics and an accompanying set of post-semester surveys, we find (1) what queries and conversational strategies students use with AI tutors, (2) how these strategies correlate with challenge completion, and (3) students' perceptions of AI tutors in cybersecurity education. In particular, we identify three broad AI tutor conversational styles among users: Short (bounded, few-turn exchanges), Reactive (repeatedly submitting code and errors), and Proactive (driving problem-solving through targeted inquiry). We also find that the use of these styles significantly predicts challenge completion, and that this effect increases as materials become more advanced. Furthermore, students valued the tutor's availability but reported that it became less useful for harder material. Based on this, we provide suggestions for security educators and developers on practical AI tutor use.

Do Hackers Dream of Electric Teachers?: A Large-Scale, In-Situ Evaluation of Cybersecurity Student Behaviors and Performance with AI Tutors

TL;DR

Abstract

Paper Structure (35 sections, 7 figures, 12 tables)

This paper contains 35 sections, 7 figures, 12 tables.

Introduction
Background & Related Work
Methodology
Observation Data
Qualitative Data Analysis.
Quantitative Data Analysis
Eligibility, Recruitment, and Consent
Limitations
Participants
How Do Students Use AI-Tutors? (\ref{['rq:interaction']})
Queries
Short Conversations
Long Conversations
Performance Across Conversations (\ref{['rq:performance']})
Summary Statistics
...and 20 more sections

Figures (7)

Figure 1: AI Tutor Overview -- Participants solve challenges inside an instrumented container, capturing their active terminal and file context. When a student interacts with the AI Tutor (❶), the system bundles up the AI Tutor's system prompt, the participant's conversation history and query, and the captured context from the container (❷). This gets sent to the LLM (❸) which thinks and responds to the participant.
Figure 2: Observed Conversation Styles -- The conversation structure for $c$=11,692 query sequences. Columns represent hidden states (S1-S5); rows show emission probabilities for each query family, init/self box indicates initial and self-transition probabilities. Arrows only show inter-state transitions with $\geq$5%, and states only show emission probabilities with $\geq$10%.
Figure 3: Module Completion by Conversation Style -- Observed completion rates by module and style.
Figure 4: Perceived Utility of AI Tutors -- Participants' Likert-ratings of the AI Tutor's usefulness across tasks (\ref{['q:utility-start']}--\ref{['q:utility-confusion']}).
Figure 5: AI Tutor vs. TA Comparison -- Participants' ratings comparing the AI tutor to human TAs (\ref{['q:tacompare-start']}--\ref{['q:tacompare-lost']}).
...and 2 more figures

Do Hackers Dream of Electric Teachers?: A Large-Scale, In-Situ Evaluation of Cybersecurity Student Behaviors and Performance with AI Tutors

TL;DR

Abstract

Do Hackers Dream of Electric Teachers?: A Large-Scale, In-Situ Evaluation of Cybersecurity Student Behaviors and Performance with AI Tutors

Authors

TL;DR

Abstract

Table of Contents

Figures (7)