Do Hackers Dream of Electric Teachers?: A Large-Scale, In-Situ Evaluation of Cybersecurity Student Behaviors and Performance with AI Tutors
Michael Tompkins, Nihaarika Agarwal, Ananta Soneji, Robert Wasinger, Connor Nelson, Kevin Leach, Rakibul Hasan, Adam Doupé, Daniel Votipka, Yan Shoshitaishvili, Jaron Mink
TL;DR
This study provides the first large-scale, in-situ evaluation of an AI tutor in hands-on cybersecurity education, analyzing 142,526 tutor queries from 309 students across 396 Capture-the-Flag-like challenges. Using a context-aware tutor (SENSAI) integrated into a 15-week course, the authors identify three conversational styles (Short, Reactive, and Proactive) and show that these styles differentially predict challenge completion, with Short and Proactive often outperforming Reactive as material grows more complex. The work combines qualitative coding with MHMM-based style discovery and mixed-effects models to link interaction patterns to outcomes, and it surveys student perceptions of the tutor's usefulness, their trust in it, and how it compares to human TAs. The findings offer concrete implications for deploying AI tutors as complements to human instructors, improving tutor design, and guiding student interaction strategies in cybersecurity education.
Abstract
To meet the ever-increasing demands of the cybersecurity workforce, AI tutors have been proposed for personalized, scalable education. But while AI tutors have shown promise in introductory programming courses, no work has evaluated their use in the hands-on exploration and exploitation of systems (e.g., "capture-the-flag" challenges) commonly used to teach cybersecurity. Thus, despite growing interest and need, it remains unknown how students use AI tutors or whether they benefit from their presence in real, large-scale cybersecurity courses. To answer this, we conducted a semester-long observational study on the use of an embedded AI tutor with 309 students in an upper-division introductory cybersecurity course. By analyzing 142,526 student queries sent to the AI tutor across 396 cybersecurity challenges spanning 9 core cybersecurity topics, together with an accompanying set of post-semester surveys, we characterize (1) which queries and conversational strategies students use with AI tutors, (2) how these strategies correlate with challenge completion, and (3) students' perceptions of AI tutors in cybersecurity education. In particular, we identify three broad AI tutor conversational styles among users: Short (bounded, few-turn exchanges), Reactive (repeatedly submitting code and errors), and Proactive (driving problem-solving through targeted inquiry). We also find that the use of these styles significantly predicts challenge completion, and that this effect increases as materials become more advanced. Furthermore, students valued the tutor's availability but reported that it became less useful for harder material. Based on these findings, we provide suggestions for security educators and developers on practical AI tutor use.
