Teaching Students to Question the Machine: An AI Literacy Intervention Improves Students' Regulation of LLM Use in a Science Task

O. Clerc, R. Abdelghani, C. Desvaux, E. Poisson, P. Y. Oudeyer, H. Sauzéon

Abstract

The rapid adoption of generative artificial intelligence (GenAI) in schools raises concerns about students' uncritical reliance on its outputs. Effective use of large language models (LLMs) requires not only technical knowledge but also the ability to monitor, evaluate, and regulate one's interaction with the system; these processes are closely tied to metacognitive regulation. Such skills are still developing in middle school, making students particularly vulnerable to over-trust and premature acceptance of AI outputs. Because classroom time and teacher-training resources are constrained, there is a pressing need to develop and evaluate AI literacy interventions that can be implemented under realistic school conditions. We report a controlled classroom study examining whether a two-hour AI literacy workshop improves students' interaction strategies and the quality of their final answers in LLM-supported science problem solving. A total of 116 students (grades 8-9; ages 13-15) completed six science investigation tasks using a generative AI system. Two days before the task, the intervention group attended the workshop, which combined information about how LLMs work and fail with practical guidance on prompting and response evaluation; the control group received no training. Trained students showed less uncritical reliance on the system: they more often reformulated queries, asked follow-up questions when needed, and judged response correctness more accurately, leading to better performance. In contrast, GenAI and metacognitive self-report scores did not predict performance, suggesting that effective use of generative AI depends less on self-reported skills than on explicit training in interaction regulation. Overall, the results show that brief, scalable AI literacy instruction can meaningfully improve how middle-school students use generative AI in school-like learning activities.



Figures (4)

  • Figure 1: Graphical abstract. Study design and main findings. Grade 8--9 students (N=116) completed six science inquiry tasks with a GenAI system; the intervention group (n=76) received a two-hour AI literacy workshop (conceptual + procedural) two days before the task, whereas the control group (n=40) did not. Trained students achieved higher final-answer scores and showed stronger interaction regulation (more rejection of underspecified prompts, more follow-up questions when needed, and more accurate correctness judgments), while neither GenAI nor metacognitive self-report scores predicted performance.
  • Figure 2: Student--LLM workflow and exercise structure. (A) For each exercise, students rated their confidence (3-point scale), decided whether to use the suggested prompt, iteratively queried the LLM and evaluated its responses (3-point scale; optional follow-up question), and wrote a final answer; this sequence was repeated across six exercises. (B) Example sheet and prompt manipulation. The suggested prompt was either well-specified or underspecified (one shown per exercise).
  • Figure 3: Workshop effects on performance and prompt acceptance. (A) Final-answer performance (0--20) by group. Intervention students achieved higher final-answer scores. (B) Relationship between performance and prompt discrimination sensitivity ($d'$); performance increased with prompt discrimination (a minimal $d'$ computation sketch follows this list). (C) Predictors of accepting the suggested prompt (GLM).
  • Figure 4: Effects of the intervention on prompt replacement, follow-up behavior, and correctness judgments. (A) Within-student performance by accept vs. reject on well-specified prompts; rejecting well-specified prompts was more costly in the control group. (B) Probability of judging an AI answer as correct (GLM; 95% CI) by prompt specificity; trained students made more accurate correctness judgments (an illustrative GLM sketch follows this list). (C) Follow-up questioning after accepted underspecified prompts by group; trained students asked more follow-up questions after underspecified prompts.
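
Figure 3B summarizes prompt discrimination with the signal-detection index $d' = z(\text{hit rate}) - z(\text{false-alarm rate})$. The paper's exact computation is not reproduced here; below is a minimal sketch assuming that accepting a well-specified prompt counts as a hit and accepting an underspecified prompt as a false alarm, with a log-linear correction for extreme rates. All function and variable names are illustrative.

```python
from scipy.stats import norm

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity d' = z(hit rate) - z(false-alarm rate).

    Applies the log-linear correction (add 0.5 to each cell) so that
    rates of exactly 0 or 1 do not yield infinite z-scores.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

# Hypothetical student: accepts 3/3 well-specified prompts (hits)
# and 1/3 underspecified prompts (false alarms).
print(d_prime(hits=3, misses=0, false_alarms=1, correct_rejections=2))
```

With only six exercises per student, raw rates of 0 or 1 are common, which is why the correction matters in practice.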
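
The GLM analyses in Figures 3C and 4B can be illustrated with a binomial GLM (logit link). The sketch below predicts the probability of judging an AI answer correct from training group and prompt specificity, then extracts model-based probabilities with 95% CIs of the kind plotted in Figure 4B. The synthetic data and all column names are assumptions, not the authors' variables.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200

# Synthetic trial-level data; every column name is hypothetical.
df = pd.DataFrame({
    "judged_correct": rng.integers(0, 2, n),   # 1 = student judged the AI answer correct
    "intervention":   rng.integers(0, 2, n),   # 1 = attended the workshop
    "underspecified": rng.integers(0, 2, n),   # 1 = suggested prompt was underspecified
})

# Binomial GLM (logit link): P(judged correct) ~ group x prompt specificity.
model = smf.glm("judged_correct ~ intervention * underspecified",
                data=df, family=sm.families.Binomial()).fit()

# Model-based probabilities with 95% CIs for each cell of the design,
# analogous to the estimates shown in Figure 4B.
grid = pd.DataFrame({"intervention":   [0, 0, 1, 1],
                     "underspecified": [0, 1, 0, 1]})
print(model.get_prediction(grid).summary_frame(alpha=0.05)
           [["mean", "mean_ci_lower", "mean_ci_upper"]])
```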