Do AI assistants help students write formal specifications? A study with ChatGPT and the B-Method

Alfredo Capozucca, Daniil Yampolskyi, Alexander Goldberg, Maximiliano Cristiá

TL;DR

The study investigates whether OpenAI's ChatGPT can help undergraduate students write formal specifications in the B-method. Using a within-subject pretest–posttest design embedded in a formal methods (FM) course, the researchers compare performance with and without AI assistance across two iterations, analysing correctness, trust, and interaction prompts. They find that ChatGPT does not reliably improve correctness and that students generally distrust the AI, with distrust sometimes associated with better performance. Prompts that help identify components of a specification (e.g., state variables) show some potential, and the pattern of prompt usage may influence outcomes. The results suggest that AI should be integrated into FM education with care: student knowledge remains the primary determinant of correctness, while AI may serve as a supplementary aid. As AI capabilities evolve, educators should adapt assessment and teaching strategies accordingly.

Abstract

This paper investigates the role of AI assistants, specifically OpenAI's ChatGPT, in teaching formal methods (FM) to undergraduate students, using the B-method as a formal specification technique. While existing studies demonstrate the effectiveness of AI in coding tasks, no study reports on its impact on formal specifications. We examine whether ChatGPT provides an advantage when writing B-specifications and analyse student trust in its outputs. Our findings indicate that the AI does not help students improve the correctness of their specifications, and that low trust correlates with better outcomes. Additionally, we identify a behavioural pattern of interaction with ChatGPT that may influence the correctness of B-specifications.

Paper Structure

This paper contains 23 sections, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Correctness distribution of participants for each assessment criterion's dimension.
  • Figure 2: Distribution of participants' beliefs regarding the help offered by the AI assistant (AIA) in achieving a B-specification, and their actual achieved performance.
  • Figure 3: Participants' confidence in correctness (CiC) of their B-specification with respect to how much of that confidence is due to the help of the AIA. Bubble size is proportional to the number of identical answers.
  • Figure 4: Distribution of how much of participants' confidence in correctness (CiC) is due to the help of the AIA, plotted against the actual correctness of their B-specification.
  • Figure 5: Distribution of how much of participants' confidence in correctness (CiC) is due to the help of the AIA, plotted against their normalised edit distance (NED).
  • ...and 3 more figures