How Far Are We? The Triumphs and Trials of Generative AI in Learning Software Engineering

Rudrajit Choudhuri, Dylan Liu, Igor Steinmacher, Marco Gerosa, Anita Sarma

TL;DR

A between-subjects study evaluates how effectively ChatGPT, a convo-genAI platform, assists students in SE tasks, and finds significantly increased frustration levels compared to traditional resources.

Abstract

Conversational Generative AI (convo-genAI) is revolutionizing Software Engineering (SE) as engineers and academics embrace this technology in their work. However, there is a gap in understanding the current potential and pitfalls of this technology, specifically in supporting students in SE tasks. In this work, we conduct a between-subjects study (N=22) to evaluate the effectiveness of ChatGPT, a convo-genAI platform, in assisting students in SE tasks. Our study did not find statistically significant differences in participants' productivity or self-efficacy when using ChatGPT as compared to traditional resources, but we found significantly increased frustration levels. Our study also revealed 5 distinct faults arising from violations of Human-AI interaction guidelines, which led to 7 different (negative) consequences for participants.

Paper Structure (17 sections, 5 figures, 4 tables)

Figures (5)

  • Figure 1: Overview of the research design
  • Figure 2: Self-efficacy results (box plots) per question. Medians are highlighted using black dots.
  • Figure 3: Continuance intention towards using ChatGPT (%)
  • Figure 4: Human-AI Interaction guideline violations reported by participants; those found by more than 50% are in bold.
  • Figure 5: The (a) causes and (b) consequences of ChatGPT's faults: Violation of Human-AI Interaction guidelines (G1, G2, ...) led to faults (F1, F2, ...). Faults had a cascading effect: one led to another and further led to consequences (C1, C2, ...) for participants. Some of these consequences led to other consequences.