How Far Are We? The Triumphs and Trials of Generative AI in Learning Software Engineering

Rudrajit Choudhuri; Dylan Liu; Igor Steinmacher; Marco Gerosa; Anita Sarma

TL;DR

This between-subjects study evaluates how effectively ChatGPT, a convo-genAI platform, assists students with SE tasks, and finds that using it significantly increased students' frustration levels.

Abstract

Conversational Generative AI (convo-genAI) is revolutionizing Software Engineering (SE) as engineers and academics embrace this technology in their work. However, there is a gap in understanding the current potential and pitfalls of this technology, specifically in supporting students with SE tasks. In this work, we evaluate, through a between-subjects study (N=22), the effectiveness of ChatGPT, a convo-genAI platform, in assisting students with SE tasks. Our study did not find statistically significant differences in participants' productivity or self-efficacy when using ChatGPT compared with traditional resources, but it did find significantly increased frustration levels. Our study also revealed 5 distinct faults arising from violations of Human-AI interaction guidelines, which led to 7 different (negative) consequences for participants.

Paper Structure (17 sections, 5 figures, 4 tables)

Introduction
Method
Task Design
RQs, Metrics and Instruments
RQ1: How effective is convo-genAI in helping students in software engineering tasks?
RQ2: What are the current pitfalls in convo-genAI?
Sandboxing
Lab Study
Results
RQ1: Effectiveness
RQ2: Pitfalls
Faults made by ChatGPT
Causes of these faults and their consequences
Discussion: Recommendation
Related Work
...and 2 more sections

Figures (5)

Figure 1: Overview of the research design
Figure 2: Self-efficacy results (box plots) per question. Medians are highlighted using black dots.
Figure 3: Continuance intention towards using ChatGPT (%)
Figure 4: Human-AI Interaction guideline violations reported by participants; violations reported by more than 50% of participants are in bold.
Figure 5: The (a) causes and (b) consequences of ChatGPT's faults: violations of Human-AI Interaction guidelines (G1, G2, ...) led to faults (F1, F2, ...). Faults had a cascading effect: one fault could lead to another and, in turn, to consequences (C1, C2, ...) for participants. Some of these consequences led to further consequences.
