Analysis of Student-LLM Interaction in a Software Engineering Project

Agrawal Naman, Ridwan Shariffdeen, Guanlin Wang, Sanka Rasnayaka, Ganesh Neelakanta Iyer

TL;DR

This study addresses the gap in understanding how software engineering students engage with large language models in project-based learning. By analyzing the interactions of 126 undergraduates over a 13-week SPA project, the authors compare ChatGPT and Copilot across code generation, prompting behavior, code integration, and sentiment. They find that ChatGPT supports iterative prompting that produces concise, maintainable code, while Copilot often yields more complex outputs. The findings show that AI usage declines as teams progress through the project, that LLM-generated code makes up a small but meaningful portion of the repository, and that student prompts improve over time, indicating growing proficiency in human–AI collaboration. These insights inform educational design: the authors recommend integrating AI early, emphasizing prompt engineering, and training students to critically evaluate and refine AI-generated code to maximize learning and productivity in SE curricula.

Abstract

As Large Language Models (LLMs) become increasingly competent across various domains, educators are showing growing interest in integrating them into the learning process. In software engineering especially, LLMs have demonstrated qualitatively better capabilities in code summarization, code generation, and debugging. Despite extensive research on LLMs for software engineering tasks in practice, little research captures the benefits of LLMs for pedagogical advancement and their impact on the student learning process. To this end, we analyze the interactions of 126 undergraduate students with an AI assistant during a 13-week semester to understand the benefits of AI for software engineering learning. We analyze the conversations, the code generated, the code utilized, and the level of human intervention needed to integrate the code into the code base. Our findings suggest that students prefer ChatGPT over Copilot. Our analysis also finds that ChatGPT generates responses with lower computational complexity than Copilot. Furthermore, conversation-based interaction improves the quality of the generated code compared to auto-generated code. Early adoption of LLMs in software engineering is crucial to remaining competitive in this rapidly developing landscape; hence, the next generation of software engineers must acquire the skills to interact with AI to improve productivity.
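The abstract mentions measuring how much human intervention was needed to integrate generated code into the code base, and Figure 5 reports the similarity of generated and integrated code. The exact metric is not stated in this summary. As a minimal sketch under that uncertainty, the example below assumes a line-level similarity ratio from Python's standard `difflib` module as a stand-in, and turns it into an "intervention" score; the helper `intervention_ratio` and the sample snippets are hypothetical, not the authors' method.

```python
import difflib


def intervention_ratio(generated: str, integrated: str) -> float:
    """Hypothetical proxy for human intervention: 1 - line-level similarity.

    0.0 means the generated code was integrated verbatim; values closer to
    1.0 mean more of it was rewritten before landing in the repository.
    Leading/trailing whitespace and blank lines are ignored so that pure
    re-indentation does not count as an edit.
    """
    gen_lines = [ln.strip() for ln in generated.splitlines() if ln.strip()]
    int_lines = [ln.strip() for ln in integrated.splitlines() if ln.strip()]
    # ratio() = 2 * matched_lines / (len(gen_lines) + len(int_lines))
    similarity = difflib.SequenceMatcher(None, gen_lines, int_lines).ratio()
    return 1.0 - similarity


# Hypothetical example: the student added one comment line before committing.
generated = "int add(int a, int b) {\n    return a + b;\n}"
integrated = "int add(int a, int b) {\n  // sum two ints\n  return a + b;\n}"

print(round(intervention_ratio(generated, integrated), 3))  # 0.143
```

Comparing stripped lines rather than characters is a deliberate simplification: a re-indented line counts as unchanged, while renaming a single variable counts as a full line edit. A token-level comparison would be a finer-grained alternative.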

Paper Structure

This paper contains 14 sections, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Density Plot for measured key metrics
  • Figure 2: Comparison of ChatGPT and Copilot Complexity Across Various Complexity Measures
  • Figure 3: Variation of ChatGPT generated code in a conversation
  • Figure 4: Distribution of Differences in Complexity Measures between Repo and GPT Code (Log-Transformed x-axis)
  • Figure 5: Similarity of generated and integrated code
  • ...and 3 more figures
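Figures 2 and 4 compare code complexity across ChatGPT, Copilot, and repository code, but this summary does not say which measures or tools were used. As a hypothetical illustration only, the sketch below assumes cyclomatic complexity is one such measure and approximates it for Python source with the standard `ast` module (the project's actual language and tooling are not stated here). It then applies a signed logarithm, which is one plausible way to place differences of either sign on a log-like axis like the one in Figure 4. The helper names `cyclomatic_complexity` and `signed_log` are invented for this example.

```python
import ast
import math

# Branching constructs that each add one independent path (simplified McCabe;
# comprehensions, match statements, etc. are deliberately ignored).
DECISION_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While, ast.IfExp,
                  ast.ExceptHandler)


def cyclomatic_complexity(source: str) -> int:
    """Approximate cyclomatic complexity of Python source: 1 + decision points."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, DECISION_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` short-circuits at two points
            complexity += len(node.values) - 1
    return complexity


def signed_log(x: float) -> float:
    """sign(x) * log(1 + |x|): compresses large differences of either sign."""
    return math.copysign(math.log1p(abs(x)), x)


# Two hypothetical implementations of the same function.
concise = """
def clamp(x, lo, hi):
    return max(lo, min(x, hi))
"""

branchy = """
def clamp(x, lo, hi):
    if x < lo:
        return lo
    elif x > hi:
        return hi
    return x
"""

diff = cyclomatic_complexity(branchy) - cyclomatic_complexity(concise)
print(diff, round(signed_log(diff), 3))  # 2 1.099
```

The signed log keeps zero at zero and preserves the sign of each difference, so it can show code that is simpler than the repository version alongside code that is more complex.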