Interleaving Natural Language Prompting with Code Editing for Solving Programming Tasks with Generative AI Models

Victor-Alexandru Pădurean, Alkis Gotovos, Ahana Ghosh, Paul Denny, Juho Leinonen, Andrew Luxton-Reilly, James Prather, Adish Singla

TL;DR

The paper examines how students navigate GenAI-assisted programming by interleaving natural-language prompts with manual code edits. It analyzes 13,305 interactions from 355 students, collected during a three-day lab on 9 problems using a GPT-4o-based platform that limited each conversation to 20 messages. The study finds that prompts typically seed initial solutions while edits tend to follow failed runs; edits are mostly concise and vary with the task, and higher programming competence is associated with less reliance on edits. These insights inform educational design that cultivates both prompting fluency and precise editing skills, and that scaffolds the interplay between prompt refinement and targeted code repair in AI-assisted programming workflows.

Abstract

Modern computing students often rely on both natural-language prompting and manual code editing to solve programming tasks. Yet we still lack a clear understanding of how these two modes are combined in practice, and how their usage varies with task complexity and student ability. In this paper, we investigate this through a large-scale study in an introductory programming course, collecting 13,305 interactions from 355 students during a three-day lab activity. Our analysis shows that students primarily use prompting to generate initial solutions, and then often enter short edit-run loops to refine their code following a failed execution. Student reflections confirm that prompting helps structure solutions, that editing is effective for making targeted corrections, and that both are useful for learning. We find that manual editing becomes more frequent as task complexity increases, but most edits remain concise, with many affecting a single line of code. Higher-performing students tend to succeed using prompting alone, while lower-performing students rely more on edits. These findings highlight the role of manual editing as a deliberate last-mile repair strategy, complementing prompting in AI-assisted programming workflows.

Paper Structure

This paper contains 12 sections and 4 figures.

Figures (4)

  • Figure 1: Illustration of a student's interaction combining prompting and editing. (a) presents the specification for the 'sort sub-list' problem, where a sub-list must be sorted within the given index range. (b) recreates a genuine student's interaction, demonstrating how they interleave natural-language prompting and manual code editing, summarized by the state-transition diagram.
  • Figure 2: RQ1. Patterns of prompting and editing across problems. (a) presents aggregate transitions between interaction states: Start/Reset, NL message (Msg), NL message with attached code-edit (MsgAtt), execution of GenAI-generated code (RunAI), manual edit (Edit), execution of edited code (RunEdit), and the terminal outcomes Success (Succ) or Failure (Fail). (b) shows sessions structured into alternating odd turns with messaging/terminal events (Msg, MsgAtt, Reset, Succ, Fail), and even turns summarizing execution/edit actions: RunAI (execution of GenAI code only); {Edit} (potentially multiple edits without execution); {RunAI, Edit} (execution of GenAI code followed by edits without execution); {RunEdit} (potentially multiple loops of editing and running edited code); {RunAI, RunEdit} (both execution of GenAI code and edit-execute loops); and NoOp (no execution or edits).
  • Figure 3: RQ2. Editing activity by problem category. (a) Fraction of students who edited code in $k$ of the problems they attempted. (b) Distribution of manual edits per session (i.e., one student's attempt at a problem, possibly spanning multiple conversations). (c) Distribution of lines changed per retained edit (line-level Levenshtein distance); a value of $0$ indicates code that was executed or attached unchanged, or whose edits were reverted by the end of the edit cycle. (d) Number of additional messages sent after an edit before reaching a terminal outcome.
  • Figure 4: RQ3. Activity by group (LowScore, HighScore) on Basic Functions problems: P1, P2, P3, and Overall (averaged across the three problems). (a) Proportion of sessions that include at least one Edit action. (b) Proportion of sessions that include at least one RunEdit action. (c) Mean number of messages sent until the first successful attempt (absolute prompting volume).
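The aggregate transitions in Figure 2(a) can be read as counts of consecutive state pairs across session logs. The following is a minimal sketch and not the authors' pipeline. It assumes each session is available as an ordered list of the state labels named in the Figure 2 caption. The log format and the function names `count_transitions` and `transition_probabilities` are hypothetical.

```python
from collections import Counter

# Interaction states named in the Figure 2 caption. Representing a session as a
# list of these labels is an assumption made for this sketch.
STATES = {"Start", "Reset", "Msg", "MsgAtt", "RunAI", "Edit", "RunEdit", "Succ", "Fail"}


def count_transitions(sessions):
    """Count (current_state, next_state) pairs over all sessions."""
    counts = Counter()
    for session in sessions:
        unknown = set(session) - STATES
        if unknown:
            raise ValueError(f"unknown states: {sorted(unknown)}")
        # Pair each state with its immediate successor in the same session.
        counts.update(zip(session, session[1:]))
    return counts


def transition_probabilities(counts):
    """Normalize counts into estimates of P(next_state | current_state)."""
    outgoing = Counter()
    for (src, _dst), n in counts.items():
        outgoing[src] += n
    return {(src, dst): n / outgoing[src] for (src, dst), n in counts.items()}


# Two hypothetical sessions, one ending in success and one in failure.
sessions = [
    ["Start", "Msg", "RunAI", "Edit", "RunEdit", "Succ"],
    ["Start", "Msg", "RunAI", "Msg", "RunAI", "Fail"],
]
counts = count_transitions(sessions)
probs = transition_probabilities(counts)
print(counts[("Msg", "RunAI")])  # 3
print(probs[("RunAI", "Edit")])  # 1 of the 3 transitions leaving RunAI
```

Normalizing per source state turns raw counts into estimated conditional probabilities. The caption does not say whether the figure reports counts or normalized frequencies, so both are shown here.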
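Figure 3(c) sizes each retained edit by a line-level Levenshtein distance. This is the minimum number of whole-line insertions, deletions, and substitutions that turn the code before an edit cycle into the code after it. The following is a minimal sketch of that measure. It assumes code snapshots are plain strings split on newlines and that lines are compared verbatim, with no whitespace normalization. The function name `line_levenshtein` and the `sort_sublist` snippet are hypothetical illustrations, not the study's problem specification.

```python
def line_levenshtein(before: str, after: str) -> int:
    """Minimum number of line insertions, deletions, and substitutions
    needed to turn `before` into `after`. Lines are compared verbatim."""
    a = before.splitlines()
    b = after.splitlines()
    # prev[j]: distance between the first i-1 lines of a and the first j lines of b.
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete line a[i-1]
                curr[j - 1] + 1,     # insert line b[j-1]
                prev[j - 1] + cost,  # keep or substitute a line
            )
        prev = curr
    return prev[len(b)]


# Hypothetical before/after snapshots differing in a single line.
original = """def sort_sublist(xs, start, end):
    xs[start:end] = sorted(xs[start:end])
    return xs"""
edited = """def sort_sublist(xs, start, end):
    xs[start:end + 1] = sorted(xs[start:end + 1])
    return xs"""
print(line_levenshtein(original, edited))    # 1: one line changed
print(line_levenshtein(original, original))  # 0: unchanged or reverted
```

Comparing only the snapshots at the start and end of an edit cycle means that a change later undone scores 0, consistent with the caption's note on reverted edits. Keeping a single previous row bounds memory by the length of the edited snapshot.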