IntelliExplain: Enhancing Conversational Code Generation for Non-Professional Programmers

Hao Yan, Thomas D. Latoza, Ziyu Yao

TL;DR

IntelliExplain addresses the accessibility gap for non-professional programmers by combining enhanced natural language explanations of generated code with a structured human–LLM interaction paradigm. Through two user studies, it demonstrates that the approach significantly improves success rates and reduces task time in SQL and Python coding tasks compared to vanilla chat-based code generation. The framework uses restated questions for SQL, concise code descriptions for Python, and an iterative NL-feedback loop guided by execution results to refine solutions. Overall, IntelliExplain advances practical conversational coding for non-experts and highlights future work in explanation design, interaction structure, and robustness against LLM limitations.

Abstract

Chat LLMs such as GPT-3.5-turbo and GPT-4 have shown promise in assisting humans in coding, particularly by enabling them to conversationally provide feedback. However, current approaches assume users have expert debugging skills, limiting accessibility for non-professional programmers. In this paper, we first explore Chat LLMs' limitations in assisting non-professional programmers with coding. Through a formative study, we identify two key elements affecting their experience: the way a Chat LLM explains its generated code and the structure of human-LLM interaction. We then propose IntelliExplain, a new conversational code generation framework with enhanced code explanations and a structured interaction paradigm, which enforces both better code understanding and a more effective feedback loop. In two programming tasks (SQL and Python), IntelliExplain yields significantly higher success rates and reduces task time compared to the vanilla Chat LLM. We also identify several opportunities that remain in effectively offering a chat-based programming experience for non-professional programmers.

Paper Structure (32 sections, 4 figures, 17 tables)

Figures (4)

  • Figure 1: The user interface used in our user studies. For both tasks, the UI consists of 4 components: (A) Contextual information needed to answer the question (e.g., sample database records for Text-to-SQL and test cases and expected outputs for Python Code Generation); (B) (Only for IntelliExplain) Execution results, which are returned values by executing the predicted code against the database or test cases; (C) Chatbot interface, showing the conversation history between participants and the Chat LLM, including a text box for user input and a "Submit" button; and (D) Control panel, including two "Skip" buttons.
  • Figure 2: Our proposed interaction paradigm, consisting of (1) a user asks a coding question and provides the context that is necessary for answering the question; (2) LLM predicts an initial code answer; (3) LLM generates an NL explanation for the initial code; (4) the user judges the explanation and determines whether the code is correct; if any error is found in the explanation, the participant provides NL feedback for error correction; and (5) the LLM refines its answer based on the feedback. Steps 3-5 repeat until the user cannot find more errors in the explanation.
  • Figure 3: Box plots of the success rate and task time by tool for the Text-to-SQL and Python Code Generation tasks.
  • Figure 4: With IntelliExplain, the participant can comprehend the source code via the NL explanation and more easily identify potential errors. IntelliExplain makes corrections based on participant feedback. In contrast, when interacting directly with code in vanilla GPT-3.5, the participant struggles to understand the source code and fails to identify errors. GPT-3.5 may also sometimes generate responses that are irrelevant to the user question.
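
The five-step interaction paradigm described in the Figure 2 caption can be sketched roughly as the loop below. This is only an illustration under stated assumptions, not the authors' implementation: the function names (`interaction_loop`, `generate_code`, `explain_code`, `get_feedback`, `refine_code`), the `Turn` record, and the `max_rounds` cap are all hypothetical, and the LLM and the user are replaced by plain Python callables so the sketch runs without any model or API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Turn:
    """One round of the loop: the code, its NL explanation, and the user's reply."""
    code: str
    explanation: str
    feedback: Optional[str]  # None means the user found no error in the explanation


def interaction_loop(
    question: str,
    context: str,
    generate_code: Callable[[str, str], str],
    explain_code: Callable[[str, str], str],
    get_feedback: Callable[[str], Optional[str]],
    refine_code: Callable[[str, str, str, str], str],
    max_rounds: int = 5,  # assumption: the caption does not state a round limit
) -> List[Turn]:
    """Run steps 1-5 of the paradigm and return the conversation history."""
    history: List[Turn] = []
    # Steps 1-2: the user supplies question and context; the model predicts initial code.
    code = generate_code(question, context)
    for _ in range(max_rounds):
        # Step 3: the model explains the current code in natural language.
        explanation = explain_code(question, code)
        # Step 4: the user judges the explanation and may give NL feedback.
        feedback = get_feedback(explanation)
        history.append(Turn(code, explanation, feedback))
        if feedback is None:
            break  # the user cannot find more errors, so the loop ends
        # Step 5: the model refines its answer based on the feedback.
        code = refine_code(question, context, code, feedback)
    return history


def demo() -> List[Turn]:
    """Toy Text-to-SQL run with hand-written stand-ins for the LLM and the user."""

    def generate_code(question: str, context: str) -> str:
        return "SELECT name FROM singer"  # deliberately misses the age filter

    def explain_code(question: str, code: str) -> str:
        if "WHERE" in code:
            return "Names of singers older than 30"
        return "Names of all singers"

    def get_feedback(explanation: str) -> Optional[str]:
        if "older than 30" in explanation:
            return None
        return "Only include singers older than 30."

    def refine_code(question: str, context: str, code: str, feedback: str) -> str:
        return code + " WHERE age > 30"

    return interaction_loop(
        "Which singers are older than 30?",
        "table singer(name, age)",
        generate_code,
        explain_code,
        get_feedback,
        refine_code,
    )


if __name__ == "__main__":
    for turn in demo():
        print(turn)
```

In a real system the four callables would wrap prompts to a Chat LLM and a chat UI; passing them in as parameters keeps the control flow of steps 3-5 visible and testable on its own.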