User Misconceptions of LLM-Based Conversational Programming Assistants

Gabrielle O'Brien, Antonio Pedro Santos Alves, Sebastian Baltes, Grischa Liebel, Mircea Lungu, Marcos Kalinowski

TL;DR

This work investigates user misconceptions about conversational LLM-based programming assistants by combining a brainstorming phase with a qualitative log analysis of Python-related interactions from the WildChat dataset. It distinguishes misconceptions about tool affordances (e.g., web access, code execution, memory) from model-level misunderstandings (e.g., stability, grounding, context effects) and identifies high-confidence patterns, such as web access and non-text outputs being commonly misperceived. The study shows that misperceptions can influence user strategies during programming tasks and argues for interface designs that clearly communicate tool capabilities and limitations. The findings offer actionable guidance for designing safer, more trustworthy LLM-based coding assistants and point to future work in surveys and automated coding-behavior analyses to generalize the taxonomy.

Abstract

Programming assistants powered by large language models (LLMs) have become widely available, with conversational assistants like ChatGPT proving particularly accessible to less experienced programmers. However, the varied capabilities of these tools across model versions and the mixed availability of extensions that enable web search, code execution, or retrieval-augmented generation create opportunities for user misconceptions about what systems can and cannot do. Such misconceptions may lead to over-reliance, unproductive practices, or insufficient quality control in LLM-assisted programming. Here, we aim to characterize misconceptions that users of conversational LLM-based assistants may have in programming contexts. Using a two-phase approach, we first brainstorm and catalog user misconceptions that may occur, and then conduct a qualitative analysis to examine whether these conceptual issues surface in naturalistic Python-programming conversations with an LLM-based chatbot drawn from an openly available dataset. Indeed, we see evidence that some users have misplaced expectations about the availability of LLM-based chatbot features like web access, code execution, or non-text output generation. We also see potential evidence for deeper conceptual issues around the scope of information required to debug, validate, and optimize programs. Our findings reinforce the need for designing LLM-based tools that more clearly communicate their programming capabilities to users.

Paper Structure

This paper contains 48 sections, 2 figures, and 2 tables.

Figures (2)

  • Figure 1: Pre-processing pipeline for identifying coding-related conversations.
  • Figure 2: Annotation tool for labeling conversations.