"Always Nice and Confident, Sometimes Wrong": Developer's Experiences Engaging Large Language Models (LLMs) Versus Human-Powered Q&A Platforms for Coding Support

Jiachen Li; Elizabeth Mynatt; Varun Mishra; Jonathan Bell

TL;DR

The paper examines how developers integrate AI-powered chatbots with traditional human-powered Q&A for coding support, comparing ChatGPT with Stack Overflow through a thematic analysis of 1,758 Reddit posts collected from November 2022 to April 2023. It combines qualitative insights with exploratory quantitative metrics to show that ChatGPT offers fast, clear, and patient assistance but raises reliability concerns and lacks transparency, in contrast to Stack Overflow's established norms and validation mechanisms. The study synthesizes design implications for GenAI coding assistants and proposes a workflow that blends AI-driven quick help with crowd-based validation to improve developer experiences. The findings can guide the next generation of coding tools toward collaborative, trustworthy, and learning-friendly AI copilots that augment rather than replace human expertise.

Abstract

Software engineers have historically relied on human-powered Q&A platforms like Stack Overflow (SO) as coding aids. With the rise of generative AI, developers have started to adopt AI chatbots, such as ChatGPT, in their software development process. Recognizing the potential parallels between human-powered Q&A platforms and AI-powered question-based chatbots, we investigate and compare how developers integrate this assistance into their real-world coding experiences by conducting a thematic analysis of 1700+ Reddit posts. Through a comparative study of SO and ChatGPT, we identified each platform's strengths, use cases, and barriers. Our findings suggest that ChatGPT offers fast, clear, comprehensive responses and fosters a more respectful environment than SO. However, concerns about ChatGPT's reliability stem from its overly confident tone and the absence of validation mechanisms like SO's voting system. Based on these findings, we synthesized the design implications for future GenAI code assistants and recommend a workflow leveraging each platform's unique features to improve developer experiences.

Paper Structure (31 sections, 3 figures)

Introduction
Background & Related Works
Q&A Platform for Programming Practices
Stack Overflow
Generative AI for Coding
Method
Data Mining
Data Filtering
Data Analysis
Initial Codebook Development
Final Codebook Development
Ethical Considerations
Exploratory Quantitative Results
Distribution of Posts
Analysis
...and 16 more sections

Figures (3)

Figure 1: The workflow of data mining, filtering, and analysis.
Figure 2: Description of the dataset. Left: distribution of weekly post counts from 11/30/2022 to 4/30/2023; middle: distribution of post counts across various subreddits; right: word cloud generated from the dataset.
Figure 3: Different phases of seeking coding guidance through AI-powered chatbot and Q&A platform.