Beyond Code Generation: An Observational Study of ChatGPT Usage in Software Engineering Practice

Ranim Khojah, Mazen Mohamad, Philipp Leitner, Francisco Gomes de Oliveira Neto

TL;DR

This paper investigates how software engineers in industry actually use ChatGPT, revealing that practitioners predominantly seek guidance and learning rather than directly generating production code. It presents a theoretical framework that connects interaction purpose, user/internal factors (prompts, personality), and external factors (policies, data sources) to perceived usefulness and trust. The study analyzes 24 professionals over five days, using a combination of chat logs and exit surveys, and classifies dialogues into Artifact Manipulation, Expert Consultation, and Training. The findings suggest practical implications for enterprise AI deployment, prompt design, and future empirical research in LLM-assisted software engineering.

Abstract

Large Language Models (LLMs) are frequently discussed in academia and the general public as support tools for virtually any use case that relies on the production of text, including software engineering. Currently there is much debate, but little empirical evidence, regarding the practical usefulness of LLM-based tools such as ChatGPT for engineers in industry. We conduct an observational study of 24 professional software engineers who have been using ChatGPT over a period of one week in their jobs, and qualitatively analyse their dialogues with the chatbot as well as their overall experience (as captured by an exit survey). We find that, rather than expecting ChatGPT to generate ready-to-use software artifacts (e.g., code), practitioners more often use ChatGPT to receive guidance on how to solve their tasks or learn about a topic in more abstract terms. We also propose a theoretical framework for how (i) purpose of the interaction, (ii) internal factors (e.g., the user's personality), and (iii) external factors (e.g., company policy) together shape the experience (in terms of perceived usefulness and trust). We envision that our framework can be used by future research to further the academic discussion on LLM usage by software engineering practitioners, and to serve as a reference point for the design of future empirical LLM research in this domain.

Paper Structure

This paper contains 19 sections, 5 figures, and 3 tables.

Figures (5)

  • Figure 1: The main steps followed in our observational study.
  • Figure 2: Decision tree to guide dialogue classification. The tree first asks whether the dialogue involves a practical problem. If it does, the tree asks whether the user's goal is to be guided: if so, the dialogue is classified as Expert Consultation; if not, the tree asks whether the user is looking for an executable solution, which leads to either Artifact Manipulation or Expert Consultation. If there is no practical problem, the tree asks whether understanding develops over the course of the dialogue, which leads to either Training or Expert Consultation.
  • Figure 3: A theoretical framework of the factors that influence the personal experience of interactions with ChatGPT in industrial software engineering.
  • Figure 4: Taxonomy of purposes for the usage of ChatGPT in software engineering.
  • Figure 5: Plots showing how the 23 participants reported ChatGPT's usefulness (left) and trust in its answer (right).
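
The decision tree described for Figure 2 can be sketched as a small function. This is only an illustrative reading of the caption, not the authors' tooling: the names `Purpose`, `DialogueJudgement`, and `classify` are hypothetical, and the caption does not say which answer leads to which leaf in the last two checks. The sketch assumes that wanting an executable solution leads to Artifact Manipulation and that a development of understanding leads to Training, with Expert Consultation as the other outcome in both cases.

```python
from dataclasses import dataclass
from enum import Enum


class Purpose(Enum):
    """The three dialogue categories named in the study."""
    ARTIFACT_MANIPULATION = "Artifact Manipulation"
    EXPERT_CONSULTATION = "Expert Consultation"
    TRAINING = "Training"


@dataclass
class DialogueJudgement:
    """A human rater's yes/no answers about one dialogue (hypothetical structure)."""
    has_practical_problem: bool
    wants_guidance: bool = False
    wants_executable_solution: bool = False
    develops_understanding: bool = False


def classify(j: DialogueJudgement) -> Purpose:
    """Walk the Figure 2 decision tree as read from its caption."""
    if j.has_practical_problem:
        if j.wants_guidance:
            return Purpose.EXPERT_CONSULTATION
        # Assumption: "yes" on executable solution means Artifact Manipulation.
        if j.wants_executable_solution:
            return Purpose.ARTIFACT_MANIPULATION
        return Purpose.EXPERT_CONSULTATION
    # Assumption: "yes" on development of understanding means Training.
    if j.develops_understanding:
        return Purpose.TRAINING
    return Purpose.EXPERT_CONSULTATION


if __name__ == "__main__":
    example = DialogueJudgement(has_practical_problem=True,
                                wants_executable_solution=True)
    print(classify(example).value)
```

In this reading, Expert Consultation acts as the fallback category on both sides of the tree, so a dialogue only lands in Artifact Manipulation or Training when the rater answers yes to the corresponding final check.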