From Human-to-Human to Human-to-Bot Conversations in Software Engineering
Ranim Khojah, Francisco Gomes de Oliveira Neto, Philipp Leitner
TL;DR
This paper examines how software developers interact with humans, NLU-based bots, and LLM-based bots, and how these conversation modes differ in purpose, scope, listening, trust, and humor. It adopts Clark et al.'s conversation framework and draws on an observational study of 24 engineers using ChatGPT, illustrated with example conversations, to compare interaction dynamics. Its contributions include a taxonomy-based comparison across partner types, empirical observations on trust and hallucinations in LLM-based bots, and practical guidance for calibrating expectations and designing team communication in AI-enabled software development. The findings show that LLM-based conversations are more human-like yet cannot replace human social interaction, and they suggest hybrid workflows that maximize productivity while managing social and privacy considerations. The work informs practitioners about when and how to leverage chatbots to augment collaboration without undermining essential human communication.
Abstract
Software developers use natural language to interact not only with other humans but, increasingly, also with chatbots. These interactions have different properties and flow differently depending on the developer's goal and on who they interact with. In this paper, we aim to understand the dynamics of conversations that occur during modern software development after the integration of AI and chatbots, enabling a deeper recognition of the advantages and disadvantages of including chatbot interactions alongside human conversations in collaborative work. We compile existing attributes of conversations with humans and with NLU-based chatbots and adapt them to the context of software development. Then, based on an observational study, we extend the comparison to include LLM-powered chatbots. We present similarities and differences between human-to-human and human-to-bot conversations, also distinguishing between NLU- and LLM-based chatbots. Furthermore, we discuss how understanding the differences among these conversation styles guides developers in shaping their expectations of a conversation and, consequently, supports communication within a software team. We conclude that, despite their ability to support productivity and decrease developers' mental load, the conversation styles we observe with LLM-based chatbots cannot replace conversations with humans because of certain attributes related to social aspects.
