An empirical study of question discussions on Stack Overflow
Wenhan Zhu, Haoxiang Zhang, Ahmed E. Hassan, Michael W. Godfrey
TL;DR
This study empirically explores question discussions on Stack Overflow, focusing on comments attached to questions and their associated chat rooms. Using the December 2021 data dump (43.6M comments, 1.5M chat messages, 22.0M questions, and 32.7M answers) plus chat-room crawling, it answers three research questions about the prevalence of question discussions, who participates in them, and how they affect the Q&A process. The findings show that questions with discussions are common (59.2%) and that both askers and answerers take part. The number of question comments correlates with longer answer times ($\rho = 0.709$ for first-answer time and $\rho = 0.806$ for accepted-answer time), and discussions are associated with more substantial question edits, indicating that discussions actively shape the knowledge-building process. The work suggests incorporating question discussions into models, tools, and platform design to improve knowledge maintenance, retrieval, and the overall Q&A experience.
Abstract
Stack Overflow provides a means for developers to exchange knowledge. While much previous research on Stack Overflow has focused on questions and answers (Q&A), recent work has shown that discussions in comments also contain rich information. On Stack Overflow, discussions through comments and chat rooms can be tied to questions or answers. In this paper, we conduct an empirical study that focuses on the nature of question discussions. We observe that: (1) Question discussions occur at all phases of the Q&A process, with most beginning before the first answer is received. (2) Both askers and answerers actively participate in question discussions; the likelihood of their participation increases as the number of comments increases. (3) There is a strong correlation between the number of question comments and the question answering time (i.e., more-discussed questions receive answers more slowly); however, questions with a small number of comments are likely to be answered more quickly than questions with no discussion. Our findings suggest that question discussions contain a rich trove of data that is integral to the Q&A process on Stack Overflow. We further suggest how future research can leverage the information in question discussions, along with the commonly studied Q&A information.
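To make observation (3) concrete, the following is a minimal sketch of one plausible way to measure the relationship between comment count and answering time. It assumes a Spearman rank correlation computed over the median answering time at each comment count; the text above does not name the coefficient or the aggregation, so both are assumptions. The function names and the toy data are hypothetical and are not taken from the study.

```python
from collections import defaultdict
from statistics import median


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    sd_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def median_time_by_comment_count(questions):
    """Group (comment_count, hours_to_answer) pairs and take medians."""
    groups = defaultdict(list)
    for comment_count, hours in questions:
        groups[comment_count].append(hours)
    counts = sorted(groups)
    return counts, [median(groups[c]) for c in counts]


# Hypothetical toy data: (question comments, hours until first answer).
toy_questions = [
    (0, 1.0), (0, 0.5), (1, 0.4), (1, 0.6), (2, 0.9),
    (3, 2.0), (3, 3.0), (5, 6.0), (8, 12.0),
]

counts, medians = median_time_by_comment_count(toy_questions)
rho = spearman(counts, medians)
print(f"rho = {rho:.3f}")
```

In the toy data, questions with one comment have a lower median answering time than questions with none, while the overall rank correlation remains strongly positive. This mirrors the shape of the reported finding but says nothing about the real values.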
