Towards Better Answers: Automated Stack Overflow Post Updating

Yubo Mai, Zhipeng Gao, Haoye Wang, Tingting Bi, Xing Hu, Xin Xia, Jianling Sun

TL;DR

Soup introduces a two-task, LLM-based framework to automatically update Stack Overflow posts by leveraging corrective comments. It formulates Valid Comment-Edit Prediction (VCP) and Automatic Post Updating (APU), builds a high-quality VCP dataset through manual annotation and large-scale labeling, and then uses that dataset to train APU models that generate post-edits for code snippets. Empirically, Soup_p achieves 80.8% precision and 74.0% recall on VCP, while Soup_u attains 25.6% exact-match and 73.5% CodeBLEU on APU, significantly outperforming baselines; in-the-wild testing yields 21 out of 50 accepted edits (42%). The work demonstrates practical value in improving code quality on SO, provides substantial datasets for replication, and discusses trade-offs between LLM-driven automation and community governance.

Abstract

Utilizing code snippets on Stack Overflow (SO) is a common practice among developers for problem-solving. Although SO code snippets serve as valuable resources, they are imperfect: reusing problematic code snippets can introduce suboptimal or buggy code into software projects. SO comments often point out weaknesses of a post and provide valuable insights for improving the quality of answers, but these comments are usually missed or ignored, leaving the problematic code snippets untouched. In this work, we first investigate the task of automatically updating SO posts based on their associated comments. We introduce a novel framework, named Soup (Stack Overflow Updator for Post), for this task. Soup addresses two key tasks: Valid Comment-Edit Prediction (VCP) and Automatic Post Updating (APU). Extensive experimental results show the promising performance of our model over a set of benchmarks. Moreover, we performed an in-the-wild evaluation on Stack Overflow: we submitted 50 edits generated by our approach to Stack Overflow posts, and 21 of them were verified and accepted by SO maintainers, further demonstrating the practical value of Soup.
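
As a rough illustration of how the two tasks could fit together, the sketch below chains a VCP step (read here as deciding whether a comment warrants an edit) with an APU step (producing the updated snippet). It is a minimal sketch, not the authors' implementation: the names `PostComment`, `update_post`, `toy_vcp`, and `toy_apu` are hypothetical, and the toy functions are simple string heuristics standing in for the trained Soup_p and Soup_u models.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PostComment:
    """A code snippet from an SO post paired with one of its comments."""
    snippet: str
    comment: str


def update_post(
    pair: PostComment,
    is_valid_comment: Callable[[PostComment], bool],  # hypothetical VCP predictor
    generate_update: Callable[[PostComment], str],    # hypothetical APU generator
) -> Optional[str]:
    """Return an updated snippet, or None when the comment should not trigger an edit."""
    # Stage 1 (VCP): filter out comments that do not call for an edit.
    if not is_valid_comment(pair):
        return None
    # Stage 2 (APU): generate the edited snippet from the snippet and comment.
    return generate_update(pair)


# Toy stand-ins for the two models (assumptions, for illustration only).
def toy_vcp(pair: PostComment) -> bool:
    return "should be" in pair.comment.lower()


def toy_apu(pair: PostComment) -> str:
    return pair.snippet.replace(" = ", " == ")


corrective = PostComment("if (x = 1) { run(); }", "The condition should be x == 1.")
chatty = PostComment("print(1)", "Thanks, this worked!")

print(update_post(corrective, toy_vcp, toy_apu))
print(update_post(chatty, toy_vcp, toy_apu))
```

Passing the two models in as callables keeps the filtering and generation stages independent, so either one can be swapped or evaluated on its own.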

Paper Structure

This paper contains 42 sections, 2 equations, 7 figures, and 8 tables.

Figures (7)

  • Figure 1: Stack Overflow Post Example
  • Figure 2: Motivating Examples
  • Figure 3: Preliminary Investigation Examples
  • Figure 4: Workflow of Our Approach
  • Figure 5: Examples of Errors in Different Methods
  • ...and 2 more figures