Characterizing and Classifying Developer Forum Posts with their Intentions

Xingfang Wu, Eric Laufer, Heng Li, Foutse Khomh, Santhosh Srinivasan, Jayden Luo

TL;DR

This study addresses the challenge of organizing and retrieving developer forum posts by introducing an intention-centric perspective. It pairs a qualitative analysis, which yields a seven-category taxonomy of post intentions, with a manually annotated dataset, and then develops a transformer-based framework that fuses post text with code-block content signals to predict multiple intentions per post. In the experiments, general-purpose transformers (e.g., BERT and RoBERTa) outperform domain-specific pre-trained models (PTMs), and fine-tuning the PTM pooler yields additional gains; the best variant reaches a Micro F1-score of 0.589 and Top 1-3 accuracy of 62.6% to 87.8%. The work offers actionable guidance for industry practitioners and forum platforms, including code-block conventions and intent-aware tagging to enhance search and recommendations, and it releases the annotated dataset and code for reproducibility.

Abstract

With the rapid growth of the developer community, the number of posts on online technical forums has been increasing quickly, making it difficult for users to filter useful posts and find important information. Tags provide a concise feature dimension that lets users locate posts of interest and lets search engines index the posts most relevant to a query. However, most tags focus only on the technical perspective (e.g., programming language, platform, tool). In most cases, forum posts in online developer communities reveal the author's intention: to solve a problem, ask for advice, share information, and so on. Modeling the intentions of posts can add an extra dimension to the current tag taxonomy. By referencing previous studies and learning from industrial perspectives, we create a refined taxonomy for the intentions of technical forum posts. Through manual labeling and analysis of a sampled post dataset extracted from online forums, we examine the relationship between the composition of posts (code, error messages) and their intentions. Furthermore, inspired by our manual study, we design a pre-trained transformer-based model to automatically predict post intentions. The best variant of our intention prediction framework achieves a Micro F1-score of 0.589, Top 1-3 accuracy of 62.6% to 87.8%, and an average AUC of 0.787, outperforming the state-of-the-art baseline approach. Our characterization and automated classification of forum posts regarding their intentions may help forum maintainers or third-party tool developers improve the organization and retrieval of posts on technical forums. We have released our annotated dataset and code in our supplementary material package.
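To make the reported metrics concrete, the following is a minimal sketch of how a Micro F1-score and a Top-k accuracy can be computed for multi-label intention predictions. It is not the authors' evaluation code. The function names, the toy data with four intentions (the taxonomy has seven), the 0.5 decision threshold, and the Top-k definition used here (a post counts as a hit if at least one of its k highest-scored intentions is a true label) are all assumptions made for illustration.

```python
from typing import Sequence


def micro_f1(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Micro-averaged F1: pool TP/FP/FN over every (post, intention) pair."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def top_k_accuracy(y_true: Sequence[Sequence[int]],
                   scores: Sequence[Sequence[float]],
                   k: int) -> float:
    """Fraction of posts with at least one true label among the k top-scored
    intentions (an assumed definition, see the lead-in)."""
    hits = 0
    for true_row, score_row in zip(y_true, scores):
        ranked = sorted(range(len(score_row)),
                        key=lambda i: score_row[i], reverse=True)[:k]
        if any(true_row[i] == 1 for i in ranked):
            hits += 1
    return hits / len(y_true) if y_true else 0.0


# Hypothetical toy data: 3 posts, 4 intention labels, multi-label ground truth.
y_true = [[1, 0, 0, 1],
          [0, 1, 0, 0],
          [0, 0, 1, 0]]
scores = [[0.9, 0.2, 0.1, 0.4],
          [0.6, 0.5, 0.1, 0.2],
          [0.1, 0.2, 0.3, 0.8]]

# Assumed decision rule: predict an intention when its score is >= 0.5.
y_pred = [[1 if s >= 0.5 else 0 for s in row] for row in scores]

print("Micro F1:", micro_f1(y_true, y_pred))
print("Top-1 accuracy:", top_k_accuracy(y_true, scores, 1))
print("Top-2 accuracy:", top_k_accuracy(y_true, scores, 2))
```

Micro averaging pools decisions across all labels before computing F1, so frequent intentions weigh more than rare ones. Top-k accuracy is more lenient, which is consistent with the abstract's Top 1-3 figures rising as k grows.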

Paper Structure (39 sections, 7 equations, 5 figures, 10 tables)


Figures (5)

  • Figure 1: An example post from elastic.co, a Discourse-based online community.
  • Figure 2: An overview of our manual study process.
  • Figure 3: The co-occurrence matrix of intentions. Each row is divided by the number of posts of the corresponding intention.
  • Figure 4: An overview of our intention detection framework. The section numbers in the dashed circles correspond to the respective descriptions.
  • Figure 5: The distributions of description lengths of posts in our dataset.