PTM4Tag+: Tag Recommendation of Stack Overflow Posts with Pre-trained Models

Junda He; Bowen Xu; Zhou Yang; DongGyun Han; Chengran Yang; Jiakun Liu; Zhipeng Zhao; David Lo

TL;DR

PTM4Tag+ is a transformer-based framework for Stack Overflow tag recommendation that treats tagging as multi-label classification and encodes three post components (Title, Description, and Code) with independent pre-trained models (PTMs). Among the eight PTMs evaluated, CodeT5 achieves the best performance, substantially surpassing the prior state-of-the-art CNN-based approach, Post2Vec. Smaller PTMs retain over 93.96% of the performance on average while cutting mean inference time by more than 47.2%. The authors also report ablation studies and dataset updates, which indicate that including code content and in-domain pre-training improves tag quality and generalization. Overall, the work advances practical, scalable tag suggestion for large-scale SQA platforms and suggests extending the approach to additional sites and more diverse textual artifacts.

Abstract

Stack Overflow is one of the most influential Software Question & Answer (SQA) websites, hosting millions of programming-related questions and answers. Tags play a critical role in organizing the content on Stack Overflow and support a range of site operations, e.g., querying relevant content. Poorly selected tags often cause problems such as tag ambiguity and tag explosion. Thus, a precise and accurate automated tag recommendation technique is needed. Inspired by the recent success of pre-trained models (PTMs) in natural language processing (NLP), we present PTM4Tag+, a tag recommendation framework for Stack Overflow posts that utilizes PTMs for language modeling. PTM4Tag+ is implemented with a triplet architecture, which considers three key components of a post, i.e., Title, Description, and Code, with independent PTMs. We utilize a number of popular pre-trained models, including BERT-based models (e.g., BERT, RoBERTa, CodeBERT, BERTOverflow, and ALBERT) and encoder-decoder models (e.g., PLBART, CoTexT, and CodeT5). Our results show that leveraging CodeT5 under the PTM4Tag+ framework achieves the best performance among the eight considered PTMs and outperforms the state-of-the-art Convolutional Neural Network-based approach by a substantial margin in terms of average Precision@k, Recall@k, and F1-score@k (k ranges from 1 to 5). Specifically, CodeT5 improves F1-score@1-5 by 8.8%, 12.4%, 15.3%, 16.4%, and 16.6%, respectively. Moreover, to address the concern with inference latency, we experiment with smaller PTMs under PTM4Tag+ (i.e., DistilBERT, DistilRoBERTa, CodeBERT-small, and CodeT5-small). We find that although smaller PTMs cannot outperform larger PTMs, they still maintain over 93.96% of the performance on average while shortening the mean inference time by more than 47.2%.
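The metrics named in the abstract can be illustrated with a minimal sketch. It assumes the textbook definitions: Precision@k is the fraction of the top-k recommended tags that are correct, Recall@k is the fraction of the ground-truth tags recovered in the top k, and F1-score@k is their harmonic mean. The paper's own definitions appear in a section not reproduced here and may differ in detail (for example, in how recall is normalized for posts with more than k tags). The function name and the example tags are hypothetical.

```python
def precision_recall_f1_at_k(ranked_tags, true_tags, k):
    """Score one post's ranked tag recommendations against its true tags.

    Assumes textbook definitions; the paper's variant may normalize differently.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    true_set = set(true_tags)
    # Number of correct tags among the k highest-ranked recommendations.
    hits = len(set(ranked_tags[:k]) & true_set)
    precision = hits / k
    recall = hits / len(true_set) if true_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical post: five recommended tags (best first) and three true tags.
ranked = ["python", "pandas", "numpy", "dataframe", "csv"]
truth = ["python", "pandas", "csv"]

for k in range(1, 6):
    p, r, f1 = precision_recall_f1_at_k(ranked, truth, k)
    print(f"k={k}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Averaging these per-post scores over a test set gives the average Precision@k, Recall@k, and F1-score@k that the abstract refers to, again under the assumed definitions.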

Paper Structure (31 sections, 7 equations, 6 figures, 14 tables)

Introduction
Background
Tag Recommendation Problem
Post2Vec
Pre-trained Language Models
BERT-based Pre-trained Models
Encoder-decoder Pre-trained Models
Methodology
Pre-processing
Post Component Extraction
Tokenization
Feature Extraction
Language Modeling with PTMs
Pooling and Concatenation
Model Training and Inference
...and 16 more sections

Figures (6)

Figure 1: The architecture of encoder-only and encoder-decoder models
Figure 2: An example of an SO post. A post contains a short title that summarizes its main content. The body of a post can include detailed descriptions written in natural language and code snippets.
Figure 3: The overview of the PTM4Tag+ framework. The title, description, and code are extracted from an SO post and fed into three different pre-trained models to obtain embeddings for each of them. A classification model takes the processed embeddings as input and produces probabilities for each tag.
Figure 4: Distribution of F1-Score@5 for All PTM4Tag+ Variants Using a Triplet Architecture
Figure 5: A line chart showing the difference in $F1$-$score@k$ between each ablated model and CodeT5$_{ALL}$, where $k \in \{1,2,3,4,5\}$. The value on the y-axis is the score of the ablated model minus the corresponding score of CodeT5$_{ALL}$.
...and 1 more figure
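
The flow described in the Figure 3 caption (three component embeddings, a classification model, per-tag probabilities) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: random vectors stand in for the outputs of the three pre-trained models, mean pooling is assumed as the pooling step, and the classifier is assumed to be a single linear layer with a sigmoid per tag. All names, dimensions, and tags are hypothetical.

```python
import math
import random


def mean_pool(token_vectors):
    """Average a list of token vectors into one fixed-size vector (assumed pooling)."""
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[d] for vec in token_vectors) / n for d in range(dim)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def recommend_tags(title_tokens, desc_tokens, code_tokens, weights, biases, tag_names, k):
    """Return the k most probable tags with their probabilities, best first."""
    # Pool each component separately, then concatenate into one post vector.
    post_vector = mean_pool(title_tokens) + mean_pool(desc_tokens) + mean_pool(code_tokens)
    # One independent sigmoid per tag: multi-label, so probabilities need not sum to 1.
    probs = [
        sigmoid(sum(w * x for w, x in zip(row, post_vector)) + b)
        for row, b in zip(weights, biases)
    ]
    ranked = sorted(zip(tag_names, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def random_tokens(rng, num_tokens, dim):
    """Placeholder for a pre-trained model's token embeddings."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_tokens)]


rng = random.Random(0)
DIM = 4  # tiny hypothetical embedding size per component
tags = ["python", "java", "pandas", "git", "sql"]

title = random_tokens(rng, 6, DIM)
description = random_tokens(rng, 20, DIM)
code = random_tokens(rng, 15, DIM)

# Untrained placeholder classifier: one weight row of size 3 * DIM per tag.
weights = [[rng.gauss(0.0, 0.5) for _ in range(3 * DIM)] for _ in tags]
biases = [0.0 for _ in tags]

for tag, prob in recommend_tags(title, description, code, weights, biases, tags, k=3):
    print(f"{tag}: {prob:.3f}")
```

Using an independent sigmoid per tag, rather than a softmax over all tags, matches the multi-label framing in the TL;DR, since a post can carry several tags at once.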
