PTM4Tag+: Tag Recommendation of Stack Overflow Posts with Pre-trained Models
Junda He, Bowen Xu, Zhou Yang, DongGyun Han, Chengran Yang, Jiakun Liu, Zhipeng Zhao, David Lo
TL;DR
PTM4Tag+ presents a transformer-based framework for Stack Overflow tag recommendation that treats tagging as multi-label classification and processes three post components (Title, Description, Code) with independent PTMs. Among eight PTMs, CodeT5-based encoder-decoder variants achieve the best performance, substantially surpassing the prior state-of-the-art Post2Vec, while smaller PTMs offer notable latency reductions with minimal accuracy loss. The authors perform extensive ablation and dataset updates, showing that including code content and in-domain pretraining improves tag quality and generalization. Overall, the work advances practical, scalable tag suggestion for large-scale SQA platforms and suggests applying the approach to additional sites and more diverse textual artifacts.
Abstract
Stack Overflow is one of the most influential Software Question & Answer (SQA) websites, hosting millions of programming-related questions and answers. Tags play a critical role in efficiently organizing the contents in Stack Overflow and are vital to support a range of site operations, e.g., querying relevant content. Poorly selected tags often raise problems like tag ambiguity and tag explosion. Thus, a precise and accurate automated tag recommendation technique is demanded. Inspired by the recent success of pre-trained models (PTMs) in natural language processing (NLP), we present PTM4Tag+, a tag recommendation framework for Stack Overflow posts that utilizes PTMs in language modeling. PTM4Tag+ is implemented with a triplet architecture, which considers three key components of a post, i.e., Title, Description, and Code, with independent PTMs. We utilize a number of popular pre-trained models, including the BERT-based models (e.g., BERT, RoBERTa, CodeBERT, BERTOverflow, and ALBERT), and encoder-decoder models (e.g., PLBART, CoTexT, and CodeT5). Our results show that leveraging CodeT5 under the PTM4Tag+ framework achieves the best performance among the eight considered PTMs and outperforms the state-of-the-art Convolutional Neural Network-based approach by a substantial margin in terms of average P recision@k, Recall@k, and F1-score@k (k ranges from 1 to 5). Specifically, CodeT5 improves the performance of F1-score@1-5 by 8.8%, 12.4%, 15.3%, 16.4%, and 16.6%. Moreover, to address the concern with inference latency, we experiment PTM4Tag+ with smaller PTM models (i.e., DistilBERT, DistilRoBERTa, CodeBERT-small, and CodeT5-small). We find that although smaller PTMs cannot outperform larger PTMs, they still maintain over 93.96% of the performance on average, meanwhile shortening the mean inference time by more than 47.2%
