Automatic Bi-modal Question Title Generation for Stack Overflow with Prompt Learning

Shaoyu Yang; Xiang Chen; Ke Liu; Guang Yang; Chi Yu

TL;DR

This work tackles automatic question title generation for Stack Overflow by applying prompt learning to bi-modal post content (code snippets and problem descriptions). It introduces SOTitle+, a three-phase approach that combines a high-quality multi-language corpus, CodeT5 tuned with hybrid prompts, and multi-task learning across six programming languages to generate concise, informative titles. In both automatic and human evaluation, SOTitle+ outperforms four state-of-the-art baselines, with notable gains in low-resource languages, and prompt tuning outperforms fine-tuning. The study demonstrates the value of bi-modal information and prompt learning for software engineering tasks, and it releases a large corpus, tools, and prompts to support further research and practical title generation for Stack Overflow users.

Abstract

When drafting question posts for Stack Overflow, developers may not accurately summarize the core problem in the question title, which can keep the question from receiving timely help. Improving the quality of question titles has therefore attracted wide attention from researchers. An initial study aimed to generate titles automatically by analyzing only the code snippets in the question body, but it ignored the helpful information in the corresponding problem descriptions. We therefore propose SOTitle+, an approach that considers bi-modal information (i.e., the code snippets and the problem descriptions) in the question body. We formalize title generation for different programming languages as separate but related tasks and use multi-task learning to solve them. We then fine-tune the pre-trained language model CodeT5 to generate the titles automatically. However, the inconsistency in inputs and optimization objectives between the pre-training task and our task can prevent fine-tuning from fully exploiting the knowledge of the pre-trained model. To address this issue, SOTitle+ further prompt-tunes CodeT5 with hybrid prompts (i.e., a mixture of hard and soft prompts). To verify the effectiveness of SOTitle+, we construct a large-scale, high-quality corpus from recent data dumps shared by Stack Overflow. The corpus includes 179,119 high-quality question posts for six popular programming languages. Experimental results show that SOTitle+ significantly outperforms four state-of-the-art baselines in both automatic and human evaluation. Our work indicates that considering bi-modal information and prompt learning is a promising direction for Stack Overflow title generation.
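As a rough illustration of the hybrid prompt idea described in the abstract, the sketch below wraps both modalities of a post in one template that mixes fixed (hard) words with placeholder slots for learnable (soft) prompt tokens. The template wording, the `[SOFT]` placeholder, the `Post` type, and the `build_hybrid_prompt` helper are all assumptions made for this example and are not taken from the paper; `[MASK]` follows the notation used in the caption of Figure 2.

```python
from dataclasses import dataclass

# Hypothetical placeholders: the paper's actual template is not shown here.
SOFT_TOKEN = "[SOFT]"  # stands in for one learnable (soft) prompt embedding
MASK_TOKEN = "[MASK]"  # slot the model is asked to fill with the title


@dataclass
class Post:
    description: str  # natural-language problem description
    code: str         # code snippet from the question body
    language: str     # programming language of the post, e.g. "Python"


def build_hybrid_prompt(post: Post, n_soft: int = 2) -> str:
    """Combine both modalities with hard words and soft-token slots."""
    parts = [SOFT_TOKEN] * n_soft          # soft part: tuned during training
    parts += [
        f"description: {post.description}",  # hard part: fixed wording
        f"code: {post.code}",
        f"language: {post.language}",
        f"title: {MASK_TOKEN}",
    ]
    return " ".join(parts)


if __name__ == "__main__":
    example = Post("Why is my list empty after the loop?", "xs = []", "Python")
    print(build_hybrid_prompt(example))
```

In an actual prompt-tuning setup, each soft slot would map to a trainable embedding vector rather than a literal string. Including the language in the input is one simple way to let a single shared model serve several per-language tasks, which is only a stand-in for the multi-task learning the abstract mentions.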

Paper Structure (36 sections, 10 equations, 7 figures, 11 tables)

Introduction
Our Proposed Approach SOTitle+
Corpus Construction Phase
Model Construction Phase
Hybrid Prompt Template Construction
Multi-task Learning
Prompt Tuning on Pre-trained Model CodeT5
Model Prediction Phase
Experiment Setup
Research Questions
Experimental Subject
Performance Measures
Baselines
Implementation Details
Running Platform and Model Training Time
...and 21 more sections

Figures (7)

Figure 1: Two posts from Stack Overflow, which have the same code snippet but different problem descriptions
Figure 2: Illustration of the pre-training, fine-tuning, and prompt-tuning processes for the SOQTG task. We use [MASK] and [BOS] to denote two special tokens in CodeT5.
Figure 3: Framework of our proposed approach SOTitle+
Figure 4: A question post related to Python programming language
Figure 5: The question titles generated by SOTitle+ and baselines for a question post related to the Python programming language
...and 2 more figures
