MPLinker: Multi-template Prompt-tuning with Adversarial Training for Issue-commit Link Recovery

Bangchao Wang, Yang Deng, Ruiqi Luo, Peng Liang, Tingting Bi

TL;DR

MPLinker reframes issue-commit link recovery as a cloze task, using multi-template prompt-tuning and adversarial training to exploit pre-trained language models more effectively for software traceability. It introduces three prompt architectures (Single-template, Multi-template, CLSPrompt) and shows that the multi-template approach, combined with adversarial training, delivers state-of-the-art ILR performance across six open-source projects, reaching an average F1-score of about 96% and accuracy (ACC) of about 98%. RoBERTa emerges as the most robust PLM for this task, while adversarial training generally improves performance and generalization, especially on larger datasets. The work provides practical guidelines for template design, demonstrates strong cross-project applicability, and offers a replication package to facilitate adoption in real-world software maintenance workflows.
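To make the cloze reformulation concrete, here is a minimal Python sketch, assuming a RoBERTa-style masked language model that returns a score for each vocabulary word at the `<mask>` position. The template wording, the `yes`/`no` verbalizer, and the helper names (`build_prompts`, `label_probs`, `predict_link`) are hypothetical illustrations, not MPLinker's actual templates or code. Averaging label probabilities over templates is likewise only one simple way to combine several templates and may differ from how MPLinker's Multi-template Prompt merges them.

```python
import math

# Hypothetical templates (illustrative only, not MPLinker's wording). Each turns
# an (issue, commit) pair into a cloze question with a single <mask> slot.
TEMPLATES = [
    "Issue: {issue} Commit: {commit} Is the commit linked to the issue? <mask>.",
    "{commit} Does this change address: {issue}? <mask>.",
]

# Hypothetical verbalizer: the label word the MLM should predict at <mask> per class.
VERBALIZER = {1: "yes", 0: "no"}


def build_prompts(issue: str, commit: str) -> list[str]:
    """Fill every template with the issue text and the commit message."""
    return [t.format(issue=issue, commit=commit) for t in TEMPLATES]


def label_probs(mask_logits: dict[str, float]) -> dict[int, float]:
    """Softmax restricted to the verbalizer's label words at the <mask> position."""
    scores = {label: mask_logits[word] for label, word in VERBALIZER.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def predict_link(per_template_logits: list[dict[str, float]]) -> tuple[int, float]:
    """Average P(linked) over templates; return (predicted label, P(linked))."""
    probs = [label_probs(logits) for logits in per_template_logits]
    p_link = sum(p[1] for p in probs) / len(probs)
    return (1 if p_link >= 0.5 else 0), p_link
```

In a full pipeline, each dictionary passed to `predict_link` would hold the PLM's `<mask>` logits for the label words of the corresponding prompt; prompt-tuning then trains the PLM so that the correct label word receives the higher score.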

Abstract

In recent years, the pre-training, prompting, and prediction paradigm, known as prompt-tuning, has achieved significant success in Natural Language Processing (NLP). Issue-commit Link Recovery (ILR) in Software Traceability (ST) plays an important role in improving the reliability, quality, and security of software systems. Current ILR methods convert ILR into a classification task using pre-trained language models (PLMs) and dedicated neural networks. However, these methods do not fully utilize the semantic information embedded in PLMs and therefore fail to achieve acceptable performance. To address this limitation, we introduce a novel paradigm: Multi-template Prompt-tuning with adversarial training for issue-commit Link recovery (MPLinker). MPLinker redefines the ILR task as a cloze task via template-based prompt-tuning and incorporates adversarial training to enhance model generalization and reduce overfitting. We evaluated MPLinker on six open-source projects using a comprehensive set of performance metrics. The experimental results demonstrate that MPLinker achieves an average F1-score of 96.10%, Precision of 96.49%, Recall of 95.92%, MCC of 94.04%, AUC of 96.05%, and ACC of 98.15%, significantly outperforming existing state-of-the-art methods. Overall, MPLinker improves the performance and generalization of ILR models and introduces new concepts and methods for ILR. The replication package for MPLinker is available at https://github.com/WTU-intelligent-software-development/MPLinker
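The metrics reported in the abstract follow standard definitions for binary classification. As a minimal sketch, assuming each issue-commit pair is labeled 1 (linked) or 0 (not linked) with "linked" as the positive class, they can be computed from hard predictions and, for AUC, from continuous scores as shown below. The function names are hypothetical, and how MPLinker aggregates the scores over the six projects is not specified here.

```python
import math


def binary_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Precision, Recall, F1, MCC and ACC with 1 (linked) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Matthews correlation coefficient; defined as 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    acc = (tp + tn) / len(y_true)
    return {"precision": precision, "recall": recall, "f1": f1, "mcc": mcc, "acc": acc}


def roc_auc(y_true: list[int], scores: list[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUC is quadratic in the number of pairs, which is fine for illustration; library implementations use a sort-based computation instead.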

Paper Structure

This paper contains 25 sections, 25 equations, 5 figures, and 8 tables.

Figures (5)

  • Figure 1: The motivation for prompt-tuning PLMs for ILR. Figure 1.a and Figure 1.b illustrate common pre-training tasks. Figure 1.a shows the masked language model (MLM), while Figure 1.b illustrates the next sentence prediction (NSP). Figures 1.c and 1.d depict the use of PLMs for downstream tasks. Figure 1.c shows the fine-tuning of the PLM for ILR, whereas Figure 1.d represents the prompt-tuning of the PLM for ILR.
  • Figure 2: Overview of MPLinker, with the Multi-template Prompt on the left and adversarial training on the right.
  • Figure 3: The comparison of Single-template Prompt, Multi-template Prompt, and CLSPrompt.
  • Figure 4: Performance comparison of three PLMs across six projects.
  • Figure 5: The mean and standard deviation for the ILR task across three PLMs: RoBERTa, BERT, and GPT-2.
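Figure 2 pairs the Multi-template Prompt with adversarial training. This section does not say which perturbation method MPLinker uses, so the sketch below assumes the Fast Gradient Method (FGM), a common choice for text models: the input embedding is shifted by a step of L2 length `epsilon` along the loss gradient, and the model is trained on both the clean and the perturbed input. To stay self-contained, it uses a toy linear scorer over a single embedding vector in plain Python; in a PLM the perturbation would instead be added to the word-embedding layer. All names and the `lr`/`epsilon` defaults are illustrative assumptions.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce_and_grads(w: list[float], x: list[float], y: int):
    """BCE loss of p = sigmoid(w . x) with gradients w.r.t. w and x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    p_safe = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0)
    loss = -(y * math.log(p_safe) + (1 - y) * math.log(1.0 - p_safe))
    err = p - y  # dLoss/dz for BCE on a sigmoid output
    return loss, [err * xi for xi in x], [err * wi for wi in w]


def fgm_perturbation(grad_x: list[float], epsilon: float) -> list[float]:
    """FGM: a step of L2 length epsilon in the direction that increases the loss."""
    norm = math.sqrt(sum(g * g for g in grad_x))
    if norm == 0.0:
        return [0.0] * len(grad_x)
    return [epsilon * g / norm for g in grad_x]


def adversarial_step(w: list[float], x: list[float], y: int,
                     lr: float = 0.1, epsilon: float = 0.5):
    """One SGD step on the clean loss plus the loss at the FGM-perturbed embedding."""
    clean_loss, grad_w, grad_x = bce_and_grads(w, x, y)
    r_adv = fgm_perturbation(grad_x, epsilon)  # treated as a constant
    x_adv = [xi + ri for xi, ri in zip(x, r_adv)]
    adv_loss, grad_w_adv, _ = bce_and_grads(w, x_adv, y)
    new_w = [wi - lr * (g1 + g2) for wi, g1, g2 in zip(w, grad_w, grad_w_adv)]
    return new_w, clean_loss, adv_loss
```

The perturbation `r_adv` is held fixed when updating `w`, so no gradient flows through it; this matches how FGM is usually applied. A multi-step variant such as PGD would repeat the perturbation step several times before the weight update.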