Utilization of Pre-trained Language Model for Adapter-based Knowledge Transfer in Software Engineering
Iman Saberi, Fatemeh Fard, Fuxiang Chen
TL;DR
This work tackles the challenge of efficiently transferring knowledge from pre-trained language models to software engineering tasks. It introduces MODE-X, a cross-modal adapter framework that injects adapters trained on code into NL-PLMs to perform code-related tasks, and evaluates them on cloze tests, code clone detection, and code summarization, comparing against strong C-PLMs. The study further shows that adapters embedded in C-PLMs can improve performance on several SE tasks while remaining significantly more parameter-efficient than full fine-tuning, with probing and attention analyses elucidating how adapters reorganize representations toward code semantics. The findings suggest adapters enable scalable, resource-efficient knowledge transfer for SE, with practical implications for integrating such models into real-world development tools and IDEs. Overall, the paper demonstrates that adapters can bridge modalities and languages in SE while reducing training and storage demands, opening pathways for broader adoption and multilanguage support.
Abstract
Software Engineering (SE) Pre-trained Language Models (PLMs), such as CodeBERT, are pre-trained on large code corpora, and their learned knowledge has been transferred successfully to downstream tasks (e.g., code clone detection) through fine-tuning. In Natural Language Processing (NLP), an alternative way of transferring the knowledge of PLMs is the adapter, a compact and parameter-efficient module that is inserted into a PLM. Although adapters have shown promising results in many NLP downstream tasks, their application in SE downstream tasks has received limited exploration. Here, we study knowledge transfer using adapters on multiple downstream tasks, including cloze test, code clone detection, and code summarization. These adapters are trained on code corpora and are inserted into a PLM that is pre-trained on either English corpora or code corpora. We call these PLMs NL-PLM and C-PLM, respectively. We observe an improvement in results using an NL-PLM with adapters over a PLM without adapters, which suggests that adapters can transfer and utilize useful knowledge from an NL-PLM for SE tasks. The results are sometimes on par with, or exceed, those of a C-PLM, while being more efficient in terms of the number of parameters and training time. Interestingly, adapters inserted into a C-PLM generally yield better results than a traditionally fine-tuned C-PLM. Our results open new directions for building more compact models for SE tasks.
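
To make the notion of a "compact and parameter-efficient module" concrete, the following is a minimal sketch of a generic bottleneck adapter (down-projection, nonlinearity, up-projection, residual connection) in plain Python. It is not the implementation used in this work. The class name BottleneckAdapter, the hidden size of 768, the bottleneck size of 64, the GELU nonlinearity, the zero-initialized up-projection, and the choice of one adapter in each of 12 layers are all illustrative assumptions.

```python
import math
import random


def gelu(x):
    """Exact GELU activation, used here as an assumed nonlinearity."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


class BottleneckAdapter:
    """Hypothetical bottleneck adapter: h + W_up * gelu(W_down * h + b_down) + b_up."""

    def __init__(self, hidden_size, bottleneck_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        self.bottleneck_size = bottleneck_size
        # Down-projection: hidden_size -> bottleneck_size (small random init).
        self.w_down = [[rng.gauss(0.0, 0.02) for _ in range(hidden_size)]
                       for _ in range(bottleneck_size)]
        self.b_down = [0.0] * bottleneck_size
        # Up-projection: bottleneck_size -> hidden_size.
        # Assumption: zero init, so the untrained adapter is an identity map.
        self.w_up = [[0.0] * bottleneck_size for _ in range(hidden_size)]
        self.b_up = [0.0] * hidden_size

    def num_parameters(self):
        """Count weights and biases of both projections."""
        d, m = self.hidden_size, self.bottleneck_size
        return (d * m + m) + (m * d + d)

    def forward(self, h):
        """Apply the adapter to one hidden-state vector (a list of floats)."""
        z = [gelu(sum(w * x for w, x in zip(row, h)) + b)
             for row, b in zip(self.w_down, self.b_down)]
        delta = [sum(w * v for w, v in zip(row, z)) + b
                 for row, b in zip(self.w_up, self.b_up)]
        # Residual connection: the frozen PLM's representation passes through.
        return [x + dx for x, dx in zip(h, delta)]


# Assumed sizes: 768-dimensional hidden states, bottleneck of 64, 12 layers.
adapter = BottleneckAdapter(hidden_size=768, bottleneck_size=64)
per_adapter = adapter.num_parameters()
print(per_adapter)       # 99136 trainable parameters per adapter
print(12 * per_adapter)  # 1189632 if one adapter is placed in each of 12 layers
```

Under these assumed sizes, only the adapter parameters (about 1.2 million) would be trained while the PLM's own weights stay frozen, which is the source of the savings in parameters and training time described above.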
