Refining Joint Text and Source Code Embeddings for Retrieval Task with Parameter-Efficient Fine-Tuning
Karim Galliamov, Leila Khaertdinova, Karina Denisova
TL;DR
This work addresses efficient adaptation of bimodal code-text retrieval models under resource constraints by applying Parameter-Efficient Fine-Tuning (PEFT) combined with contrastive learning to CodeT5+. The authors benchmark several PEFT methods (LoRA, AdaLoRA, IA3, Prompt-Tuning) on CodeSearchNet and a custom dataset, showing that updating at most $0.4\%$ of the parameters is enough to improve retrieval quality. They also integrate the tuned embeddings into a Retrieval-Augmented Generation (RAG) pipeline, achieving modest ROUGE gains in code generation, and release open-source checkpoints. The study demonstrates the practicality of PEFT for code retrieval and offers a reusable framework for systematic benchmarking of PEFT methods in bimodal tasks.
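To give a feel for how small the trainable share can be, the sketch below counts the parameters that LoRA adds to a frozen weight matrix: a rank-r update to a d_out x d_in matrix trains r * (d_out + d_in) values. The layer sizes, the number of adapted matrices, and the total model size are hypothetical values chosen for illustration; they are not taken from the paper.

```python
def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    LoRA learns a low-rank update B @ A, with B of shape (d_out, rank)
    and A of shape (rank, d_in), while the original matrix stays frozen.
    """
    return rank * (d_out + d_in)


def trainable_fraction(d_out: int, d_in: int, rank: int,
                       n_adapted: int, total_params: int) -> float:
    """Share of all model parameters that is actually trained."""
    return n_adapted * lora_param_count(d_out, d_in, rank) / total_params


# Hypothetical configuration (an assumption, not the paper's setup):
# 48 adapted 768x768 projection matrices, rank 8, 220M total parameters.
fraction = trainable_fraction(768, 768, 8, 48, 220_000_000)
print(f"trainable share: {fraction:.2%}")
```

Under these assumed sizes the trainable share stays well below one percent, which is the regime the summary above describes.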
Abstract
Recent developments in Natural Language Processing (NLP) have brought remarkable progress on the code-text retrieval problem. As the Transformer-based models used for this task continue to grow in size, the computational cost and time required for end-to-end fine-tuning become substantial. This poses a significant challenge for adapting and using these models when computational resources are limited. Motivated by these concerns, we propose a fine-tuning framework that leverages Parameter-Efficient Fine-Tuning (PEFT) techniques. Moreover, we adopt contrastive learning objectives to improve the quality of the bimodal representations learned by Transformer models. Additionally, we provide extensive benchmarking of PEFT methods, the lack of which has been highlighted as a crucial problem in the literature. Based on thorough experimentation with the CodeT5+ model on two datasets, we demonstrate that the proposed fine-tuning framework has the potential to improve code-text retrieval performance while tuning at most 0.4% of the parameters.
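The abstract does not say which contrastive objective is used. As a minimal sketch, assuming a symmetric InfoNCE loss with in-batch negatives (a common choice for bimodal retrieval), the idea can be written in plain Python as follows. The function names and the temperature value are illustrative assumptions, not the authors' implementation.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce(text_embs, code_embs, temperature=0.05):
    """Symmetric InfoNCE loss over a batch of paired (text, code) embeddings.

    Pair i is the positive for row i; every other item in the batch
    serves as a negative. The temperature is an assumed value.
    """
    text = [l2_normalize(v) for v in text_embs]
    code = [l2_normalize(v) for v in code_embs]
    # sims[i][j]: temperature-scaled cosine similarity of text i and code j
    sims = [[sum(a * b for a, b in zip(t, c)) / temperature for c in code]
            for t in text]

    def cross_entropy(rows):
        # Mean of -log softmax(row)[i], with the diagonal entry as target.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(s - m) for s in row))
            total += log_z - row[i]
        return total / len(rows)

    text_to_code = cross_entropy(sims)
    code_to_text = cross_entropy([list(col) for col in zip(*sims)])
    return 0.5 * (text_to_code + code_to_text)


# Toy batch: aligned pairs should score a lower loss than swapped pairs.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

In a PEFT setting, a loss of this kind would be backpropagated only into the small set of adapter parameters, while the backbone weights stay frozen.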
