How to Select Pre-Trained Code Models for Reuse? A Learning Perspective

Zhangqian Bi; Yao Wan; Zhaoyang Chu; Yufei Hu; Junyi Zhang; Hongyu Zhang; Guandong Xu; Hai Jin


TL;DR

The paper addresses the challenge of reusing Pre-trained Code Models (PCMs) by comparing naive selection strategies with learning-based model selection under a limited fine-tuning budget. It formulates transferability as a rankable score and studies two families of strategies, proxy-based and distribution-based, to estimate a PCM's usefulness without exhaustive fine-tuning. Across 100 PCMs and three code tasks, learning-based selection cuts selection time from about $2{,}700$ hours to about $100$ seconds, with minimal performance degradation (less than $6\%$) relative to brute-force fine-tuning. The work provides practical, scalable guidelines for PCM reuse in AI-assisted software engineering, showing that model metadata alone (such as size or training data) is insufficient for effective selection and that budget-aware ranking can accelerate prototyping and deployment.
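The TL;DR frames selection as ranking candidate PCMs by a transferability score and spending a limited fine-tuning budget only on the most promising ones. The Python sketch below illustrates that framing under loudly stated assumptions: the model ids (`pcm_a`, `pcm_b`, `pcm_c`), the score and accuracy values, the top-k budget rule, and the definition of degradation as the relative accuracy gap to the brute-force best model are all hypothetical and not taken from the paper.

```python
# Hypothetical sketch: rank PCMs by a transferability score, fine-tune only the
# top-k under a budget, and measure the gap to the brute-force best model.
from typing import Callable, Dict, List


def rank_models(scores: Dict[str, float]) -> List[str]:
    """Model ids ordered from most to least transferable."""
    return sorted(scores, key=scores.get, reverse=True)


def select_under_budget(scores: Dict[str, float],
                        finetune: Callable[[str], float],
                        k: int) -> str:
    """Fine-tune only the k highest-scoring models and keep the best of them."""
    shortlist = rank_models(scores)[:k]
    return max(shortlist, key=finetune)


def relative_degradation(accuracy: Dict[str, float], chosen: str) -> float:
    """Assumed metric: relative accuracy gap between chosen and best model."""
    best = max(accuracy.values())
    return (best - accuracy[chosen]) / best


# Made-up numbers for illustration only.
scores = {"pcm_a": 0.71, "pcm_b": 0.64, "pcm_c": 0.58}    # transferability estimates
accuracy = {"pcm_a": 0.62, "pcm_b": 0.64, "pcm_c": 0.60}  # accuracy after fine-tuning

# accuracy.get stands in for an actual (expensive) fine-tuning run.
top1 = select_under_budget(scores, accuracy.get, k=1)
top2 = select_under_budget(scores, accuracy.get, k=2)
print(top1, f"{relative_degradation(accuracy, top1):.1%}")
print(top2, f"{relative_degradation(accuracy, top2):.1%}")
```

A larger k trades more fine-tuning time for a smaller expected gap; brute-force selection corresponds to k equal to the size of the model hub.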

Abstract

Pre-training a language model and then fine-tuning it has been shown to be an efficient and effective technique for a wide range of code intelligence tasks, such as code generation, code summarization, and vulnerability detection. However, pre-training language models on a large-scale code corpus is computationally expensive. Fortunately, many off-the-shelf Pre-trained Code Models (PCMs), such as CodeBERT, CodeT5, CodeGen, and Code Llama, have been released publicly. These models acquire general code understanding and generation capabilities during pre-training, which enhance their performance on downstream code intelligence tasks. With an increasing number of these public pre-trained models, selecting the most suitable one to reuse for a specific task is essential. In this paper, we systematically investigate the reusability of PCMs. We first explore three intuitive model selection methods that select by size, by training data, or by brute-force fine-tuning. Experimental results show that these straightforward techniques either perform poorly or incur high costs. Motivated by these findings, we explore learning-based model selection strategies that utilize pre-trained models without altering their parameters. Specifically, we train proxy models to gauge the performance of pre-trained models, and we measure the distribution deviation between a model's latent features and the task's labels, using their closeness as an indicator of model transferability. We conduct experiments on 100 widely used open-source PCMs for code intelligence tasks, with sizes ranging from 42.5 million to 3 billion parameters. The results demonstrate that learning-based selection methods reduce selection time to 100 seconds, compared to 2,700 hours with brute-force fine-tuning, with less than 6% performance degradation across related tasks.
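Both learning-based strategies in the abstract score a PCM from its frozen features, without updating its parameters. The Python sketch below is illustrative only and swaps in simple stand-ins: a nearest-centroid classifier plays the proxy model, and a Fisher-style ratio of between-class to within-class scatter plays the measure of how closely latent features align with task labels. Neither is claimed to be the paper's exact method; the function names and the toy feature vectors for `model_a` and `model_b` are hypothetical.

```python
# Illustrative stand-ins (assumptions, not the paper's exact methods):
#   proxy-based        -> nearest-centroid classifier trained on frozen features
#   distribution-based -> Fisher-style trace ratio of between- to within-class scatter
from statistics import mean
from typing import Dict, List, Sequence

Vector = Sequence[float]


def class_centroids(features: List[Vector], labels: List[int]) -> Dict[int, List[float]]:
    """Mean feature vector of each class."""
    by_class: Dict[int, List[Vector]] = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(x)
    return {y: [mean(col) for col in zip(*xs)] for y, xs in by_class.items()}


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def proxy_score(train_x, train_y, val_x, val_y) -> float:
    """Proxy-based: fit a cheap classifier on frozen features, return val accuracy."""
    cents = class_centroids(train_x, train_y)
    correct = 0
    for x, y in zip(val_x, val_y):
        pred = min(cents, key=lambda c: sq_dist(x, cents[c]))
        correct += int(pred == y)
    return correct / len(val_y)


def separability_score(features, labels) -> float:
    """Distribution-based stand-in: tr(S_B) / tr(S_W); higher = labels easier to read off."""
    overall = [mean(col) for col in zip(*features)]
    cents = class_centroids(features, labels)
    # Summing over samples equals sum_c n_c * ||mu_c - mu||^2.
    between = sum(sq_dist(cents[y], overall) for y in labels)
    within = sum(sq_dist(x, cents[y]) for x, y in zip(features, labels))
    return between / within if within > 0 else float("inf")


# Toy frozen features from two hypothetical PCMs on the same four labelled samples.
labels = [0, 0, 1, 1]
model_a = [[0.0, 0.1], [0.1, 0.0], [1.0, 0.9], [0.9, 1.0]]  # classes well separated
model_b = [[0.5, 0.4], [0.4, 0.6], [0.6, 0.5], [0.5, 0.5]]  # classes overlap

for name, feats in {"model_a": model_a, "model_b": model_b}.items():
    # For brevity the proxy is scored on its own training points;
    # a held-out split would be used in practice.
    print(name, proxy_score(feats, labels, feats, labels),
          round(separability_score(feats, labels), 3))
```

With these toy values, `model_a` scores higher than `model_b` under both measures, so either strategy would rank it first without any fine-tuning.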


Paper Structure (25 sections, 11 equations, 8 figures, 3 tables)


Introduction
Preliminaries
Pre-Trained Code Models (PCMs)
Investigated PCMs
Downstream Tasks of Interest
Three Task-Agnostic Approaches
Model Selection Based on Model Size
Model Selection Based on Training Data
Model Selection Based on Brute-Force Fine-Tuning
Learning to Select Models
Problem Formulation
Proxy-Based Methods
Distribution-Based Methods
Learning Strategies Evaluation
Datasets and Training Setup
...and 10 more sections

Figures (8)

Figure 1: The pipeline of developing and using PCMs, from pre-training to fine-tuning
Figure 2: The accuracy of each PCM when adapted to the vulnerability detection task via fine-tuning. Accuracy is encoded as a color gradient, with deeper shades indicating higher values. Models are sorted in ascending order by size
Figure 3: The accuracy of (a) CodeBERT models and (b) PLBART models on the vulnerability detection task. The models are pre-trained on datasets of different sizes and programming languages
Figure 4: The time cost of brute-force fine-tuning and various learning strategies for model selection. The horizontal axis (Model Hub Size) represents the number of models involved in the selection process (measured per model), and the vertical axis (Time) indicates the selection time cost in seconds
Figure 5: An illustration of the (a) proxy-based and (b) distribution-based model selection strategies
...and 3 more figures
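Figure 4 compares the time cost of brute-force fine-tuning with that of the learning strategies. As a back-of-the-envelope check on the headline numbers from the abstract (about 2,700 hours versus about 100 seconds), the short Python snippet below only converts units and takes the ratio. It assumes both reported figures describe the same selection scenario; the constant and function names are hypothetical.

```python
# Back-of-the-envelope unit conversion of the two headline selection costs.
BRUTE_FORCE_HOURS = 2_700          # reported cost of brute-force fine-tuning
LEARNED_SELECTION_SECONDS = 100    # reported cost of learning-based selection


def speedup(hours: float, seconds: float) -> float:
    """How many times faster a `seconds`-long run is than an `hours`-long one."""
    return hours * 3_600 / seconds


ratio = speedup(BRUTE_FORCE_HOURS, LEARNED_SELECTION_SECONDS)
print(f"{BRUTE_FORCE_HOURS * 3_600:,} s vs {LEARNED_SELECTION_SECONDS} s: about {ratio:,.0f}x faster")
# prints "9,720,000 s vs 100 s: about 97,200x faster"
```

A ratio near $10^5$ is roughly five orders of magnitude, which is why the selection cost becomes negligible next to a single fine-tuning run.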
