Revisiting Code Search in a Two-Stage Paradigm

Fan Hu, Yanlin Wang, Lun Du, Xirong Li, Hongyu Zhang, Shi Han, Dongmei Zhang

TL;DR

This work proposes TOSS, a two-stage fusion code search framework that combines the advantages of different code search methods, and shows that TOSS is efficient while also achieving state-of-the-art accuracy.

Abstract

With a good code search engine, developers can reuse existing code snippets and accelerate the software development process. Current code search methods fall into two categories: traditional information retrieval (IR) based approaches and deep learning (DL) based approaches. DL-based approaches further divide into the cross-encoder paradigm and the bi-encoder paradigm. However, each of these approaches has limitations. IR-based and bi-encoder models are fast at inference but not accurate enough, while cross-encoder models achieve higher search accuracy but consume more time. In this work, we propose TOSS, a two-stage fusion code search framework that combines the advantages of different code search methods. TOSS first uses IR-based and bi-encoder models to efficiently recall a small number of top-k code candidates, and then uses fine-grained cross-encoders to re-rank those candidates more precisely. Furthermore, we conduct extensive experiments with different code candidate volumes and multiple programming languages to verify the effectiveness of TOSS. We also compare TOSS with six data fusion methods. Experimental results show that TOSS is not only efficient but also achieves state-of-the-art accuracy, with an overall mean reciprocal rank (MRR) of 0.763 on the CodeSearchNet benchmark, compared with 0.713 for the best baseline. Our source code and experimental data are available at: https://github.com/fly-dragon211/TOSS.
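The recall-then-re-rank flow described in the abstract can be sketched in a few lines. This is a minimal, hypothetical illustration and not the TOSS implementation. It uses three stand-ins:

- Jaccard similarity replaces the IR model (BM25 in the paper).
- A bag-of-words cosine replaces the bi-encoder.
- A bigram-matching score replaces the cross-encoder.

The sketch also assumes that the two first-stage top-k lists are merged by union. All function names are illustrative only.

```python
import math
from collections import Counter


def tokenize(text):
    # Crude tokenizer: split identifiers on underscores and punctuation.
    for ch in "()_.,:=":
        text = text.replace(ch, " ")
    return text.lower().split()


def jaccard(query, code):
    # Hypothetical stand-in for an IR-based scorer (the paper uses e.g. BM25).
    q, c = set(tokenize(query)), set(tokenize(code))
    return len(q & c) / len(q | c) if q | c else 0.0


def embed(text):
    # Hypothetical stand-in for a bi-encoder: a bag-of-words vector, not a neural embedding.
    return Counter(tokenize(text))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)  # Counter returns 0 for missing tokens
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cross_score(query, code):
    # Hypothetical stand-in for a cross-encoder: scores the (query, code) pair jointly,
    # here by rewarding query bigrams that appear contiguously in the code.
    q, c = tokenize(query), tokenize(code)
    c_bigrams = set(zip(c, c[1:]))
    bigram_hits = sum(1 for bg in zip(q, q[1:]) if bg in c_bigrams)
    return bigram_hits + jaccard(query, code)


def top_k(scores, k):
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]


def two_stage_search(query, corpus, code_vecs, k):
    # Stage 1: cheap recall. Each first-stage model contributes its top-k;
    # merging them by union is an assumption of this sketch.
    ir_top = top_k([jaccard(query, c) for c in corpus], k)
    q_vec = embed(query)
    bi_top = top_k([cosine(q_vec, v) for v in code_vecs], k)
    candidates = list(dict.fromkeys(ir_top + bi_top))  # dedupe, keep order
    # Stage 2: expensive pairwise re-ranking, applied only to the small candidate set.
    return sorted(candidates, key=lambda i: -cross_score(query, corpus[i]))


def mean_reciprocal_rank(rankings, gold):
    # MRR: average of 1 / rank of the correct snippet; contributes 0 if it was not recalled.
    total = 0.0
    for ranking, g in zip(rankings, gold):
        if g in ranking:
            total += 1.0 / (ranking.index(g) + 1)
    return total / len(gold)


# Toy usage (made-up corpus and queries, not from the paper).
corpus = [
    "def read_file(path): return open(path).read()",
    "def sort_list(items): return sorted(items)",
    "def write_file(path, data): open(path, 'w').write(data)",
    "def reverse_list(items): return list(reversed(items))",
]
code_vecs = [embed(c) for c in corpus]  # computed once, offline, as with a bi-encoder
queries = ["read file from path", "sort a list of items"]
gold = [0, 1]
rankings = [two_stage_search(q, corpus, code_vecs, k=2) for q in queries]
mrr = mean_reciprocal_rank(rankings, gold)
print(rankings, mrr)  # [[0, 2], [1, 3]] 1.0
```

The design point is that the costly pairwise scorer runs only on the merged candidate set (at most 2k snippets per query) instead of on the whole corpus. The code vectors for the first stage are computed only once.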

Paper Structure (19 sections, 6 equations, 5 figures, 5 tables)

Figures (5)

  • Figure 1: The concept diagram of bi-encoder and cross-encoder code search models. Bi-encoder models are fast because the code embeddings can be pre-computed offline, while cross-encoder models perform full attention over the input query-code pair, which lets them capture more information.
  • Figure 2: The overall framework of our two-stage paradigm TOSS.
  • Figure 3: Visualization of the top-1 samples recalled by the four baselines used, together with the ground truth (GT), on the CodeSearchNet Python test set. Recalled code candidates are more diverse when the methods come from different paradigms. The number of coinciding recalls when fusing GraphCodeBERT and BM25 is 5,108. This is fewer than for the text-matching methods BM25 and Jaccard (5,222) or the deep code search methods GraphCodeBERT and CodeBERT-bi (8,071). Moreover, each method recalls some ground-truth code snippets that are unique to it. Best viewed in color.
  • Figure 4: Performance curves for different code volumes. We vary the code volume from 200 to 40,000. TOSS refers to TOSS$_{[\text{GraphCodeBERT}+\text{BM25}]+\text{CodeBERT}}$. Because a given number of code snippets is randomly sampled from the CSN Python code candidates, we repeat each measurement three times and report the average results and the error bounds.
  • Figure 5: Visualization of the speed versus accuracy trade-off of nine baselines and our two-stage method. Dataset: CodeSearchNet Python test set. The area of each circle is proportional to the size of the model. The two-stage method TOSS refers to TOSS$_{[\text{GraphCodeBERT}+\text{BM25}]+\text{CodeBERT}}$. With the two-stage method, we achieve top performance comparable to the best single model, CodeBERT, while requiring substantially less inference time.
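The overlap statistics in the Figure 3 caption can be computed along the following lines. This sketch makes two assumptions. First, a "coincident" recall means both methods return the same top-1 snippet for a query. Second, a "unique" ground-truth hit is a query on which only one method's top-1 is correct. The function name `recall_overlap` and the toy lists are hypothetical; they are not the paper's code or data.

```python
def recall_overlap(top1_a, top1_b, gold):
    """Compare two retrievers' top-1 predictions over the same list of queries.

    top1_a, top1_b: index of the top-1 snippet each method returns per query.
    gold: index of the ground-truth snippet per query.
    """
    rows = list(zip(top1_a, top1_b, gold))
    # Queries on which both methods return the same top-1 snippet.
    coincident = sum(1 for a, b, _ in rows if a == b)
    # Ground-truth hits achieved by only one of the two methods.
    unique_gt_a = sum(1 for a, b, g in rows if a == g and b != g)
    unique_gt_b = sum(1 for a, b, g in rows if b == g and a != g)
    return {"coincident": coincident, "unique_gt_a": unique_gt_a, "unique_gt_b": unique_gt_b}


# Toy example with five queries (hypothetical data, not from the paper).
method_a = [0, 1, 2, 3, 4]   # e.g. top-1 picks of a bi-encoder
method_b = [0, 1, 5, 6, 4]   # e.g. top-1 picks of an IR model
ground_truth = [0, 7, 2, 6, 9]
stats = recall_overlap(method_a, method_b, ground_truth)
print(stats)  # {'coincident': 3, 'unique_gt_a': 1, 'unique_gt_b': 1}
```

A lower `coincident` count means the two methods' candidate pools overlap less. Nonzero unique hits mean that each method contributes correct snippets the other misses. Together, these are the caption's argument for fusing retrievers from different paradigms in the first stage.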