Neural Code Search Evaluation Dataset

Hongyu Li, Seohyun Kim, Satish Chandra

TL;DR

The paper presents the Neural Code Search Evaluation Dataset, a benchmark that combines Stack Overflow question-answer pairs with a large GitHub-derived code corpus to evaluate natural-language-to-code search. It describes the data collection, the method-level structure of the search corpus, and an evaluation dataset of 287 QA pairs with annotated exemplars. Four code search configurations (NCS, NCSpostrank, UNIF_android, UNIF_stackoverflow) are evaluated using Aroma-based similarity to produce FRank and MRR metrics, and the results are published in a public CSV. The dataset and results enable reproducible, cross-model benchmarking of code search approaches and are intended to standardize evaluation in this area.
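As a rough illustration of how the two metrics named above relate, the sketch below computes MRR from a list of per-query FRank values. It assumes that FRank is the 1-indexed rank of the first result judged correct for a query, and that a query with no correct result contributes 0 to the mean; both conventions, the function names, and the sample values are assumptions made for illustration and are not taken from the dataset or its CSV.

```python
from typing import Optional, Sequence


def reciprocal_rank(frank: Optional[int]) -> float:
    """Return 1/FRank, or 0.0 when no correct result was found (assumed convention)."""
    if frank is None:
        return 0.0
    if frank < 1:
        raise ValueError("FRank is assumed to be 1-indexed")
    return 1.0 / frank


def mean_reciprocal_rank(franks: Sequence[Optional[int]]) -> float:
    """Average the reciprocal ranks over all queries."""
    if not franks:
        raise ValueError("need at least one query")
    return sum(reciprocal_rank(r) for r in franks) / len(franks)


def answered_in_top_k(franks: Sequence[Optional[int]], k: int) -> int:
    """Count queries whose first correct result appears within the top k."""
    return sum(1 for r in franks if r is not None and r <= k)


# Hypothetical FRank values for four queries; None means no correct result.
example_franks = [1, 2, None, 4]
example_mrr = mean_reciprocal_rank(example_franks)
example_top3 = answered_in_top_k(example_franks, 3)
```

Treating unanswered queries as a reciprocal rank of 0 keeps the denominator equal to the total number of queries, so models that answer fewer queries are penalized rather than averaged only over their successes.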

Abstract

There has been growing interest in code search using natural language. Assessing the performance of such code search models can be difficult without a readily available evaluation suite. In this paper, we present an evaluation dataset consisting of natural language query and code snippet pairs, in the hope that future work in this area can use it as a common benchmark. We also provide the results of two code search models ([1] and [6]) from recent work. The evaluation dataset is available at https://github.com/facebookresearch/Neural-Code-Search-Evaluation-Dataset

Paper Structure

This paper contains 8 sections, 1 figure, and 1 table.