Compositional API Recommendation for Library-Oriented Code Generation

Zexiong Ma, Shengnan An, Bing Xie, Zeqi Lin

TL;DR

CAPIR (Compositional API Recommendation) adopts a “divide-and-conquer” strategy to recommend APIs for coarse-grained requirements. Experimental results on the RAPID and LOCG benchmarks demonstrate its effectiveness in comparison to existing baselines.

Abstract

Large language models (LLMs) have achieved exceptional performance in code generation. However, the performance remains unsatisfactory in generating library-oriented code, especially for libraries not present in the training data of LLMs. Previous work utilizes API recommendation technology to help LLMs use libraries: it retrieves APIs related to the user requirements, then leverages them as context to prompt LLMs. However, development requirements can be coarse-grained, requiring a combination of multiple fine-grained APIs. This granularity inconsistency makes API recommendation a challenging task. To address this, we propose CAPIR (Compositional API Recommendation), which adopts a "divide-and-conquer" strategy to recommend APIs for coarse-grained requirements. Specifically, CAPIR employs an LLM-based Decomposer to break down a coarse-grained task description into several detailed subtasks. Then, CAPIR applies an embedding-based Retriever to identify relevant APIs corresponding to each subtask. Finally, CAPIR leverages an LLM-based Reranker to filter out redundant APIs and provide the final recommendation. To facilitate the evaluation of API recommendation methods on coarse-grained requirements, we present two challenging benchmarks, RAPID (Recommend APIs based on Documentation) and LOCG (Library-Oriented Code Generation). Experimental results on these benchmarks demonstrate the effectiveness of CAPIR in comparison to existing baselines. Specifically, on RAPID's Torchdata-AR dataset, compared to the state-of-the-art API recommendation approach, CAPIR improves recall@5 from 18.7% to 43.2% and precision@5 from 15.5% to 37.1%. On LOCG's Torchdata-Code dataset, compared to code generation without API recommendation, CAPIR improves pass@100 from 16.0% to 28.0%.
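The decompose, retrieve, and rerank flow described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the API names and descriptions in `API_DOCS` are hypothetical, the rule-based `decompose` and score-based `rerank` are simple stand-ins for the LLM-based Decomposer and Reranker, and the bag-of-words `embed` replaces the learned embedding model a real Retriever would use. It shows only the shape of the pipeline, not the authors' implementation.

```python
import math
import re
from collections import Counter

# Hypothetical toy API documentation (not taken from any real library).
API_DOCS = {
    "list_files": "list files in a directory",
    "parse_csv": "parse csv rows from opened files",
    "shuffle_items": "shuffle items randomly",
    "make_batches": "group items into batches of fixed size",
    "zip_streams": "zip two streams together",
}


def embed(text):
    """Toy bag-of-words 'embedding'; a real system would use a neural encoder."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def decompose(task):
    """Stand-in for the LLM-based Decomposer: split on commas, 'then', 'and'."""
    parts = re.split(r",|\bthen\b|\band\b", task)
    return [p.strip() for p in parts if p.strip()]


def retrieve(subtask, k=2):
    """Return up to k (api_name, score) pairs most similar to the subtask."""
    query = embed(subtask)
    scored = [(cosine(query, embed(doc)), name) for name, doc in API_DOCS.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(name, score) for score, name in scored[:k] if score > 0]


def rerank(candidates, top_n=3):
    """Stand-in for the LLM-based Reranker: dedupe, keep best score, take top_n."""
    best = {}
    for name, score in candidates:
        best[name] = max(score, best.get(name, 0.0))
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]


def recommend(task, k=2, top_n=3):
    """Decompose the task, retrieve per subtask, then rerank the pooled candidates."""
    candidates = []
    for subtask in decompose(task):
        candidates.extend(retrieve(subtask, k))
    return rerank(candidates, top_n)


if __name__ == "__main__":
    print(recommend("parse csv files, then shuffle items and make batches"))
```

Retrieving per subtask lets each fine-grained query match its own API description, whereas a single query built from the whole coarse-grained task has to match several unrelated descriptions at once.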

Paper Structure (32 sections, 10 equations, 5 figures, 5 tables)

Figures (5)

  • Figure 1: An example of library-oriented code generation with LLMs. The gray parts are the generated code. (a) Due to the lack of API information, the LLM generates code that looks reasonable but cannot be executed correctly. (b) When the requirement is used directly to retrieve APIs from the documentation, the LLM still generates the wrong code. (c) CAPIR decomposes the task into subtasks and retrieves APIs from the documentation for each subtask, enabling the LLM to generate the correct code.
  • Figure 2: Overview of our compositional API recommendation.
  • Figure 3: Recall@k and precision@k of CAPIR with and without key components on Torchdata-AR. Removing any part would lead to a decline in CAPIR's performance.
  • Figure 4: API recommendation results for two cases. (a) CAPIR decomposes the task into the appropriate granularity and effectively recommends the correct APIs from the documentation. (b) CAPIR does not precisely decompose the task into subtasks of optimal granularity, but the decomposition still enhances the effectiveness of API recommendation.
  • Figure 5: Visualization of all the API embeddings of Torchdata. The APIs corresponding to a coarse-grained task are scattered in different locations, indicating the necessity of decomposing tasks into subtasks for retrieval.
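
The recall@k and precision@k metrics named in Figure 3 and in the abstract can be sketched as follows. This assumes the standard set-based definitions (hits in the top k divided by the number of ground-truth APIs, and hits divided by k); the paper's exact formulas are not reproduced in this summary, and the function names and example API labels below are hypothetical.

```python
def recall_at_k(recommended, gold, k):
    """Fraction of ground-truth APIs that appear in the top-k recommendations."""
    gold_set = set(gold)
    if not gold_set:
        return 0.0
    hits = set(recommended[:k]) & gold_set
    return len(hits) / len(gold_set)


def precision_at_k(recommended, gold, k):
    """Fraction of the k recommendation slots filled by ground-truth APIs."""
    if k <= 0:
        return 0.0
    hits = set(recommended[:k]) & set(gold)
    return len(hits) / k


if __name__ == "__main__":
    recommended = ["a", "b", "c", "d", "e"]  # hypothetical ranked API names
    gold = ["a", "c", "x"]                   # hypothetical ground-truth APIs
    print(recall_at_k(recommended, gold, 5))
    print(precision_at_k(recommended, gold, 5))
```

Under these definitions, a coarse-grained task with several ground-truth APIs only reaches high recall@k if the recommender surfaces APIs for every subtask, which is the situation the decomposition step targets.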