Explore-Construct-Filter: An Automated Framework for Rich and Reliable API Knowledge Graph Construction

Yanbang Sun, Qing Huang, Xiaoxue Ren, Zhenchang Xing, Xiaohong Li, Junjie Wang

TL;DR

This work tackles the challenge of constructing a rich and reliable API Knowledge Graph (API KG) by introducing the Explore-Construct-Filter framework, which uses large language models (LLMs) to automate schema design, instance extraction, and reliability filtering. The method combines fully connected schema generation, schema-guided extraction, and a probabilistic filtering stage to maximize KG richness while controlling noise. Empirically, it improves F1 by 25.2% over the prior state of the art (EDC) for KG construction; the exploration module increases KG richness by 133.6%, the filtering module improves reliability by 26.6%, and the results generalize across GPT-4o, Llama, and Claude. The framework's modular design, together with chain-of-thought (CoT) reasoning, a hybrid AI/non-AI execution strategy, and association-rule filtering, advances automated, scalable API KG construction, with practical impact on API recommendation, code generation, and misuse detection.

Abstract

The API Knowledge Graph (API KG) is a structured network that models API entities and their relations, providing essential semantic insights for tasks such as API recommendation, code generation, and API misuse detection. However, constructing a knowledge-rich and reliable API KG presents several challenges. Existing schema-based methods rely heavily on manual annotation to design KG schemas, which incurs excessive manual overhead. Schema-free methods, on the other hand, lack schema guidance and are therefore prone to introducing noise, which reduces the KG's reliability. To address these issues, we propose the Explore-Construct-Filter framework, an automated approach for API KG construction based on large language models (LLMs). The framework consists of three key modules: 1) KG exploration: LLMs simulate the workflow of annotators to automatically design a schema with comprehensive type triples, minimizing human intervention; 2) KG construction: guided by the schema, LLMs extract instance triples to construct a rich yet unreliable API KG; 3) KG filtering: invalid type triples and suspicious instance triples are removed to yield a rich and reliable API KG. Experimental results demonstrate that our method surpasses the state-of-the-art method, achieving a 25.2% improvement in F1 score. Moreover, each module of the Explore-Construct-Filter framework proves effective: the KG exploration module increases KG richness by 133.6%, and the KG filtering module improves reliability by 26.6%. Finally, cross-model experiments confirm the generalizability of our framework.
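To make the filtering idea concrete, the following is a minimal sketch of how type triples could be scored from the extracted instance triples and used to drop suspicious ones. It is an illustration under stated assumptions, not the paper's implementation: the `InstanceTriple` layout, the `filter_kg` function, the support and confidence definitions, the thresholds, and the sample data are all hypothetical.

```python
from collections import Counter
from typing import List, NamedTuple, Tuple


class InstanceTriple(NamedTuple):
    """An extracted fact plus the entity types assigned to its two ends (hypothetical layout)."""
    head: str
    head_type: str
    relation: str
    tail: str
    tail_type: str


def type_triple(t: InstanceTriple) -> Tuple[str, str, str]:
    """Project an instance triple onto its schema-level type triple."""
    return (t.head_type, t.relation, t.tail_type)


def filter_kg(
    triples: List[InstanceTriple],
    min_support: float = 0.1,     # assumed threshold, not taken from the paper
    min_confidence: float = 0.5,  # assumed threshold, not taken from the paper
) -> List[InstanceTriple]:
    """Keep instance triples whose type triple is frequent enough overall (support)
    and dominant enough among triples sharing the same (head_type, tail_type) pair
    (confidence)."""
    if not triples:
        return []
    total = len(triples)
    triple_counts = Counter(type_triple(t) for t in triples)
    pair_counts = Counter((t.head_type, t.tail_type) for t in triples)

    kept = []
    for t in triples:
        count = triple_counts[type_triple(t)]
        support = count / total
        confidence = count / pair_counts[(t.head_type, t.tail_type)]
        if support >= min_support and confidence >= min_confidence:
            kept.append(t)
    return kept


# Made-up sample data: one relation ("equivalent_to") is rare for its type pair.
triples = [
    InstanceTriple("java.util.ArrayList", "class", "implements", "java.util.List", "interface"),
    InstanceTriple("java.util.HashMap", "class", "implements", "java.util.Map", "interface"),
    InstanceTriple("java.util.LinkedList", "class", "implements", "java.util.Deque", "interface"),
    InstanceTriple("java.util.ArrayList", "class", "equivalent_to", "java.util.List", "interface"),
    InstanceTriple("ArrayList.add", "method", "belongs_to", "java.util.ArrayList", "class"),
]

kept = filter_kg(triples)
for t in kept:
    print(t.head, t.relation, t.tail)
```

In this toy data, the `equivalent_to` triple is discarded because its type triple accounts for only a quarter of the class-to-interface triples, while the other four triples are kept. A real pipeline would likely need tuned thresholds and possibly additional measures.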

Paper Structure

This paper contains 53 sections, 14 figures, 13 tables.

Figures (14)

  • Figure 1: The Comparison of API KG Construction Methods.
  • Figure 2: Overall Framework of Our Method.
  • Figure 3: Workflow of KG Exploration Module.
  • Figure 4: Workflow of KG Construction Module.
  • Figure 5: Workflow of KG Filtering Module.
  • ...and 9 more figures