OneKE: A Dockerized Schema-Guided LLM Agent-based Knowledge Extraction System
Yujie Luo, Xiangyuan Ru, Kangwei Liu, Lin Yuan, Mengshu Sun, Ningyu Zhang, Lei Liang, Zhiqiang Zhang, Jun Zhou, Lanning Wei, Da Zheng, Haofen Wang, Huajun Chen
TL;DR
The paper addresses the need for robust, schema-aligned knowledge extraction with LLMs across heterogeneous data sources and domains. It introduces OneKE, a dockerized, schema-guided LLM agent framework in which multiple specialized agents and a configurable knowledge base together orchestrate extraction from Web pages and raw PDF books. Key contributions are the multi-agent architecture, the knowledge base for schema configuration and for debugging and correcting error cases, and empirical evaluations on benchmark datasets complemented by case studies that demonstrate adaptability. The authors also release open-source code and a demonstration video, which supports practical use in knowledge graph (KG) construction and cross-domain information extraction. According to the authors, the knowledge base further improves extraction performance, and the system applies to science, news, and other domains.
Abstract
We introduce OneKE, a dockerized schema-guided knowledge extraction system that can extract knowledge from the Web and from raw PDF books and that supports various domains (science, news, etc.). Specifically, we design OneKE with multiple agents and a configurable knowledge base. Each agent performs its own role, which enables support for various extraction scenarios. The configurable knowledge base facilitates schema configuration as well as error case debugging and correction, further improving performance. Empirical evaluations on benchmark datasets demonstrate OneKE's efficacy, while case studies further elucidate its adaptability to diverse tasks across multiple domains, highlighting its potential for broad applications. We have open-sourced the code at https://github.com/zjunlp/OneKE and released a video at http://oneke.openkg.cn/demo.mp4.
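To make the division of labor described above more concrete, the following is a minimal Python sketch of a schema-guided pipeline in which separate agents share a small knowledge base. It is an illustration under stated assumptions, not the OneKE implementation: the agent names (`schema_agent`, `extraction_agent`, `reflection_agent`), the `KnowledgeBase` class, and the `toy_extractor` function that stands in for an LLM call are all hypothetical and are not taken from the OneKE codebase.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# NOTE: every name in this sketch is hypothetical; none comes from the OneKE code.


@dataclass
class KnowledgeBase:
    """Stand-in for a configurable knowledge base: schemas plus corrected error cases."""
    schemas: Dict[str, List[str]] = field(default_factory=dict)
    corrections: Dict[str, Dict[str, str]] = field(default_factory=dict)


def toy_extractor(text: str, fields: List[str]) -> Dict[str, str]:
    """Placeholder for an LLM call: parse 'key: value' pairs separated by ';'."""
    pairs: Dict[str, str] = {}
    for part in text.split(";"):
        if ":" in part:
            key, value = part.split(":", 1)
            pairs[key.strip().lower()] = value.strip()
    return pairs


def schema_agent(task: str, kb: KnowledgeBase) -> List[str]:
    """Look up the output fields configured for a task."""
    return kb.schemas[task]


def extraction_agent(
    text: str,
    fields: List[str],
    extractor: Callable[[str, List[str]], Dict[str, str]],
) -> Dict[str, str]:
    """Run the extractor and keep only schema fields; missing fields become ''."""
    raw = extractor(text, fields)
    return {name: raw.get(name, "") for name in fields}


def reflection_agent(
    text: str, result: Dict[str, str], kb: KnowledgeBase
) -> Dict[str, str]:
    """Overwrite fields with a stored correction for this exact input, if one exists."""
    fix = kb.corrections.get(text, {})
    return {name: fix.get(name, value) for name, value in result.items()}


def run_pipeline(
    text: str,
    task: str,
    kb: KnowledgeBase,
    extractor: Callable[[str, List[str]], Dict[str, str]] = toy_extractor,
) -> Dict[str, str]:
    """Chain the three agents: schema lookup, extraction, then correction."""
    fields = schema_agent(task, kb)
    draft = extraction_agent(text, fields, extractor)
    return reflection_agent(text, draft, kb)


if __name__ == "__main__":
    kb = KnowledgeBase(schemas={"news": ["title", "date"]})
    doc = "Title: OneKE released; Date: 2024; Extra: ignored"
    print(run_pipeline(doc, "news", kb))
    # Record a corrected error case, then rerun on the same input.
    kb.corrections[doc] = {"date": "2024-12"}
    print(run_pipeline(doc, "news", kb))
```

The point of the sketch is the design choice rather than the extraction logic: the schema constrains which fields may appear in the output, and corrections recorded in the knowledge base are applied on later runs without changing the extractor itself.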
