UrbanKGent: A Unified Large Language Model Agent Framework for Urban Knowledge Graph Construction

Yansong Ning, Hao Liu

TL;DR

UrbanKGent introduces a unified LLM-agent framework for automatic urban knowledge graph construction (UrbanKGC). It addresses the heterogeneity of urban data and the need for geospatial reasoning through knowledgeable instruction generation and tool-augmented trajectory refinement. It builds the UrbanKGent family (7B/8B/13B) via hybrid instruction tuning on Llama 2/3 models and achieves state-of-the-art performance on relational triplet extraction and knowledge graph completion on NYC and Chicago data, with notable data efficiency and cost savings. Extensive experiments, including human and GPT-4 self-evaluation, show that UrbanKGent outperforms 31 baselines and enables the construction of UrbanKGs with hundreds of times richer relations using only one-fifth of the data. The work also releases an open-source UrbanKGent platform to foster urban knowledge graph research and practical smart city applications.
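
Knowledgeable instruction generation can be pictured as prompt construction that injects prior urban knowledge (relation hints) and geospatial context (coordinates) into each task instruction. The following is only a minimal sketch under stated assumptions: `UrbanRecord`, `RELATION_HINTS`, `build_rte_instruction`, the relation names, and the prompt wording are all hypothetical and do not reproduce the paper's actual templates.

```python
from dataclasses import dataclass


@dataclass
class UrbanRecord:
    """A raw urban text record with its location (hypothetical schema)."""
    text: str
    lon: float
    lat: float


# Hypothetical prior urban knowledge: candidate relations with short glosses.
RELATION_HINTS = {
    "locatedIn": "a POI or road lies inside an administrative area",
    "adjacentTo": "two areas share a border",
    "hasCategory": "a POI belongs to a functional category",
}


def build_rte_instruction(record: UrbanRecord, hints: dict[str, str]) -> str:
    """Compose a relational triplet extraction instruction that injects
    relation hints (heterogeneity-aware) and coordinates (geospatial-infused)."""
    hint_lines = "\n".join(f"- {name}: {gloss}" for name, gloss in hints.items())
    return (
        "Extract relational triplets (head, relation, tail) from the urban text.\n"
        f"Candidate relations:\n{hint_lines}\n"
        f"Location (lon, lat): ({record.lon:.4f}, {record.lat:.4f})\n"
        f"Text: {record.text}\n"
        "Answer as a list of (head, relation, tail)."
    )


if __name__ == "__main__":
    rec = UrbanRecord("Cafe A is a coffee shop in District B.", -74.0, 40.7)
    print(build_rte_instruction(rec, RELATION_HINTS))
```

Keeping the hints in a plain dictionary makes it easy to swap in a different relation schema per data source, which is one plausible way to handle heterogeneous urban inputs.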

Abstract

Urban knowledge graphs have recently emerged as a building block for distilling critical knowledge from multi-sourced urban data for diverse urban application scenarios. Despite their promising benefits, urban knowledge graph construction (UrbanKGC) still relies heavily on manual effort, which hinders its advancement. This paper presents UrbanKGent, a unified large language model agent framework for urban knowledge graph construction. Specifically, we first construct a knowledgeable instruction set for UrbanKGC tasks (such as relational triplet extraction and knowledge graph completion) via heterogeneity-aware and geospatial-infused instruction generation. Moreover, we propose a tool-augmented iterative trajectory refinement module to enhance and refine the trajectories distilled from GPT-4. Through hybrid instruction fine-tuning with the augmented trajectories on the Llama 2 and Llama 3 families, we obtain the UrbanKGC agent family, consisting of the UrbanKGent-7/8/13B versions. We perform a comprehensive evaluation on two real-world datasets using both human and GPT-4 self-evaluation. The experimental results demonstrate that the UrbanKGent family not only significantly outperforms 31 baselines in UrbanKGC tasks but also surpasses the state-of-the-art LLM, GPT-4, by more than 10% at approximately 20 times lower cost. Compared with the existing benchmark, the UrbanKGent family can help construct an UrbanKG with hundreds of times richer relationships using only one-fifth of the data. Our data and code are available at https://github.com/usail-hkust/UrbanKGent.
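
The tool-augmented iterative trajectory refinement module is described only at a high level in this summary. As a rough sketch, assume a trajectory can be reduced to a list of predicted triplets, each checked by an external tool and, if it fails, revised or discarded over a bounded number of rounds. `refine_trajectory`, `toy_verify`, `toy_revise`, `max_rounds`, and the toy data are hypothetical; the reviser is a stub standing in for an LLM call, and this is not the released UrbanKGent code.

```python
from typing import Callable, Optional

Triplet = tuple[str, str, str]


def refine_trajectory(
    draft: list[Triplet],
    verify: Callable[[Triplet], bool],
    revise: Callable[[Triplet], Optional[Triplet]],
    max_rounds: int = 3,
) -> list[Triplet]:
    """Check each triplet with an external tool (`verify`); ask a reviser
    (an LLM call in practice, a stub here) to fix or drop failing ones."""
    current = list(draft)
    for _ in range(max_rounds):
        if all(verify(t) for t in current):
            break  # every triplet passed the tool check
        next_round = []
        for t in current:
            if verify(t):
                next_round.append(t)
            else:
                fixed = revise(t)  # None means the triplet is dropped
                if fixed is not None:
                    next_round.append(fixed)
        current = next_round
    # Discard anything still unverified after the last round.
    return [t for t in current if verify(t)]


# Toy stand-ins for geospatial tool results (hypothetical data).
INSIDE = {("Cafe A", "District B")}
NEIGHBORS = {("District B", "District C")}


def toy_verify(t: Triplet) -> bool:
    head, rel, tail = t
    if rel == "locatedIn":
        return (head, tail) in INSIDE
    if rel == "adjacentTo":
        return (head, tail) in NEIGHBORS or (tail, head) in NEIGHBORS
    return False


def toy_revise(t: Triplet) -> Optional[Triplet]:
    head, rel, tail = t
    # Naive stub: retry a failed containment claim as adjacency, else drop.
    return (head, "adjacentTo", tail) if rel == "locatedIn" else None


if __name__ == "__main__":
    draft = [
        ("Cafe A", "locatedIn", "District B"),
        ("District C", "locatedIn", "District B"),
        ("Cafe A", "locatedIn", "District C"),
    ]
    print(refine_trajectory(draft, toy_verify, toy_revise))
```

On the toy draft, the first triplet passes unchanged, the second is revised to an `adjacentTo` relation that the tool confirms, and the third fails in both forms and is dropped. Bounding the loop with `max_rounds` keeps the cost of repeated revision calls predictable.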

Paper Structure (42 sections, 1 equation, 13 figures, 13 tables)

Figures (13)

  • Figure 1: Illustrative example of urban relational triplet extraction and knowledge graph completion. (a) The heterogeneous relationship understanding limitation of LLMs can be addressed by injecting prior urban knowledge into instruction. (b) The geospatial computing limitation of LLMs can be alleviated by invoking external geospatial tools.
  • Figure 1: The statistics of raw datasets.
  • Figure 2: The framework of UrbanKGent.
  • Figure 3: Quantitative performance analysis of prompting GPT-4 for UrbanKGC tasks. The result is obtained by comparing 50 GPT-4's outputs with the human's annotation.
  • Figure 4: An overview of UrbanKGent Construction.
  • ...and 8 more figures
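
Figure 1(b) notes that the geospatial computing limitation of LLMs can be alleviated by invoking external geospatial tools. To illustrate the kind of computation such a tool might take over, here is a minimal ray-casting point-in-polygon test that confirms a `locatedIn` relation before it is emitted. The function names and toy coordinates are hypothetical, the paper's actual tool set is not described in this summary, and treating (lon, lat) as planar coordinates is an assumption that is usually tolerable for small urban areas.

```python
from typing import Optional

Point = tuple[float, float]  # (lon, lat), treated as planar coordinates


def point_in_polygon(pt: Point, polygon: list[Point]) -> bool:
    """Ray casting: cast a horizontal ray from `pt` and count edge crossings;
    an odd number of crossings means the point lies inside."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the ring
        if (y1 > y) != (y2 > y):  # edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:  # crossing lies to the right of pt
                inside = not inside
    return inside


def located_in(
    poi: str, pt: Point, area: str, boundary: list[Point]
) -> Optional[tuple[str, str, str]]:
    """Emit a (poi, 'locatedIn', area) triplet only if the tool confirms it."""
    return (poi, "locatedIn", area) if point_in_polygon(pt, boundary) else None


if __name__ == "__main__":
    square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]  # toy area boundary
    print(located_in("POI A", (1.0, 1.0), "Area 1", square))  # ('POI A', 'locatedIn', 'Area 1')
    print(located_in("POI A", (3.0, 1.0), "Area 1", square))  # None
```

Points lying exactly on a boundary edge are not handled consistently by this simple test; a production tool would more likely rely on an established GIS library.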

Theorems & Definitions (1)

  • Definition 1