Knowledge Graph Enrichment and Reasoning for Nobel Laureates

Thanh-Lam T. Nguyen, Ngoc-Quang Le, Thu-Trang Pham, Mai-Vu Tran

TL;DR

The paper presents an end-to-end pipeline that constructs and analyzes a Nobel Prize knowledge graph, enriching existing datasets with entities and relations that LLM-based NER and RE extract from Wikipedia biographies. It expands the graph with Notable_Work, Event, and Location entities, and applies social network analysis to reveal small-world characteristics and influential hubs. A GraphRAG-based chatbot with a fine-tuned Text2Cypher component enables natural-language querying and multi-hop reasoning over the KG. The work includes extensive experiments, a large evaluation dataset, and released data and code, highlighting the value of integrating LLM-driven extraction with graph-based reasoning for domain-specific knowledge discovery.

Abstract

This project constructs and analyzes a comprehensive knowledge graph of Nobel Prizes and laureates by enriching existing datasets with biographical information extracted from Wikipedia. Our approach combines automatic data augmentation, in which LLMs perform Named Entity Recognition (NER) and Relation Extraction (RE), with social network analysis to uncover hidden patterns within the scientific community. We also develop a GraphRAG-based chatbot that uses a fine-tuned model for Text2Cypher translation, enabling natural-language querying over the knowledge graph. Experimental results show that the enriched graph has small-world network properties, and the analysis identifies key influential figures and central organizations. The chatbot achieves competitive accuracy on a custom multiple-choice evaluation dataset, indicating that combining LLMs with structured knowledge bases is effective for complex reasoning tasks. Data and source code are available at: https://github.com/tlam25/network-of-awards-and-winners.
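The small-world claim can be illustrated with a minimal sketch. It assumes the common heuristic of comparing a network's average clustering coefficient C and average shortest-path length L against a random graph with the same number of nodes and edges; the paper's exact metric definitions and baseline are not given in this section. The function names and the toy ring-lattice network below are hypothetical stand-ins, not the authors' code or data.

```python
import random
from collections import deque
from itertools import combinations


def build_graph(n, edges):
    """Return an undirected adjacency map {node: set(neighbors)}."""
    adj = {i: set() for i in range(n)}
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean BFS distance over all reachable ordered pairs of distinct nodes."""
    dist_sum, pairs = 0, 0
    for src in adj:
        seen = {src: 0}
        queue = deque([src])
        while queue:
            cur = queue.popleft()
            for nxt in adj[cur]:
                if nxt not in seen:
                    seen[nxt] = seen[cur] + 1
                    queue.append(nxt)
        dist_sum += sum(seen.values())
        pairs += len(seen) - 1
    return dist_sum / pairs if pairs else float("inf")


def random_baseline(n, m, rng):
    """Random graph with the same node and edge counts (G(n, m) model)."""
    all_pairs = list(combinations(range(n), 2))
    return build_graph(n, rng.sample(all_pairs, m))


def toy_network(n, k, shortcuts, rng):
    """Hypothetical stand-in data: ring lattice plus a few random shortcuts."""
    edges = {tuple(sorted((i, (i + d) % n))) for i in range(n) for d in range(1, k + 1)}
    target = len(edges) + shortcuts
    while len(edges) < target:
        u, v = rng.sample(range(n), 2)
        edges.add((min(u, v), max(u, v)))
    return sorted(edges)


def small_world_report(n, edges, seed=0):
    """Compute C and L for the network and for a size-matched random baseline."""
    rng = random.Random(seed)
    adj = build_graph(n, edges)
    rand = random_baseline(n, len(edges), rng)
    return {
        "C": average_clustering(adj),
        "L": average_path_length(adj),
        "C_rand": average_clustering(rand),
        "L_rand": average_path_length(rand),
    }


if __name__ == "__main__":
    edges = toy_network(200, 3, 20, random.Random(42))
    for name, value in small_world_report(200, edges, seed=1).items():
        print(f"{name}: {value:.3f}")
```

Under this heuristic, a network is usually called small-world when C is much larger than C_rand while L stays close to L_rand. On a real knowledge graph the path length would normally be computed on the largest connected component; the sketch instead averages over reachable pairs only.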

Paper Structure

This paper contains 44 sections, 3 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: The proposed data construction and enrichment pipeline.
  • Figure 2: Distribution of Multiple-Choice Evaluation Dataset by Hops.
  • Figure 3: Chatbot Pipeline.
  • Figure 4: Distribution of Fine-tuning Dataset by Hops.
  • Figure 5: Small-World Metrics Comparison (Network vs Random Baseline).
  • ...and 2 more figures