Assessing LLMs for Serendipity Discovery in Knowledge Graphs: A Case for Drug Repurposing

Mengying Wang, Chenhui Ma, Ao Jiao, Tuo Liang, Pengjun Lu, Shrinidhi Hegde, Yu Yin, Evren Gurkan-Cavusoglu, Yinghui Wu

TL;DR

SerenQA tackles the problem of generating serendipitous insights in scientific knowledge-graph question answering by introducing an information-theoretic serendipity score $RNS(A_e,A_s)$ that combines relative relevance, relative novelty, and relative surprise. It provides an expert-annotated benchmark built on the Clinical Knowledge Graph for drug repurposing and a three-stage evaluation pipeline spanning retrieval, reasoning, and exploratory serendipity search. Experimental results show frontier LLMs excel at retrieval but struggle to produce genuinely serendipitous, valuable connections, indicating opportunities for multi-model and reasoning enhancements. The work delivers publicly available resources to standardize serendipity evaluation in scientific KGQA.

Abstract

Large Language Models (LLMs) have greatly advanced knowledge graph question answering (KGQA), yet existing systems are typically optimized for returning highly relevant but predictable answers. A missing yet desired capacity is to exploit LLMs to suggest surprising and novel ("serendipitous") answers. In this paper, we formally define the serendipity-aware KGQA task and propose the SerenQA framework to evaluate LLMs' ability to uncover unexpected insights in scientific KGQA tasks. SerenQA includes a rigorous serendipity metric based on relevance, novelty, and surprise, along with an expert-annotated benchmark derived from the Clinical Knowledge Graph, focused on drug repurposing. Additionally, it features a structured evaluation pipeline encompassing three subtasks: knowledge retrieval, subgraph reasoning, and serendipity exploration. Our experiments reveal that while state-of-the-art LLMs perform well on retrieval, they still struggle to identify genuinely surprising and valuable discoveries, underscoring significant room for future improvement. Our curated resources and extended version are released at: https://cwru-db-group.github.io/serenQA.
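
To make the serendipity exploration subtask more concrete, the following is a minimal sketch of beam search outward from a set of already-known answers over a toy graph. It is an illustration under stated assumptions, not the SerenQA implementation: the function `beam_explore`, the toy graph, the node weights, and `toy_score` are all hypothetical, and `toy_score` is only a stand-in for a real scoring signal, not the paper's RNS metric.

```python
from typing import Callable, Dict, List, Set, Tuple

Graph = Dict[str, List[str]]  # adjacency list: node -> neighbours


def beam_explore(
    graph: Graph,
    seeds: Set[str],
    score: Callable[[List[str]], float],
    is_answer: Callable[[str], bool],
    beam_width: int = 2,
    max_hops: int = 2,
) -> List[Tuple[float, List[str]]]:
    """Expand paths from the seed nodes, keeping the top `beam_width` paths per hop.

    Returns (score, path) pairs whose last node is a non-seed answer candidate,
    sorted by descending score.
    """
    beam: List[List[str]] = [[s] for s in sorted(seeds)]
    found: List[Tuple[float, List[str]]] = []
    for _ in range(max_hops):
        candidates: List[List[str]] = []
        for path in beam:
            for nxt in graph.get(path[-1], []):
                if nxt not in path:  # skip cycles
                    candidates.append(path + [nxt])
        if not candidates:
            break
        # Keep only the best-scoring partial paths for the next hop.
        candidates.sort(key=score, reverse=True)
        beam = candidates[:beam_width]
        for p in beam:
            if is_answer(p[-1]) and p[-1] not in seeds:
                found.append((score(p), p))
    return sorted(found, key=lambda t: t[0], reverse=True)


# Hypothetical toy data: one known drug linked to other drugs via shared proteins.
toy_graph: Graph = {
    "drug_A": ["protein_X", "protein_Y"],
    "protein_X": ["drug_B"],
    "protein_Y": ["drug_C", "drug_D"],
}
toy_weight = {
    "protein_X": 0.9, "protein_Y": 0.4,
    "drug_B": 0.8, "drug_C": 0.7, "drug_D": 0.1,
}


def toy_score(path: List[str]) -> float:
    """Placeholder score: product of made-up node weights along the path."""
    total = 1.0
    for node in path[1:]:
        total *= toy_weight.get(node, 0.0)
    return total


results = beam_explore(
    toy_graph,
    seeds={"drug_A"},
    score=toy_score,
    is_answer=lambda n: n.startswith("drug_"),
)
for s, p in results:
    print(f"{s:.2f}", " -> ".join(p))
```

With a beam width of 2, the lowest-scoring two-hop path (ending in `drug_D`) is pruned, which shows the trade-off beam search makes: it bounds the cost of exploration but can discard low-scoring paths that a more exhaustive search would keep.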

Paper Structure

This paper contains 37 sections, 16 equations, 7 figures, 4 tables, 3 algorithms.

Figures (7)

  • Figure 1: Suggesting Drugs that treat Severe Acute Pain: A Serendipitous case of Journavx.
  • Figure 2: SerenQA Framework. (A): Computing the RNS score for partition $({\mathcal{A}}_e, {\mathcal{A}}_s)$ from ${\mathcal{G}}$ (Sec. \ref{sec:quantifiy}). (B): Constructing the SerenQA dataset from ClinicalKG (Sec. \ref{sec:benchmark}). (C): For an NL query, our pipeline retrieves ${\mathcal{A}}_e$ from ${\mathcal{G}}$ and explores ${\mathcal{A}}_s$ from ${\mathcal{A}}_e$ with beam search (Sec. \ref{sec:pipeline}).
  • Figure 3: Correlation of Metrics Across Partition Strategies
  • Figure 4: Ontology of biomedical entities and relationships in the Clinical Knowledge Graph (CKG)
  • Figure 5: Model scale vs. Serendipity Exploration Performance
  • ...and 2 more figures