Ranking Narrative Query Graphs for Biomedical Document Retrieval (Technical Report)

Hermann Kroll, Pascal Sackhoff, Timo Breuer, Ralf Schenkel, Wolf-Tilo Balke

TL;DR

This work addresses the challenge of ranking documents retrieved by graph-based narrative queries over biomedical documents, moving beyond exact-match retrieval to unsupervised ranking driven by graph structure. It introduces GraphRank, which combines multiple signals (extraction confidence, tf-idf edge scores, concept coverage, and relational similarity) into a unified fragment score and then selects the best fragment per document. It also adds Partial Matches and ontological expansion to improve recall and handle ontology-driven generalization, all without training data. Evaluations on PM2017-2020 and TREC-COVID show recall and precision gains in concept-centric scenarios, with some limitations when queries lack precise domain concepts. The approach integrates directly into existing digital libraries and reduces reliance on supervised learning.
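The scoring idea summarized above can be illustrated with a minimal sketch. It is not the paper's implementation: the summary names the four signals but not how they are combined, so the unweighted mean used here, the assumption that every signal is normalized to [0, 1], and the names `Fragment`, `fragment_score`, and `rank_documents` are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Fragment:
    """A matched graph fragment of one document (hypothetical structure)."""
    doc_id: str
    confidence: float  # extraction confidence, assumed normalized to [0, 1]
    tfidf: float       # tf-idf edge score, assumed normalized to [0, 1]
    coverage: float    # concept coverage, assumed normalized to [0, 1]
    similarity: float  # relational similarity, assumed normalized to [0, 1]


def fragment_score(fragment):
    # Assumption: an unweighted mean of the four signals. The actual
    # combination used by GraphRank is not stated in this summary.
    return mean([fragment.confidence, fragment.tfidf,
                 fragment.coverage, fragment.similarity])


def rank_documents(fragments):
    """Keep the best-scoring fragment per document, then sort descending."""
    best = {}
    for fragment in fragments:
        score = fragment_score(fragment)
        if score > best.get(fragment.doc_id, float("-inf")):
            best[fragment.doc_id] = score
    # Ties are broken by document id so the order is deterministic.
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))
```

Calling `rank_documents` on a list of fragments returns `(doc_id, score)` pairs, one per document, ordered by the score of each document's best fragment.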

Abstract

Keyword-based searches are today's standard in digital libraries. Yet complex retrieval scenarios, such as those in scientific knowledge bases, need more sophisticated access paths. Although each document contributes in some way to a domain's body of knowledge, the exact structure between keywords, i.e., their possible relationships, and the contexts spanned within each single document are crucial for effective retrieval. Following this logic, individual documents can be seen as small-scale knowledge graphs on which graph queries can provide focused document retrieval. We previously implemented a full-fledged graph-based discovery system for the biomedical domain and demonstrated its benefits. Unfortunately, graph-based retrieval methods generally follow an 'exact match' paradigm, which severely hampers search efficiency, since exact match results are hard to rank by relevance. This paper extends our existing discovery system and contributes effective graph-based unsupervised ranking methods, a new query relaxation paradigm, and ontological rewriting. With these extensions, users can retrieve results with higher precision and higher recall due to partial matching and ontological rewriting.

Paper Structure

This paper contains 26 sections, 11 equations, 5 figures, and 2 tables.

Figures (5)

  • Figure 1: Systematic overview: Users formulate their information needs as graph patterns between concepts. Queries are translated and matched against document graphs. Matches are documents that match the query completely (full match) or partially (partial match). The matched documents are then ranked based on their graphs.
  • Figure 2: Conceptual Overview: When ranking document result lists with Full and Partial Matches, the list of Full Matches is always placed at the top of the final result list.
  • Figure 3: Topic-wise Recall@1000 evaluation on PM2020: GraphRank vs. Native BM25.
  • Figure 4: Topic-wise P@20 evaluation on PM2020: GraphRank vs. Native BM25.
  • Figure 5: Topic-wise P@20 evaluation on TREC-COVID (Abstracts): GraphRank vs. Native BM25.
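
The list-merging rule described in the caption of Figure 2 (Full Matches always precede Partial Matches) could be sketched as follows. This is an illustrative sketch only: the function name `merge_results`, the `(doc_id, score)` input format, and the choice to drop a partial match whose document already appears as a full match are assumptions, not details taken from the paper.

```python
def merge_results(full_matches, partial_matches):
    """Merge two scored result lists, placing all full matches first.

    Both inputs are lists of (doc_id, score) pairs (assumed format).
    Within each group, documents are ordered by descending score.
    """
    def by_score(pair):
        # Descending score, then document id for a deterministic order.
        return (-pair[1], pair[0])

    ranked_full = sorted(full_matches, key=by_score)
    seen = {doc_id for doc_id, _ in ranked_full}
    # Assumption: a document that is already a full match is not
    # listed a second time among the partial matches.
    ranked_partial = sorted(
        (pair for pair in partial_matches if pair[0] not in seen),
        key=by_score,
    )
    return [doc_id for doc_id, _ in ranked_full + ranked_partial]
```

For example, a partial match with a higher score than every full match still ends up below all full matches, which is exactly the behavior the caption describes.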