Building an Explainable Graph-based Biomedical Paper Recommendation System (Technical Report)

Hermann Kroll, Christin K. Kreutz, Bill Matthias Thang, Philipp Schaer, Wolf-Tilo Balke

TL;DR

This work tackles scalable, explainable biomedical paper recommendation by introducing XGPRec, a graph-based system that represents each paper as a concept interaction graph and explains recommendations via shared graph patterns. It uses a two-stage retrieval pipeline: the first stage retrieves candidates (FSConcept, FSNode, or FSCore), and the second stage scores them by blending graph overlap with BM25 text scoring. The graph explanations are highlighted in the UI. On a large-scale biomedical corpus (~37M MEDLINE documents) and standard benchmarks (PM2020, Genomics, RELISH), XGPRec achieves comparable or higher recall and precision than PubMed's recommendations in several settings, while offering explainable graph-based justifications and faster first-stage performance. The results support the practical value of graph-based explainability in digital libraries and provide a foundation for further enhancement of explanations and hybrid recommendation strategies. The code is openly available for adoption by other digital libraries.
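
The second-stage idea of blending graph overlap with a text score can be illustrated with a minimal sketch. This summary does not give the actual scoring formula, so the edge-overlap measure, the linear blend, the weight of 0.5, the pre-normalized BM25 value, and all names below are assumptions made for illustration only, not the authors' implementation.

```python
from typing import Set, Tuple

# A document graph is modeled here as a set of (subject, predicate, object)
# edges between concepts. This representation is an assumption.
Edge = Tuple[str, str, str]


def shared_edges(input_edges: Set[Edge], candidate_edges: Set[Edge]) -> Set[Edge]:
    """Edges present in both graphs; these could serve as the explanation."""
    return input_edges & candidate_edges


def graph_overlap(input_edges: Set[Edge], candidate_edges: Set[Edge]) -> float:
    """Fraction of the input document's edges that the candidate also contains."""
    if not input_edges:
        return 0.0
    return len(shared_edges(input_edges, candidate_edges)) / len(input_edges)


def blended_score(overlap: float, bm25_normalized: float, weight: float = 0.5) -> float:
    """Hypothetical linear blend of graph overlap and a BM25 score scaled to [0, 1]."""
    return weight * overlap + (1.0 - weight) * bm25_normalized


# Toy example with made-up edges and a made-up normalized BM25 score.
input_doc = {
    ("metformin", "treats", "diabetes"),
    ("diabetes", "associated", "obesity"),
}
candidate_doc = {
    ("metformin", "treats", "diabetes"),
    ("metformin", "induces", "nausea"),
}

explanation = shared_edges(input_doc, candidate_doc)
overlap = graph_overlap(input_doc, candidate_doc)
score = blended_score(overlap, bm25_normalized=0.8)
```

In this toy setting the single shared edge doubles as the explanation shown to the user, which is the property that makes a graph-based score easier to justify than a purely textual one.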

Abstract

Digital libraries provide different access paths, allowing users to explore their collections. For instance, paper recommendation suggests literature similar to a selected paper. Implementing such recommendation is often cost-intensive, especially if neural methods are applied. Additionally, it is hard for users to understand or guess why a recommendation should be relevant to them. That is why we tackle the problem from a different perspective. We propose XGPRec, a graph-based and thus explainable method, which we integrate into our existing graph-based biomedical discovery system. Moreover, we show that XGPRec (1) can, in terms of computational costs, manage a real digital library collection with 37M documents from the biomedical domain, (2) performs well on established test collections and concept-centric information needs, and (3) generates explanations that proved to be beneficial in a preliminary user study. We share our code so that other digital libraries can build upon XGPRec.

Paper Structure

This paper contains 18 sections, 7 equations, 3 figures, 3 tables, 2 algorithms.

Figures (3)

  • Figure 1: Systematic overview: Document graph representations are used to compute and explain relevant paper recommendations for users.
  • Figure 2: Screenshot of our prototypical system: The generated explanation of why the candidate document should be relevant to the input document is shown. Shared information (nodes and edges) is visualized with colors, whereas information that is added by the candidate document is visualized as dashed lines and uncolored nodes.
  • Figure 3: Screenshot of our improved document visualization: On the left side, detected concepts are highlighted in the text via a color encoding. On the right side, the essential document graph is shown, i.e., the most relevant extracted statements are shown to the users. On the left side, users can filter for certain concept types (diseases, drugs, etc.); on the right side, they can select the number of statements that should be shown.