From Keywords to Structured Summaries: Streamlining Scholarly Information Access

Mahsa Shamsabadi, Jennifer D'Souza

TL;DR

Traditional keyword-based search engines cope poorly with the rising volume of scientific publications. This paper addresses that inefficiency by using a fine-tuned large language model (LLM) to automate the creation of structured records, which populate a backend database and support information retrieval that goes beyond keywords.

Abstract

This paper highlights the growing importance of information retrieval (IR) engines in the scientific community and addresses the inefficiency of traditional keyword-based search engines in the face of a rising volume of publications. The proposed solution relies on structured records that underpin advanced information technology (IT) tools, including visualization dashboards, to change how researchers access and filter articles, replacing the traditional text-heavy approach. This vision is exemplified through a proof of concept centered on the "reproductive number estimate of infectious diseases" research theme, using a fine-tuned large language model (LLM) to automate the creation of structured records that populate a backend database and go beyond keywords. The result is a next-generation information access system, offered as an IR method and accessible at https://orkg.org/usecases/r0-estimates.

Paper Structure

This paper contains 8 sections, 3 figures, 1 table.

Figures (3)

  • Figure 1: (Left image) A visual analytics dashboard in our system at https://orkg.org/usecases/r0-estimates provides charts (a), (b), (c), and (d) to help researchers make informed article-filtering decisions. (Right image) The backend workflow, managed by a web API, handles database interactions for frontend rendering. It incorporates a scheduler for database updates, with LLM queries supplying structured scholarly knowledge before each update.
  • Figure 2: A closer look at chart (a) in Figure 1. This chart was designed to address the research question "What are the maximum R0 estimates reported for the diseases?" and to support advanced scholarly publication filtering. The y-axis displays maximum R0 values, while the x-axis shows the infectious diseases in our database. The chart supports filtering by letting the user select the range of R0 values to display. Additionally, clicking on a bar reveals the list of publications whose data underlies it, with clickable links redirecting to the respective articles on PubMed.
  • Figure 3: A closer look at chart (b) in Figure 1. This chart was designed to address the research question "For a chosen disease, how many studies have been reported across study locations?" The y-axis represents the study locations and the x-axis denotes the number of studies found for that disease in our database. The figure presents the results for Ebola and shows that it was studied in Congo, Guinea, Liberia, Sierra Leone, Uganda, and Zambia. Clicking on a bar reveals the list of publications whose data underlies it, with clickable links redirecting to the respective articles on PubMed.
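
The aggregations behind charts (a) and (b) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `R0Record` type, its field names, the function names, and the sample values are all hypothetical placeholders, and the sketch assumes that each structured record corresponds to one study and that the R0 range filter applies to the per-disease maximum.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class R0Record:
    """Hypothetical structured record; the field names are assumptions."""
    disease: str
    location: str
    r0_value: float
    article_url: str


def max_r0_by_disease(records, r0_min=0.0, r0_max=float("inf")):
    """Chart (a): maximum R0 per disease, kept only if it lies in [r0_min, r0_max]."""
    maxima = {}
    for rec in records:
        previous = maxima.get(rec.disease, rec.r0_value)
        maxima[rec.disease] = max(previous, rec.r0_value)
    return {d: v for d, v in maxima.items() if r0_min <= v <= r0_max}


def studies_by_location(records, disease):
    """Chart (b): number of records per study location for one disease."""
    return Counter(rec.location for rec in records if rec.disease == disease)


def articles_behind_bar(records, disease, location=None):
    """Article links underlying one bar (optionally restricted to a location)."""
    return [
        rec.article_url
        for rec in records
        if rec.disease == disease and (location is None or rec.location == location)
    ]


# Invented placeholder data, not taken from the paper or its database.
records = [
    R0Record("Disease A", "Location X", 2.1, "https://example.org/article-1"),
    R0Record("Disease A", "Location X", 3.5, "https://example.org/article-2"),
    R0Record("Disease A", "Location Y", 1.4, "https://example.org/article-3"),
    R0Record("Disease B", "Location X", 1.8, "https://example.org/article-4"),
]

print(max_r0_by_disease(records, r0_min=2.0))
print(studies_by_location(records, "Disease A"))
print(articles_behind_bar(records, "Disease A", "Location X"))
```

Keeping the article link on every record is what would let a bar click resolve to its underlying publications, as the captions describe. A real backend would presumably run equivalent queries against the database rather than over an in-memory list.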