Scholarly Question Answering using Large Language Models in the NFDI4DataScience Gateway

Hamed Babaei Giglou; Tilahun Abedissa Taffa; Rana Abdullah; Aida Usmanova; Ricardo Usbeck; Jennifer D'Souza; Sören Auer

TL;DR

The paper addresses the challenge of retrieving information from, and answering questions over, heterogeneous federated scholarly data sources by deploying a Retrieval Augmented Generation (RAG)-based scholarly QA system on top of the NFDI4DataScience Gateway. The approach combines the Gateway's federated search with an ensemble retriever and an LLM-based answer generator, guided by a structured prompt and conversation memory, to produce accurate, context-grounded responses. To evaluate the system, the authors construct AI-QA and Comparison-QA datasets from ORKG comparisons and Gateway results, and assess both Gateway performance (response time, number of retrieved documents, relevancy) and QA quality (ROUGE, BLEU, BERTScore, Exact Match). The key findings are that the Gateway retrieves a substantial number of documents within a few seconds and that the QA component's answers align meaningfully with the ground-truth data; the authors also identify limitations related to data availability and LLM constraints. The work demonstrates a practical path toward integrated, interactive scholarly search and QA across federated sources. The code is publicly available for replication, and future work targets broader LLM evaluation and data curation.
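
The TL;DR states that an ensemble retriever feeds the answer generator but does not say how the individual rankings are merged. As a hedged illustration only, the sketch below uses weighted Reciprocal Rank Fusion (RRF), one common way to combine ranked lists; the function name, the smoothing constant k = 60, the equal weights, and the document IDs are assumptions, not the authors' configuration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, weights=None, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores sum_i w_i / (k + rank_i), where rank_i is its
    1-based position in list i; a list that omits a document adds nothing.
    The constant k = 60 is a conventional default, assumed here.
    """
    if weights is None:
        weights = [1.0] * len(ranked_lists)
    if len(weights) != len(ranked_lists):
        raise ValueError("need exactly one weight per ranked list")
    scores = defaultdict(float)
    for weight, ranking in zip(weights, ranked_lists):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    # Hypothetical outputs of a keyword retriever and a dense retriever.
    keyword_hits = ["doc3", "doc1", "doc7"]
    dense_hits = ["doc1", "doc5", "doc3"]
    print(reciprocal_rank_fusion([keyword_hits, dense_hits], weights=[0.5, 0.5]))
```

Because RRF uses only ranks, it can merge retrievers whose raw scores live on different scales (for example BM25 scores and embedding similarities) without any score normalization.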

Abstract

This paper introduces a scholarly Question Answering (QA) system on top of the NFDI4DataScience Gateway, employing a Retrieval Augmented Generation (RAG)-based approach. The NFDI4DS Gateway, as a foundational framework, offers a unified and intuitive interface for querying various scientific databases using federated search. The RAG-based scholarly QA system, powered by a Large Language Model (LLM), enables dynamic interaction with search results, enhancing filtering capabilities and fostering conversational engagement with the Gateway search. The effectiveness of both the Gateway and the scholarly QA system is demonstrated through experimental analysis.
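
The abstract describes an LLM that interacts with Gateway search results conversationally, and the TL;DR mentions a structured prompt plus conversation memory. The sketch below shows one way such a prompt might be assembled, assuming a sliding-window memory of recent turns; the class name, the window size, and the prompt wording are hypothetical and do not reproduce the paper's actual prompt. The resulting string would be passed to whichever LLM the system uses.

```python
from dataclasses import dataclass, field


@dataclass
class ConversationMemory:
    """Sliding-window memory that keeps only the most recent Q/A turns."""
    max_turns: int = 3
    turns: list = field(default_factory=list)

    def add(self, question: str, answer: str) -> None:
        self.turns.append((question, answer))
        # Drop the oldest turns once the window is full.
        self.turns = self.turns[-self.max_turns:]


def build_prompt(question, documents, memory):
    """Assemble a structured prompt grounded in the retrieved documents."""
    context = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(documents, start=1))
    history = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in memory.turns)
    return (
        "You are a scholarly assistant. Answer only from the context below; "
        "if the answer is not in the context, say you do not know.\n\n"
        f"Context:\n{context or '(none)'}\n\n"
        f"Conversation so far:\n{history or '(none)'}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    memory = ConversationMemory(max_turns=2)
    memory.add("Which papers use RAG?", "Documents [1] and [3] describe RAG systems.")
    print(build_prompt("Which of them report BLEU?", ["Paper A abstract", "Paper B abstract"], memory))
```

A bounded window keeps the prompt length predictable as the conversation grows, at the cost of forgetting older turns.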

Paper Structure

This paper contains 17 sections, 3 figures, and 2 tables.

Introduction
Related Works
Methodological Framework
The Gateway -- Federated Search
Scholarly Question Answering
Evaluation
Evaluation Dataset
Constructing Queries for Assessing the Gateway Performance
Generating Scholarly QA Datasets
Evaluation Metrics
Gateway Evaluation Metrics
Scholarly QA Evaluation Metrics
Results
Gateway and Scholarly QA Results
Limitations and Future Directions
(2 further sections not listed)

Figures (3)

Figure 1: A functional view of the NFDI4DS Gateway architecture with scholarly QA application.
Figure 2: Left: distribution of the number of documents the Gateway retrieved per query (x-axis: number of retrieved documents; y-axis: number of queries). Right: distribution of response times (x-axis: response time in seconds; y-axis: number of queries).
Figure 3: Relevancy of the Gateway-retrieved documents with respect to the search query, measured with TF-IDF, BM25, and sentence-BERT embedding similarity at thresholds in the range [0.0, 0.99].
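
Figure 3 compares each retrieved document with its search query and sweeps a similarity threshold; one plausible reading is that it reports the share of documents at or above each threshold. The sketch below illustrates that reading for the TF-IDF variant only, assuming whitespace tokenization, a smoothed IDF computed over the query plus its retrieved documents, and cosine similarity; these choices and all function names are assumptions, and the paper's exact preprocessing and aggregation may differ. A BM25 or sentence-BERT scorer could replace the TF-IDF cosine without changing the thresholding logic.

```python
import math
from collections import Counter


def tfidf_vectors(texts):
    """Return one sparse TF-IDF vector (dict: term -> weight) per text."""
    tokenized = [text.lower().split() for text in texts]  # naive whitespace tokenization
    n = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF: terms that occur in every text still keep a positive weight.
    idf = {t: math.log((1 + n) / (1 + df)) + 1 for t, df in doc_freq.items()}
    return [{t: c * idf[t] for t, c in Counter(tokens).items()} for tokens in tokenized]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def relevant_fraction(query, documents, thresholds):
    """Share of documents whose similarity to the query is >= each threshold.

    Assumption: IDF statistics come from the query plus its retrieved documents.
    """
    if not documents:
        return {t: 0.0 for t in thresholds}
    vectors = tfidf_vectors([query] + documents)
    sims = [cosine(vectors[0], doc_vec) for doc_vec in vectors[1:]]
    return {t: sum(s >= t for s in sims) / len(sims) for t in thresholds}


if __name__ == "__main__":
    query = "question answering large language models"
    docs = [
        "large language models for scholarly question answering",
        "protein folding with deep learning",
    ]
    print(relevant_fraction(query, docs, [0.0, 0.5, 0.99]))
```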