Verif.ai: Towards an Open-Source Scientific Generative Question-Answering System with Referenced and Verifiable Answers

Miloš Košprdić, Adela Ljajić, Bojana Bašaragin, Darija Medvecki, Nikola Milošević

TL;DR

Verif.ai tackles the problem of hallucinations in scientific generative QA by combining a hybrid semantic-lexical retrieval system over PubMed with a retrieval-augmented generator (Mistral 7B) that cites sources, plus a SciFact-based verification engine (DeBERTa-large) to assess claim validity. The approach yields an open-source, reference-rich QA workflow with user feedback loops to further refine generation and verification, and aims to build trust in AI-assisted scientific workflows. Preliminary evaluations across retrieval, generation, and verification show promising results and clear directions for improvement, including active learning and explainable verification. The work underscores the practical impact of trustworthy, open-source scientific QA tools that can extend beyond PubMed and adapt to evolving open LLM ecosystems.

Abstract

In this paper, we present the current progress of the project Verif.ai, an open-source scientific generative question-answering system with referenced and verified answers. The components of the system are (1) an information retrieval system combining semantic and lexical search techniques over scientific papers (PubMed), (2) a fine-tuned generative model (Mistral 7B) that takes the top retrieved results and generates answers with references to the papers from which each claim was derived, and (3) a verification engine that cross-checks each generated claim against the abstract or paper from which it was derived, verifying whether any hallucination occurred in generating the claim. We reinforce the generative model by providing the abstracts in context; in addition, an independent set of methods and models verifies the answer and checks for hallucinations. We therefore believe that our method can make scientists more productive while building trust in the use of generative language models in scientific environments, where hallucinations and misinformation cannot be tolerated.
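
The three components described above can be illustrated with a minimal sketch. It is not the project's implementation: the abstract does not state how lexical and semantic scores are combined, so the min-max normalization, the weighted sum, and the `lexical_weight` parameter are assumptions, and the names `min_max`, `hybrid_rank`, and `answer_and_verify` are hypothetical. The generator and verifier are passed in as plain callables, standing in for the fine-tuned Mistral 7B model and the verification engine.

```python
from typing import Callable, Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; a constant score set maps to 0.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 0.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_rank(
    lexical: Dict[str, float],
    semantic: Dict[str, float],
    lexical_weight: float = 0.5,  # assumed fusion weight, not taken from the paper
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Fuse lexical and semantic scores with a weighted sum (assumed rule)."""
    lex, sem = min_max(lexical), min_max(semantic)
    combined = {
        doc_id: lexical_weight * lex.get(doc_id, 0.0)
        + (1.0 - lexical_weight) * sem.get(doc_id, 0.0)
        for doc_id in set(lex) | set(sem)
    }
    # Sort by descending score; break ties by document id for determinism.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


def answer_and_verify(
    question: str,
    abstracts: Dict[str, str],
    generate: Callable[[str, Dict[str, str]], List[Tuple[str, str]]],
    verify: Callable[[str, str], str],
) -> List[Tuple[str, str, str]]:
    """Generate (claim, doc_id) pairs, then check each claim against its cited abstract."""
    results = []
    for claim, doc_id in generate(question, abstracts):
        label = verify(claim, abstracts[doc_id])
        results.append((claim, doc_id, label))
    return results
```

Keeping the verifier as a separate callable mirrors the abstract's point that verification is independent of generation: the model that writes a claim is not the one that judges it.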

Paper Structure

This paper contains 12 sections, 2 figures, 1 table.

Figures (2)

  • Figure 1: Methodology overview of the Verif.ai project
  • Figure 2: Evaluation loss for fine-tuning of Mistral 7B model on PubMedQA questions with generated and referenced answers