
Science Checker Reloaded: A Bidirectional Paradigm for Transparency and Logical Reasoning

Loïc Rakotoson, Sylvain Massip, Fréjus A. A. Laleye

TL;DR

The paper addresses transparency, reasoning, and reliability challenges in scientific information retrieval for long documents by proposing a two-block architecture: (1) sparse retrieval using ontology-oriented query expansion to fetch relevant documents, and (2) iterative, hybrid answer generation over long-context chunks with intermediate checkpoints for user inspection. Key contributions include demonstrating that a lighter, ontology-enhanced retrieval pipeline can achieve competitive performance relative to dense LLM baselines on long-document tasks, and outlining mechanisms to improve transparency through visible reasoning segments. The work also discusses integration with knowledge graphs and plans for user-centered evaluations to assess usability and trust, aiming to deliver an industrializable system suited to scientific and technical domains. Overall, the approach promises scalable, interpretable information retrieval that balances accuracy, cost, and explainability in real-world deployments.

Abstract

Information retrieval is a rapidly evolving field. However, it still faces significant limitations when handling the vast amounts of scientific and industrial information, such as semantic divergence and vocabulary gaps in sparse retrieval, low precision and lack of interpretability in semantic search, or hallucination and outdated information in generative models. In this paper, we introduce a two-block approach to tackle these hurdles for long documents. The first block enhances language understanding in sparse retrieval through query expansion to retrieve relevant documents. The second block deepens the result by providing comprehensive and informative answers to complex questions using only the information spread across the long document, enabling bidirectional engagement. At various stages of the pipeline, intermediate results are presented to users to facilitate understanding of the system's reasoning. We believe this bidirectional approach brings significant advancements in terms of transparency, logical thinking, and comprehensive understanding in the field of scientific information retrieval.
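
To make the first block more concrete, the following is a minimal sketch of query expansion feeding a sparse retriever. It is an illustration under stated assumptions, not the authors' implementation: the toy ONTOLOGY dictionary, the helper names expand_query and bm25_scores, the whitespace tokenizer, and the choice of BM25 as the sparse scorer are all hypothetical and do not come from the paper.

```python
import math
from collections import Counter

# Hypothetical toy ontology: term -> related terms (an assumption for illustration).
ONTOLOGY = {
    "heart": ["cardiac", "myocardial"],
    "attack": ["infarction"],
}

def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()

def expand_query(query, ontology):
    """Return the original query terms plus ontology neighbours, without duplicates."""
    expanded = []
    for term in tokenize(query):
        for candidate in [term] + ontology.get(term, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each document against the query terms with the BM25 formula."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "myocardial infarction risk factors in adults",
    "solar panel efficiency under cloudy conditions",
]
query = "heart attack"

# Without expansion, the vocabulary gap leaves the relevant document unmatched.
plain = bm25_scores(tokenize(query), docs)

# With expansion, ontology neighbours bridge the gap to the first document.
expanded_terms = expand_query(query, ONTOLOGY)
expanded = bm25_scores(expanded_terms, docs)
```

In this toy setup the unexpanded query shares no term with either document, while the expanded query matches the first document through "myocardial" and "infarction". Because the expanded term list is plain data, it could also be shown to a user as an intermediate result, in the spirit of the inspection points the abstract describes.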

Paper Structure

This paper contains 12 sections, 1 equation, 3 figures, and 1 table.

Figures (3)

  • Figure 1: Retrieval Augmented Generation Architecture
  • Figure 2: Sparse Retrieval with Ontology-Oriented Query Expansion. Yellow: User access points.
  • Figure 3: Answer Generation with Iterative Deepening. Yellow: User access points.