Science Checker Reloaded: A Bidirectional Paradigm for Transparency and Logical Reasoning
Loïc Rakotoson, Sylvain Massip, Fréjus A. A. Laleye
TL;DR
The paper addresses transparency, reasoning, and reliability challenges in scientific information retrieval for long documents by proposing a two-block architecture: (1) sparse retrieval using ontology-oriented query expansion to fetch relevant documents, and (2) iterative, hybrid answer generation over long-context chunks with intermediate checkpoints for user inspection. Key contributions include demonstrating that a lighter, ontology-enhanced retrieval pipeline can achieve competitive performance relative to dense-LMM baselines on long-document tasks, and outlining mechanisms to improve transparency through visible reasoning segments. The work also discusses integration with knowledge graphs and plans for user-centered evaluations to assess usability and trust, aiming to deliver an industrializable system suited to scientific and technical domains. Overall, the approach promises scalable, interpretable information retrieval that balances accuracy, cost, and explainability in real-world deployments.
Abstract
Information retrieval is a rapidly evolving field. However, it still faces significant limitations when applied to the vast amounts of scientific and industrial information, such as semantic divergence and vocabulary gaps in sparse retrieval, low precision and lack of interpretability in semantic search, and hallucination and outdated information in generative models. In this paper, we introduce a two-block approach to tackle these hurdles for long documents. The first block enhances language understanding in sparse retrieval through query expansion, so that relevant documents are retrieved. The second block deepens the result by providing comprehensive and informative answers to complex questions using only the information spread across the long document, enabling bidirectional engagement. At various stages of the pipeline, intermediate results are presented to users to help them understand the system's reasoning. We believe this bidirectional approach brings significant advancements in transparency, logical thinking, and comprehensive understanding in the field of scientific information retrieval.
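To make the first block concrete, the following is a minimal sketch of query expansion closing a vocabulary gap in sparse retrieval. It is an illustration under stated assumptions, not the authors' implementation: the toy ontology, the function names (expand_query, score, retrieve), and the simple term-overlap scoring are all hypothetical stand-ins for whatever ontology and sparse ranker the actual system uses.

```python
# Hypothetical toy ontology: maps a phrase to related phrases (e.g. synonyms).
# A real system would draw these from a domain ontology; this one is invented.
TOY_ONTOLOGY = {
    "heart attack": ["myocardial infarction"],
    "aspirin": ["acetylsalicylic acid"],
}


def expand_query(query, ontology):
    """Return the query tokens plus tokens of ontology phrases related to it."""
    q = query.lower()
    terms = q.split()
    for key, related in ontology.items():
        if key in q:  # naive substring match, enough for a sketch
            for phrase in related:
                terms.extend(phrase.split())
    return terms


def score(terms, doc):
    """Count how many distinct query terms occur in the document (toy sparse score)."""
    doc_tokens = set(doc.lower().split())
    return sum(1 for t in set(terms) if t in doc_tokens)


def retrieve(query, docs, ontology, k=1):
    """Rank documents by term overlap with the expanded query and keep the top k."""
    terms = expand_query(query, ontology)
    ranked = sorted(docs, key=lambda d: score(terms, d), reverse=True)
    return ranked[:k]


docs = [
    "Acetylsalicylic acid reduces the risk of myocardial infarction",
    "Aspirin is sold in pharmacies",
]
query = "does aspirin prevent heart attack"

without_expansion = retrieve(query, docs, {})
with_expansion = retrieve(query, docs, TOY_ONTOLOGY)
```

In this toy setting, the unexpanded query only matches the document that literally contains "aspirin", while the expanded query ranks the document written in technical vocabulary first. The expanded term list can also be shown to the user as an intermediate result, in the spirit of the transparency goal described above.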
