
Automatic Inter-document Multi-hop Scientific QA Generation

Seungmin Lee, Dongha Kim, Yuni Jeon, Junyoung Koh, Min Song

Abstract

Existing automatic scientific question generation studies mainly focus on single-document factoid QA, overlooking the inter-document reasoning crucial for scientific understanding. We present AIM-SciQA, an automated framework for generating multi-document, multi-hop scientific QA datasets. AIM-SciQA uses large language models (LLMs) with machine reading comprehension to extract single-hop QAs, and constructs cross-document relations through embedding-based semantic alignment while selectively leveraging citation information. Applied to 8,211 PubMed Central papers, it produced 411,409 single-hop and 13,672 multi-hop QAs, forming the IM-SciQA dataset. Human and automatic validation confirmed high factual consistency, and experimental results demonstrate that IM-SciQA effectively differentiates reasoning capabilities across the retrieval and QA stages, providing a realistic and interpretable benchmark for retrieval-augmented scientific reasoning. We further extend this framework to construct CIM-SciQA, a citation-guided variant that achieves performance comparable to the Oracle setting, reinforcing the dataset's validity and generality.
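The abstract describes linking single-hop QAs from different papers through embedding-based semantic alignment. A minimal sketch of that idea, assuming each QA already carries a precomputed embedding vector, is to score every cross-paper QA pair by cosine similarity and keep pairs above a cutoff. The dict keys `id` and `emb`, the function names, and the 0.8 threshold are hypothetical choices for illustration; the paper's actual embedding model, similarity measure, and selection criteria may differ.

```python
import math
from itertools import product


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def align_cross_document(qas_a, qas_b, threshold=0.8):
    """Pair single-hop QAs from two different papers whose embeddings are similar.

    Each QA is a dict with hypothetical keys 'id' and 'emb' (a precomputed embedding).
    The 0.8 default threshold is an illustrative assumption, not a value from the paper.
    Returns (id_a, id_b, score) tuples sorted by descending similarity.
    """
    pairs = []
    for qa, qb in product(qas_a, qas_b):  # all cross-paper QA pairs
        score = cosine(qa["emb"], qb["emb"])
        if score >= threshold:
            pairs.append((qa["id"], qb["id"], score))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Toy usage with made-up 3-dimensional embeddings for two papers.
paper_a = [{"id": "A1", "emb": [1.0, 0.0, 0.0]}, {"id": "A2", "emb": [0.0, 1.0, 0.0]}]
paper_b = [{"id": "B1", "emb": [0.9, 0.1, 0.0]}, {"id": "B2", "emb": [0.0, 0.0, 1.0]}]
aligned = align_cross_document(paper_a, paper_b)
print(aligned)  # only the A1/B1 pair clears the cutoff
```

At corpus scale, one would more likely batch the embeddings and use a vectorized or approximate nearest-neighbor search instead of the all-pairs loop shown here.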

Paper Structure

This paper contains 43 sections, 5 equations, 5 figures, and 8 tables.

Figures (5)

  • Figure 1: Example of Inter-document Multi-hop QA: ① the model retrieves relevant documents and ② integrates complementary evidence to answer.
  • Figure 2: Illustration of the relation construction process in AIM-SciQA. For IM-SciQA, candidate relations are created from papers sharing overlapping keywords with the source paper (implicit semantic similarity relationships), while CIM-SciQA constructs relations directly from explicit citation links (explicit citation relationships). From these candidate relations, the QA-based relation construction process (Section \ref{sec:document_relation}) selects promising pairs of single-hop QAs (generated in Section \ref{sec:shqg}) across papers to form inter-document QA candidates, and a paper cluster is built to define the target paper and its corresponding retrieval QA.
  • Figure 3: Average Single-hop QA Count by Section
  • Figure 4: Total Single-hop QA Count by Section
  • Figure 5: Results of multi-dimensional MHQA quality validation
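Figure 2 distinguishes two ways of forming candidate relations before the QA-based selection step: keyword overlap (IM-SciQA) and explicit citation links (CIM-SciQA). The sketch below contrasts them under simplifying assumptions: each paper is represented by a hypothetical set of keywords and a hypothetical set of cited paper IDs, and the `min_overlap` parameter and all function names are illustrative rather than taken from AIM-SciQA. The subsequent QA-pair selection and paper-cluster construction are not shown.

```python
from typing import Dict, List, Set


def keyword_candidates(source: str, keywords: Dict[str, Set[str]], min_overlap: int = 1) -> List[str]:
    """IM-SciQA-style candidates (sketch): papers sharing at least `min_overlap` keywords with the source.

    `min_overlap` is a hypothetical parameter; the paper's exact overlap criterion is not given here.
    """
    src_kw = keywords[source]
    return sorted(
        pid for pid, kws in keywords.items()
        if pid != source and len(src_kw & kws) >= min_overlap
    )


def citation_candidates(source: str, citations: Dict[str, Set[str]], corpus: Set[str]) -> List[str]:
    """CIM-SciQA-style candidates (sketch): papers the source explicitly cites, restricted to the corpus."""
    return sorted(pid for pid in citations.get(source, set()) if pid in corpus and pid != source)


# Toy corpus with made-up paper IDs, keywords, and citation links.
keywords = {"P1": {"crispr", "t-cell"}, "P2": {"t-cell", "cytokine"}, "P3": {"microbiome"}}
citations = {"P1": {"P3", "P9"}}  # P9 is cited but not in the corpus
corpus = set(keywords)

print(keyword_candidates("P1", keywords))            # shares "t-cell" with P2
print(citation_candidates("P1", citations, corpus))  # P9 is dropped as out-of-corpus
```

The two strategies can yield disjoint candidates, as in this toy case: keyword overlap surfaces an uncited but topically related paper, while citation links surface a cited paper with no shared keywords.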