MuISQA: Multi-Intent Retrieval-Augmented Generation for Scientific Question Answering
Zhiyuan Li, Haisheng Yu, Guangchuan Guo, Nan Zhou, Jiajun Zhang
TL;DR
MuISQA tackles multi-intent scientific question answering by introducing a dedicated benchmark and an intent-aware retrieval framework. The method uses Hypothetical Query Generation to decompose LLM-hypothesized answers into intent-specific queries and applies Reciprocal Rank Fusion to fuse evidence from diverse passages, achieving improved coverage and accuracy across MuISQA and general RAG benchmarks. The evaluation reveals that explicit intent diversification increases information coverage (IRR) and reduces retrieval redundancy, with notable gains even when hypothetical generation is imperfect. The work provides a practical approach to building more reliable RAG systems for complex scientific QA and offers a fine-grained diagnostic suite for future improvements.
Abstract
Complex scientific questions often entail multiple intents, such as identifying gene mutations and linking them to related diseases. These tasks require evidence from diverse sources and multi-hop reasoning, whereas conventional retrieval-augmented generation (RAG) systems are usually oriented toward a single intent, leading to incomplete evidence coverage. To assess this limitation, we introduce the Multi-Intent Scientific Question Answering (MuISQA) benchmark, which is designed to evaluate RAG systems on heterogeneous evidence coverage across sub-questions. In addition, we propose an intent-aware retrieval framework that leverages large language models (LLMs) to hypothesize potential answers, decompose them into intent-specific queries, and retrieve supporting passages for each underlying intent. The retrieved fragments are then aggregated and re-ranked via Reciprocal Rank Fusion (RRF) to balance coverage across diverse intents while reducing redundancy. Experiments on both the MuISQA benchmark and general RAG datasets demonstrate that our method consistently outperforms conventional approaches, particularly in retrieval accuracy and evidence coverage.
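To make the fusion step concrete, the following is a minimal sketch of standard Reciprocal Rank Fusion applied to per-intent retrieval results. It is not the authors' implementation: the function name `reciprocal_rank_fusion`, the passage IDs, and the smoothing constant `k = 60` (a commonly used default) are assumptions for illustration, and the abstract does not state which settings the paper uses.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of passage IDs into one list with RRF.

    Each passage scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k = 60 is a common default, assumed here
    rather than taken from the paper.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, passage_id in enumerate(ranking, start=1):
            scores[passage_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by passage ID for determinism.
    return sorted(scores, key=lambda pid: (-scores[pid], pid))


# Hypothetical per-intent retrieval results (passage IDs are made up).
per_intent_rankings = [
    ["p3", "p1", "p7"],  # e.g. results for a "gene mutation" query
    ["p5", "p3", "p2"],  # e.g. results for a "related disease" query
]
fused = reciprocal_rank_fusion(per_intent_rankings)
print(fused)
```

In this toy input, the passage `p3` appears in both lists and therefore rises to the top, while passages found under only one intent still keep a place in the fused list. This is the sense in which rank fusion can balance coverage across intents while collapsing duplicates.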
