BMGQ: A Bottom-up Method for Generating Complex Multi-hop Reasoning Questions from Semi-structured Data

Bingsen Qiu; Zijian Liu; Xiao Liu; Bingjie Wang; Feier Zhang; Yixuan Qin; Chunyan Li; Haoshen Yang; Zeren Gao

TL;DR

BMGQ presents a scalable bottom-up framework for generating high-difficulty, training-ready multi-hop QA data from semi-structured sources. It builds diverse evidence graphs around seed entities, uses NLI-based relation classification for edge construction, and employs reverse question generation with obfuscation to ensure oblique, uniquely solvable prompts. A two-layer data quality system, combining graph-based structural checks and multi-model validation with explicit evidence verification, ensures precision and uniqueness. Experiments show BMGQ achieves BrowseComp-level complexity while enabling large-scale training data generation, reducing manual curation and supporting training-time fine-tuning and RL for deep reasoning models.

Abstract

Building training-ready multi-hop question answering (QA) datasets that truly stress a model's retrieval and reasoning abilities remains highly challenging. A few recent evaluation datasets capture the characteristics of hard-to-search but easy-to-verify problems -- requiring the integration of ambiguous, indirect, and cross-domain cues -- but these resources remain scarce and are mostly designed for evaluation, making them unsuitable for supervised fine-tuning (SFT) or reinforcement learning (RL). Meanwhile, manually curating non-trivially retrievable questions -- where answers cannot be found through a single direct query but instead require multi-hop reasoning over oblique and loosely connected evidence -- incurs prohibitive human costs and fails to scale, creating a critical data bottleneck for training high-capability retrieval-and-reasoning agents. To address this, we present BMGQ, a bottom-up automated method for generating high-difficulty, training-ready multi-hop questions from semi-structured knowledge sources. The BMGQ system (i) grows diverse, logically labeled evidence clusters through Natural Language Inference (NLI)-based relation typing and diversity-aware expansion; (ii) applies reverse question construction to compose oblique cues so that isolated signals are underinformative but their combination uniquely identifies the target entity; and (iii) enforces quality with a two-step evaluation pipeline that combines multi-model consensus filtering with structured constraint decomposition and evidence-based matching. The result is a scalable process that yields complex, retrieval-resistant yet verifiable questions suitable for SFT/RL training as well as challenging evaluation, substantially reducing human curation effort while preserving the difficulty profile of strong evaluation benchmarks.
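
The property described in step (ii), that each cue alone is underinformative while the cues together single out one entity, can be illustrated with a small sketch. This is not the authors' implementation: the toy knowledge base `KB`, the entity names, the attribute keys, and the helpers `candidates` and `is_oblique_and_unique` are all hypothetical, chosen only to make the check concrete.

```python
from typing import Callable, Dict, List, Set

# A cue is a predicate over an entity's attributes (hypothetical representation).
Attrs = Dict[str, object]
Cue = Callable[[Attrs], bool]

# Toy knowledge base; entities and attributes are invented for illustration.
KB: Dict[str, Attrs] = {
    "alpha": {"field": "physics", "decade": 1920, "city": "port"},
    "beta": {"field": "physics", "decade": 1930, "city": "port"},
    "gamma": {"field": "chemistry", "decade": 1920, "city": "port"},
    "delta": {"field": "physics", "decade": 1920, "city": "inland"},
}

CUES: List[Cue] = [
    lambda a: a["field"] == "physics",
    lambda a: a["decade"] == 1920,
    lambda a: a["city"] == "port",
]


def candidates(kb: Dict[str, Attrs], cues: List[Cue]) -> Set[str]:
    """Return the entities that satisfy every cue in `cues`."""
    return {name for name, attrs in kb.items() if all(c(attrs) for c in cues)}


def is_oblique_and_unique(
    kb: Dict[str, Attrs], cues: List[Cue], target: str, min_ambiguity: int = 2
) -> bool:
    """True if each cue alone matches at least `min_ambiguity` entities
    and all cues together match exactly the target entity."""
    each_ambiguous = all(len(candidates(kb, [c])) >= min_ambiguity for c in cues)
    return each_ambiguous and candidates(kb, cues) == {target}


if __name__ == "__main__":
    print(is_oblique_and_unique(KB, CUES, "alpha"))      # all three cues
    print(is_oblique_and_unique(KB, CUES[:2], "alpha"))  # only two cues
```

In this toy setting every single cue matches three entities, and only the conjunction of all three narrows the set to one. A real pipeline would need to check uniqueness against retrieved evidence rather than a closed table, which the abstract attributes to the two-step evaluation stage.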
