D-SCoRE: Document-Centric Segmentation and CoT Reasoning with Structured Export for QA-CoT Data Generation

Weibo Zhou; Lingbo Li; Shangsong Liang

D-SCoRE: Document-Centric Segmentation and CoT Reasoning with Structured Export for QA-CoT Data Generation

Weibo Zhou, Lingbo Li, Shangsong Liang

TL;DR

D-SCoRE introduces a training-free, end-to-end pipeline that generates reasoning-rich QA-CoT data from arbitrary texts through document-centric segmentation, explicit/implicit question design, and counterfactual augmentation. By coupling multi-stage quality control with reasoning-centric supervision, it achieves high data quality and diversity while enabling domain adaptation on consumer hardware, outperforming models trained on human-annotated data in many settings. The framework demonstrates strong data efficiency, with substantial gains from implicit reasoning content and robust benefits from heterogeneous quality control across model scales and domains. These findings suggest a scalable path to domain-specific QA SFT, reducing annotation costs while improving multi-step reasoning capabilities in LLMs.

Abstract

The scarcity and high cost of high-quality domain-specific question-answering (QA) datasets limit supervised fine-tuning of large language models (LLMs). We introduce $\textbf{D-SCoRE}$, a training-free framework that leverages LLMs and prompt engineering to automatically generate diverse, rich QA datasets with Chain-of-Thought (CoT) from arbitrary textual sources. By integrating $\textbf{D}$ocument-centric processing, $\textbf{S}$egmentation, $\textbf{Co}$T $\textbf{R}$easoning, and structured $\textbf{E}$xport - along with multi-dimensional controls such as semantic role transformation, question type balancing, and counterfactual augmentation - D-SCoRE produces tailored QA pairs with enhanced diversity and relevance. LLMs fine-tuned on D-SCoRE-generated datasets outperform those trained on human-annotated QA data across most evaluated domains. Its efficiency and scalability enable rapid, high-performance domain-adaptive fine-tuning on consumer-grade hardware, generating over 1,100 high-quality QA pairs per GPU-hour end-to-end.

D-SCoRE: Document-Centric Segmentation and CoT Reasoning with Structured Export for QA-CoT Data Generation

TL;DR

Abstract

The scarcity and high cost of high-quality domain-specific question-answering (QA) datasets limit supervised fine-tuning of large language models (LLMs). We introduce

, a training-free framework that leverages LLMs and prompt engineering to automatically generate diverse, rich QA datasets with Chain-of-Thought (CoT) from arbitrary textual sources. By integrating

ocument-centric processing,

egmentation,

easoning, and structured

xport - along with multi-dimensional controls such as semantic role transformation, question type balancing, and counterfactual augmentation - D-SCoRE produces tailored QA pairs with enhanced diversity and relevance. LLMs fine-tuned on D-SCoRE-generated datasets outperform those trained on human-annotated QA data across most evaluated domains. Its efficiency and scalability enable rapid, high-performance domain-adaptive fine-tuning on consumer-grade hardware, generating over 1,100 high-quality QA pairs per GPU-hour end-to-end.

D-SCoRE: Document-Centric Segmentation and CoT Reasoning with Structured Export for QA-CoT Data Generation

TL;DR

Abstract

D-SCoRE: Document-Centric Segmentation and CoT Reasoning with Structured Export for QA-CoT Data Generation

TL;DR

Abstract

Paper Structure

Table of Contents

Figures (3)