SustainableQA: A Comprehensive Question Answering Dataset for Corporate Sustainability and EU Taxonomy Reporting
Mohammed Ali, Abdelrahman Abdallah, Adam Jatowt
TL;DR
SustainableQA tackles the urgent need for high-quality, domain-specific QA data to support retrieval-augmented systems for corporate sustainability and EU Taxonomy reporting. It introduces a scalable pipeline that combines semantic passage classification, a hybrid span extraction workflow, and a table-to-paragraph transformation to generate over 195k QA pairs from 61 real-world reports, followed by an automated refinement process that checks faithfulness and relevance. Empirical results show that a compact 8B-parameter model, fine-tuned on SustainableQA, can outperform larger state-of-the-art models under various prompting strategies, and that the dataset is useful for RAG benchmarks and domain-specific QA tasks. The work advances reproducible evaluation for regulation-aware QA in sustainability contexts and highlights avenues for multimodal extension and broader regulatory applicability.
Abstract
The growing demand for corporate sustainability transparency, particularly under new regulations like the EU Taxonomy, necessitates precise data extraction from large, unstructured corporate reports. Large Language Models and Retrieval-Augmented Generation (RAG) systems require high-quality, domain-specific question-answering datasets for this task. To address this, we introduce SustainableQA, a novel dataset and a scalable pipeline that generates comprehensive QA pairs from corporate sustainability and annual reports by integrating semantic chunk classification, a hybrid span extraction pipeline, and a specialized table-to-paragraph transformation. To ensure high quality, generation is followed by a novel automated assessment and refinement pipeline that systematically validates each QA pair for faithfulness and relevance, repairing or discarding low-quality entries. The result is a final, robust dataset of over 195,000 diverse factoid and non-factoid QA pairs. Its effectiveness is demonstrated by initial fine-tuning experiments in which a compact 8B-parameter model significantly outperforms much larger state-of-the-art models. SustainableQA proves to be a highly effective resource for developing and benchmarking advanced knowledge assistants capable of navigating complex sustainability compliance data.
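
To make the table-to-paragraph idea concrete, the following is a minimal sketch of one way a report table could be linearized into prose before QA generation. It is not the authors' implementation: the function name `table_to_paragraph`, the sentence template, and the example rows (including all figures) are hypothetical and chosen only for illustration. The actual transformation in SustainableQA may be model-based rather than template-based.

```python
from typing import Dict, List


def table_to_paragraph(caption: str, rows: List[Dict[str, str]], key_column: str) -> str:
    """Linearize a table into one sentence per row (template-based assumption)."""
    sentences = []
    for row in rows:
        subject = row[key_column]
        # Skip the key column and any empty cells so no dangling facts appear.
        facts = [
            f"{column} is {value}"
            for column, value in row.items()
            if column != key_column and value.strip()
        ]
        if facts:
            sentences.append(f"For {subject}, " + ", ".join(facts) + ".")
    return f"{caption}: " + " ".join(sentences)


# Hypothetical rows; the values are invented for illustration only.
example_rows = [
    {
        "Activity": "Electricity generation from wind power",
        "Turnover (EUR m)": "120",
        "Taxonomy-aligned share": "35%",
    },
    {
        "Activity": "Renovation of existing buildings",
        "Turnover (EUR m)": "45",
        "Taxonomy-aligned share": "",
    },
]

paragraph = table_to_paragraph("Turnover by economic activity", example_rows, "Activity")
print(paragraph)
```

Turning each row into a self-contained sentence keeps the row label next to its values, so a downstream span extractor or QA generator can treat the table like any other passage.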
