ESGBench: A Benchmark for Explainable ESG Question Answering in Corporate Sustainability Reports
Sherine George, Nithish Saji
TL;DR
ESGBench addresses explainable QA over ESG disclosures with a reproducible pipeline that ingests ESG/TCFD PDFs, builds a chunked, table-aware index, and generates QA pairs with verbatim evidence. Its evaluation suite reports EM, F1, Numeric Accuracy, Recall@K, and per-category scores, and a simple RAG baseline exposes current limitations in grounding answers in numeric KPIs and tables. The dataset comprises 119 QA pairs from 10 companies, 40–50% of them table-derived, which illustrates the need for robust numeric and table reasoning. By promoting evidence-grounded answers and recall-aware evaluation, ESGBench aims to accelerate transparent, standards-aligned ESG AI research and practical deployment, including multilingual and governance considerations.
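The four metrics named above can be illustrated with a minimal Python sketch. This summary does not give ESGBench's exact definitions, so everything here is an assumption: the SQuAD-style text normalization, the 1% relative tolerance for numeric matches, the choice to compare only the first number in each string, and all function names are hypothetical and are not taken from the benchmark's code.

```python
import math
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace (assumed SQuAD-style)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(pred: str, gold: str) -> float:
    """EM: 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(pred) == normalize(gold))


def token_f1(pred: str, gold: str) -> float:
    """Token-level F1 over normalized whitespace tokens."""
    p, g = normalize(pred).split(), normalize(gold).split()
    if not p or not g:
        return float(p == g)
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)


# Matches integers and decimals with optional thousands separators, e.g. "1,250" or "12.5".
_NUMBER = re.compile(r"-?\d+(?:,\d{3})*(?:\.\d+)?")


def first_number(text: str):
    """Return the first number found in the text as a float, or None."""
    match = _NUMBER.search(text)
    return float(match.group().replace(",", "")) if match else None


def numeric_accuracy(pred: str, gold: str, rel_tol: float = 0.01) -> float:
    """1.0 if the first numbers agree within rel_tol (assumed tolerance), else 0.0."""
    p, g = first_number(pred), first_number(gold)
    if p is None or g is None:
        return 0.0
    return float(math.isclose(p, g, rel_tol=rel_tol))


def recall_at_k(retrieved_ids, gold_ids, k: int) -> float:
    """Fraction of gold evidence chunks that appear in the top-k retrieved chunks."""
    gold = set(gold_ids)
    if not gold:
        return 0.0
    return len(set(retrieved_ids[:k]) & gold) / len(gold)
```

For example, under these assumptions `token_f1("1,250 tonnes CO2e", "1,250 tonnes")` gives 0.8 (precision 2/3, recall 1), and `recall_at_k(["c3", "c1", "c7"], ["c1", "c2"], 2)` gives 0.5, since only one of the two gold chunks is in the top two. A real evaluation would also need unit handling (for example tCO2e versus ktCO2e), which this sketch ignores.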
Abstract
We present ESGBench, a benchmark dataset and evaluation framework designed to assess explainable ESG question answering systems using corporate sustainability reports. The benchmark consists of domain-grounded questions across multiple ESG themes, paired with human-curated answers and supporting evidence to enable fine-grained evaluation of model reasoning. We analyze the performance of state-of-the-art LLMs on ESGBench, highlighting key challenges in factual consistency, traceability, and domain alignment. ESGBench aims to accelerate research in transparent and accountable ESG-focused AI systems.
