AgenticIE: An Adaptive Agent for Information Extraction from Complex Regulatory Documents
Gaye Colakoglu, Gürkan Solmaz, Jonathan Fürst
TL;DR
The paper addresses the extraction of structured information from diverse, multilingual Declaration of Performance (DoP) regulatory documents by introducing AgenticIE, an agentic information extraction system built around a planner-executor-responder loop. The agent infers user intent, detects document modality and language, and dynamically orchestrates a toolbox of extraction and question answering (QA) capabilities with transparent state tracking. On a multilingual dataset of 80 DoP documents annotated with 12 key-value pair (KVP) keys and extensive QA pairs, the agent outperforms GPT-4o baselines, especially in cross-lingual and nested-structure scenarios, while maintaining 100% JSON validity on KVP tasks. The paper also discusses data privacy, reliability, bias, and environmental considerations, and the authors plan to release code and data to support reproducibility. Overall, the work advances auditable, domain-focused information extraction for regulatory documents and informs future integration with compliance workflows and BIM processes.
Abstract
Declaration of Performance (DoP) documents, mandated by EU regulation, certify the performance of construction products. Two challenges stand in the way of making DoPs accessible to both machines and humans through automated key-value pair (KVP) extraction and question answering (QA): (1) although some of their content is standardized, DoPs vary widely in layout, schema, and format; (2) both users and documents are multilingual. Existing static or LLM-only Information Extraction (IE) pipelines fail to adapt to this diversity of document structures and users. Our domain-specific, agentic system addresses these challenges through a planner-executor-responder architecture. The system infers user intent, detects document language and modality, and orchestrates tools dynamically for robust, traceable reasoning while avoiding tool misuse and execution loops. Our agent outperforms the baselines (ROUGE: 0.783 vs. 0.703/0.608) and shows better cross-lingual stability (17-point vs. 21-26-point variation).
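The planner-executor-responder flow described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name below (`AgentState`, `TOOLS`, `plan`, `execute`, `respond`, `run_agent`) is hypothetical, the tools are toy string heuristics standing in for LLM-backed components, and the step budget is only one assumed way of guarding against execution loops.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class AgentState:
    """Shared state that every stage reads and writes, kept for auditing."""
    request: str
    document: str
    intent: str = ""
    plan: List[str] = field(default_factory=list)
    results: Dict[str, Any] = field(default_factory=dict)
    trace: List[Tuple[str, Any]] = field(default_factory=list)


def detect_language(state: AgentState) -> str:
    # Toy heuristic (assumption): a German DoP title marks the document as German.
    return "de" if "leistungserklärung" in state.document.lower() else "en"


def extract_kvp(state: AgentState) -> Dict[str, str]:
    # Toy extractor (assumption): treats every "key: value" line as one pair.
    pairs = {}
    for line in state.document.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            pairs[key.strip()] = value.strip()
    return pairs


def answer_question(state: AgentState) -> str:
    # Toy QA tool (assumption): returns the line sharing most words with the question.
    q_words = set(state.request.lower().replace("?", "").split())
    best_line, best_overlap = "", 0
    for line in state.document.splitlines():
        overlap = len(q_words & set(line.lower().replace(":", "").split()))
        if overlap > best_overlap:
            best_line, best_overlap = line.strip(), overlap
    return best_line


# Hypothetical toolbox; the executor may only call tools registered here.
TOOLS: Dict[str, Callable[[AgentState], Any]] = {
    "detect_language": detect_language,
    "extract_kvp": extract_kvp,
    "answer_question": answer_question,
}


def plan(state: AgentState) -> None:
    """Planner: infer the intent, then choose an ordered list of tools."""
    state.intent = "qa" if state.request.strip().endswith("?") else "kvp"
    task_tool = "answer_question" if state.intent == "qa" else "extract_kvp"
    state.plan = ["detect_language", task_tool]


def execute(state: AgentState, max_steps: int = 5) -> None:
    """Executor: run each planned tool once, within a fixed step budget."""
    if len(state.plan) > max_steps:
        raise RuntimeError("plan exceeds the step budget")
    for name in state.plan:
        if name not in TOOLS:
            raise KeyError(f"unknown tool: {name}")
        output = TOOLS[name](state)
        state.results[name] = output
        state.trace.append((name, output))  # record every call for traceability


def respond(state: AgentState) -> str:
    """Responder: serialize KVP results as JSON, return QA answers as text."""
    if state.intent == "kvp":
        return json.dumps(state.results["extract_kvp"], ensure_ascii=False)
    return state.results["answer_question"]


def run_agent(request: str, document: str) -> Tuple[str, List[Tuple[str, Any]]]:
    state = AgentState(request=request, document=document)
    plan(state)
    execute(state)
    return respond(state), state.trace
```

Calling `run_agent("Extract all key-value pairs", doc)` on a document of `key: value` lines returns a JSON string together with the trace of tool calls, while a request ending in a question mark is routed to the QA tool instead. Restricting the executor to registered tools and a bounded plan is one simple way to keep tool use auditable; a real system would replace the heuristics with model-driven planning and extraction.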
