Emission-GPT: A domain-specific language model agent for knowledge retrieval, emission inventory and data analysis
Jiashu Ye, Tong Wu, Weiwen Chen, Hao Zhang, Zeteng Lin, Xingxing Li, Shujuan Weng, Manni Zhu, Xin Yuan, Xinlong Hong, Jingjie Li, Junyu Zheng, Zhijiong Huang, Jing Tang
TL;DR
Emission-GPT is a domain-specific LLM agent built to retrieve, organize, and analyze atmospheric-emissions information by integrating retrieval-augmented generation, structured prompting, and function calling. The system rests on a curated knowledge base of over 10,000 documents, processed into millions of context chunks to support accurate inventory methods, emission-factor recommendations, and data-driven analyses. Evaluation combines automated, expert, and case-study assessments, showing high factual alignment and actionable outputs while highlighting challenges in multilingual contexts and retrieval precision. The work demonstrates a practical, extensible platform that lowers entry barriers for emission inventories and scenario-based analyses, with potential for real-time decision support and policy assessment.
Abstract
Improving air quality and addressing climate change rely on an accurate understanding and analysis of air pollutant and greenhouse gas emissions. However, emission-related knowledge is often fragmented and highly specialized, while existing methods for accessing and compiling emissions data remain inefficient. These issues hinder the ability of non-experts to interpret emissions information, posing challenges to research and management. To address these issues, we present Emission-GPT, a knowledge-enhanced large language model agent tailored for the atmospheric emissions domain. Built on a curated knowledge base of over 10,000 documents (including standards, reports, guidebooks, and peer-reviewed literature), Emission-GPT integrates prompt engineering and question completion to support accurate domain-specific question answering. Emission-GPT also enables users to interactively analyze emissions data via natural language, for example by querying and visualizing inventories, analyzing source contributions, and requesting emission-factor recommendations for user-defined scenarios. A case study in Guangdong Province demonstrates that Emission-GPT can extract key insights, such as point-source distributions and sectoral trends, directly from raw data with simple prompts. Its modular and extensible architecture facilitates automation of traditionally manual workflows, positioning Emission-GPT as a foundational tool for next-generation emission inventory development and scenario-based assessment.
