Enhancing LLMs for Power System Simulations: A Feedback-driven Multi-agent Framework
Mengshuo Jia, Zeyu Cui, Gabriela Hug
TL;DR
This work addresses the challenge of enabling LLMs to perform power system simulations by introducing a feedback-driven, multi-agent framework that couples an enhanced retrieval-augmented generation (RAG) module, an advanced reasoning module, and an environmental acting module with an error-feedback loop. The approach decomposes retrieval queries into function- and option-related components, uses a triple-based knowledge base to capture dependencies, and employs few-shot chain-of-thought prompting to guide code generation, while iteratively executing simulation code and correcting it from the resulting errors. Empirical results on 69 tasks across Daline and MATPOWER show large improvements over baseline LLMs and supervised fine-tuning, with success rates of up to 96.85% and rapid, low-cost simulations (about 0.014 USD per task). The framework demonstrates strong potential for intelligent, domain-specific LLM assistants in power system research and beyond, and it highlights areas for future work, such as enhanced error checking and interactive clarification, that are needed to approach 100% reliability.
Abstract
The integration of experimental technologies with large language models (LLMs) is transforming scientific research. It positions AI as a versatile research assistant rather than a mere problem-solving tool. In the field of power systems, however, managing simulations, one of the essential experimental technologies, remains a challenge for LLMs due to their limited domain-specific knowledge, restricted reasoning capabilities, and imprecise handling of simulation parameters. To address these limitations, this paper proposes a feedback-driven, multi-agent framework. It comprises three modules: an enhanced retrieval-augmented generation (RAG) module, an improved reasoning module, and a dynamic environmental acting module with an error-feedback mechanism. Validated on 69 diverse tasks from Daline and MATPOWER, this framework achieves success rates of 93.13% and 96.85%, respectively. It significantly outperforms ChatGPT 4o, o1-preview, and a fine-tuned GPT-4o, all of which achieved success rates below 30% on complex tasks. The framework also supports rapid, cost-effective task execution, completing each simulation in approximately 30 seconds at an average token cost of 0.014 USD. Overall, this adaptable framework lays a foundation for developing intelligent LLM-based assistants for human researchers, facilitating power system research and beyond.
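The error-feedback mechanism described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the real framework has an LLM generate code for Daline and MATPOWER, whereas here `toy_generate`, `toy_execute`, `toy_simulate`, `run_with_feedback`, and the option names are all hypothetical stand-ins chosen so the loop can run on its own. The sketch only shows the control flow: generate code, execute it, and pass any error message back into the next generation attempt.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Attempt:
    code: str              # code produced in this round
    error: Optional[str]   # error text from execution, or None on success


def run_with_feedback(
    task: str,
    generate: Callable[[str, Optional[str]], str],
    execute: Callable[[str], Optional[str]],
    max_attempts: int = 3,
) -> Tuple[bool, List[Attempt]]:
    """Generate code, run it, and feed any error back until it succeeds."""
    history: List[Attempt] = []
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        code = generate(task, feedback)
        error = execute(code)
        history.append(Attempt(code, error))
        if error is None:
            return True, history
        feedback = error  # the next attempt sees what went wrong
    return False, history


# --- Hypothetical stand-ins (assumptions, not part of the paper) ---
VALID_OPTIONS = {"method", "tol"}


def toy_simulate(**options):
    """Pretend simulation tool that rejects unknown option names."""
    unknown = set(options) - VALID_OPTIONS
    if unknown:
        raise ValueError(f"unknown option(s): {sorted(unknown)}")
    return "ok"


def toy_execute(code: str) -> Optional[str]:
    """Stand-in for the acting module: run the code, return error text or None."""
    try:
        exec(code, {"simulate": toy_simulate})
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


def toy_generate(task: str, feedback: Optional[str]) -> str:
    """Stand-in for the LLM: wrong option name first, corrected after feedback."""
    if feedback is None:
        return "simulate(method='newton', tolerance=1e-6)"
    return "simulate(method='newton', tol=1e-6)"


success, history = run_with_feedback("run a power flow", toy_generate, toy_execute)
for i, attempt in enumerate(history, start=1):
    print(i, attempt.code, "->", attempt.error or "success")
```

In this toy run the first attempt fails on an invalid option name and the second succeeds after the error text is fed back, which mirrors, in a much simplified form, the imprecise handling of simulation parameters that the framework is designed to correct.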
