Automating Data-Driven Modeling and Analysis for Engineering Applications using Large Language Model Agents
Yang Liu, Zaid Abulawi, Abhiram Garimidi, Doyeong Lim
TL;DR
This work tackles the scalability challenge of data-driven engineering modeling by introducing LLM-based agents that automate end-to-end regression workflows. It compares a supervisor-based multi-agent system with a single ReAct agent on the OECD/NEA critical heat flux (CHF) benchmark (~$2.5\times 10^4$ points). Both achieve predictive accuracy and calibrated uncertainty comparable to a human-expert Bayesian-optimized deep ensemble, and both surpass traditional CHF lookup tables on blind tests. A key finding is that the two architectures offer complementary strengths: the multi-agent approach is more robust and efficient, while the ReAct agent offers adaptive self-repair, which suggests that the choice of architecture should follow operational priorities. The work highlights practical implications for reducing human workload and generalizing to other data-rich engineering problems, and it outlines paths for improvement through retrieval-augmented knowledge, richer tooling, and tighter integration with simulation codes.
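The "adaptive self-repair" attributed to the ReAct agent can be illustrated with a toy act-observe loop. This is a minimal sketch, not the authors' implementation: the LLM is replaced by a hard-coded `stub_policy`, and the tool names (`load_data`, `train_model`), the file name `chf_data.csv`, and the config string are all hypothetical. The point is only that a tool error is returned to the agent as an observation, so the next step can correct it.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, str, str]  # (action, argument, observation)


# Hypothetical tools; a real agent would wrap data and training utilities.
def load_data(path: str) -> str:
    if not path.endswith(".csv"):
        raise ValueError("expected a .csv file")
    return f"loaded {path}"


def train_model(config: str) -> str:
    return f"trained model with {config}"


TOOLS: Dict[str, Callable[[str], str]] = {
    "load_data": load_data,
    "train_model": train_model,
}


def stub_policy(history: List[Step]) -> Tuple[str, str]:
    """Stand-in for the LLM: picks the next (action, argument) from the trajectory."""
    if not history:
        return "load_data", "chf_data"        # first attempt, deliberately wrong
    last_action, _, last_obs = history[-1]
    if last_obs.startswith("ERROR"):
        return "load_data", "chf_data.csv"    # repair step after reading the error
    if last_action == "load_data":
        return "train_model", "hidden=64"
    return "finish", ""


def react_loop(policy: Callable[[List[Step]], Tuple[str, str]],
               max_steps: int = 10) -> List[Step]:
    """Alternate between choosing an action and observing its result."""
    history: List[Step] = []
    for _ in range(max_steps):
        action, arg = policy(history)
        if action == "finish":
            break
        try:
            obs = TOOLS[action](arg)
        except Exception as exc:
            # Errors become observations instead of aborting the run.
            obs = f"ERROR: {exc}"
        history.append((action, arg, obs))
    return history


if __name__ == "__main__":
    for step in react_loop(stub_policy):
        print(step)
```

In a supervisor-based multi-agent design, the single policy would instead be split into specialized agents with a coordinator routing tasks between them; that routing is not shown here.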
Abstract
Modern engineering increasingly relies on vast datasets generated by experiments and simulations, driving a growing demand for efficient, reliable, and broadly applicable modeling strategies. There is also heightened interest in data-driven approaches, particularly neural network models, for prediction and analysis of scientific datasets. Traditional data-driven methods often require extensive manual intervention, which limits their ability to scale and to generalize across applications. In this study, we propose a pipeline that uses Large Language Model (LLM) agents to automate data-driven modeling and analysis, with a particular emphasis on regression tasks. We evaluate two LLM-agent frameworks: a multi-agent system of specialized collaborative agents, and a single-agent system based on the Reasoning and Acting (ReAct) paradigm. Both frameworks autonomously handle data preprocessing, neural network development, training, hyperparameter optimization, and uncertainty quantification (UQ). We validate our approach on a critical heat flux (CHF) prediction task using approximately 25,000 experimental data points from the OECD/NEA benchmark dataset. Results indicate that our LLM-agent-developed model surpasses traditional CHF lookup tables and delivers predictive accuracy and UQ on par with state-of-the-art Bayesian-optimized deep neural network models developed by human experts. These outcomes underscore the potential of LLM-based agents to automate complex engineering modeling tasks, reducing human workload while meeting or exceeding existing standards of predictive performance.
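The ensemble-based UQ mentioned above can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's model: small linear fits on bootstrap resamples stand in for the deep neural networks (deep ensembles more commonly vary the random initialization of each network), and the data are synthetic, produced by the hypothetical helper `make_toy_data`. The predictive mean is the average over ensemble members, and the spread across members serves as the uncertainty estimate.

```python
import random
import statistics
from typing import List, Tuple

Line = Tuple[float, float]  # (slope, intercept)


def make_toy_data(n: int = 50, seed: int = 1) -> Tuple[List[float], List[float]]:
    """Synthetic data y = 2x + 1 + noise; NOT the CHF benchmark."""
    rng = random.Random(seed)
    xs = [i / 10 for i in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys


def fit_line(xs: List[float], ys: List[float]) -> Line:
    """Ordinary least squares for y = a*x + b."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def fit_ensemble(xs: List[float], ys: List[float],
                 n_members: int = 20, seed: int = 0) -> List[Line]:
    """Fit each member on a bootstrap resample of the training data."""
    rng = random.Random(seed)
    n = len(xs)
    members: List[Line] = []
    while len(members) < n_members:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # skip degenerate resamples with a single distinct x
        members.append(fit_line(bx, [ys[i] for i in idx]))
    return members


def predict(members: List[Line], x: float) -> Tuple[float, float]:
    """Return the ensemble mean and the standard deviation across members."""
    preds = [a * x + b for a, b in members]
    return statistics.fmean(preds), statistics.stdev(preds)


if __name__ == "__main__":
    xs, ys = make_toy_data()
    ensemble = fit_ensemble(xs, ys)
    for x in (2.0, 20.0):
        mean, std = predict(ensemble, x)
        print(f"x={x}: mean={mean:.3f}, std={std:.3f}")
```

Disagreement between members typically grows away from the training range, which is why the spread is a useful, if rough, signal of where a prediction should be trusted less.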
