Retrieval-Augmented Instruction Tuning for Automated Process Engineering Calculations: A Tool-Chaining Problem-Solving Framework with Attributable Reflection
Sagar Srinivas Sakhinana, Geethan Sannidhi, Venkataramana Runkana
TL;DR
The paper addresses the lack of a domain-specific foundational AI model for chemical and process engineering calculations. It proposes Retrieval-Augmented Instruction-Tuning (RAIT) to adapt open small code language models (SLMs) using external tools and curated datasets. It also introduces an autonomous agent framework that combines Retrieval-Augmented Code Generation (RACG) with ReAct prompting, a five-stage workflow, program caching, and an attributable reflection mechanism. Together, these components generate, debug, and optimize executable code from natural-language specifications. The central contributions are the MathComp and ChemProc instruction-tuning datasets, a detailed RAIT-based agent architecture, and comprehensive evaluations (including ablations and human assessments) showing performance competitive with large proprietary models. The work highlights improved explainability, knowledge-editing capabilities, and cost-efficiency, pointing to a practical pathway for deploying specialized AI tools in the chemical process industry.
Abstract
The current technology landscape lacks a foundational AI model for solving process engineering calculations. In this work, we introduce a novel autonomous agent framework that leverages Retrieval-Augmented Instruction-Tuning (RAIT) to enhance open, customizable small code language models (SLMs) for these calculations. By combining instruction-tuned code SLMs with Retrieval-Augmented Code Generation (RACG) using external tools, the agent generates, debugs, and optimizes code from natural language specifications. Our approach fills the gap left by the absence of a foundational AI model for specialized process engineering tasks and offers the benefits of explainability, knowledge editing, and cost-effectiveness. Additionally, we curate custom datasets of chemical and process engineering problems and solutions to overcome data scarcity. Experimental results show that our framework matches the performance of large-scale proprietary models on benchmark datasets, demonstrating its effectiveness and usability.
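The generate, execute, and debug cycle described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a code model callable that maps a problem statement and a list of prior error tracebacks to Python source that assigns its answer to a variable named `result`. It also assumes that programs which execute successfully are cached under a normalized form of the problem text. The names `SolverAgent`, `run_program`, and `toy_generator` are hypothetical. The sketch omits retrieval, tool calls, ReAct prompting, and the multi-stage workflow, and it represents reflection simply as feeding the error traceback back to the generator.

```python
import hashlib
import traceback
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical interface: a code model maps (problem, error feedback) -> Python source.
Generator = Callable[[str, List[str]], str]


def _cache_key(problem: str) -> str:
    """Lowercase, collapse whitespace, and hash the problem text."""
    normalized = " ".join(problem.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def run_program(source: str) -> Tuple[Optional[Any], Optional[str]]:
    """Execute generated source; return (result, None) or (None, traceback text)."""
    namespace: Dict[str, Any] = {}
    try:
        exec(source, namespace)  # untrusted model output: sandbox this in practice
        return namespace["result"], None  # a missing `result` is also reported as an error
    except Exception:
        return None, traceback.format_exc()


class SolverAgent:
    """Hypothetical generate -> execute -> reflect loop with a cache of working programs."""

    def __init__(self, generate: Generator, max_attempts: int = 3) -> None:
        self.generate = generate
        self.max_attempts = max_attempts
        self.cache: Dict[str, str] = {}  # cache key -> source that ran without error

    def solve(self, problem: str) -> Optional[Any]:
        key = _cache_key(problem)
        if key in self.cache:  # reuse a previously successful program
            result, _ = run_program(self.cache[key])
            return result
        feedback: List[str] = []
        for _ in range(self.max_attempts):
            source = self.generate(problem, feedback)
            result, error = run_program(source)
            if error is None:
                self.cache[key] = source
                return result
            feedback.append(error)  # reflection: return the traceback to the model
        return None  # no working program within the attempt budget


def toy_generator(problem: str, feedback: List[str]) -> str:
    """Stand-in for a code SLM: the first draft uses undefined names, the retry fixes it."""
    if not feedback:
        return "result = n * R * T / P"
    # Ideal-gas molar volume V = nRT/P with n = 1 mol, T = 300 K, P = 1 atm, in m^3.
    return "n, R, T, P = 1.0, 8.314, 300.0, 101325.0\nresult = n * R * T / P"


if __name__ == "__main__":
    agent = SolverAgent(toy_generator)
    volume = agent.solve("Molar volume of an ideal gas at 300 K and 1 atm?")
    print(round(volume, 5))  # 0.02462
```

In this sketch, the first draft raises a `NameError`, and the traceback from that failure drives the second draft. Any later call with the same normalized question skips generation entirely and replays the cached program.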
