Text2Model: Generating dynamic chemical reactor models using large language models (LLMs)

Sophia Rupprecht, Yassine Hounat, Monisha Kumar, Giacomo Lastrucci, Artur M. Schweidtmann

TL;DR

The paper demonstrates that task-specific fine-tuning of an open-source LLM (LoRA on Llama 3.1 8B Instruct) can significantly enhance the syntactic and semantic correctness of Modelica code generated from natural-language descriptions of chemical reactors, outperforming the baseline but lagging behind GPT4o for unseen scenarios. It introduces a synthetic, template-based dataset spanning ODE and DAE formulations and various operation modes, and benchmarks performance with a detailed eight-category error taxonomy. While improvements are clear on training-like cases, generalization to novel reactor configurations is limited, particularly for DAE systems, suggesting the need for broader data, retrieval-augmented approaches, and integration with external tools (e.g., Dymola) for iterative refinement. The work highlights both the promise and current constraints of deploying LLMs for domain-specific dynamic-model generation in chemical engineering.

Abstract

As large language models have shown remarkable capabilities in conversing via natural language, the question arises as to how LLMs could potentially assist chemical engineers in research and industry with domain-specific tasks. We generate dynamic chemical reactor models in Modelica code format from textual descriptions as user input. We fine-tune Llama 3.1 8B Instruct on synthetically generated Modelica code for different reactor scenarios. We compare the performance of our fine-tuned model to the baseline Llama 3.1 8B Instruct model and GPT4o. We manually assess the models' predictions regarding the syntactic and semantic accuracy of the generated dynamic models. We find that considerable improvements are achieved by the fine-tuned model with respect to both the semantic and the syntactic accuracy of the Modelica models. However, the fine-tuned model lacks a satisfactory ability to generalize to unseen scenarios compared to GPT4o.

Paper Structure

This paper contains 10 sections, 1 figure, 2 tables.

Figures (1)

  • Figure 1: System message (top left), example user input (bottom left), and corresponding exemplary response (right). The example input and output are shortened for simplification. The output contains examples of seven error cases that are considered in this study.