Learning from flowsheets: A generative transformer model for autocompletion of flowsheets
Gabriel Vogel, Lukas Schulze Balhorn, Artur M. Schweidtmann
TL;DR
This work tackles autocompletion of chemical process flowsheets by representing flowsheets as SFILES 2.0 strings and training a decoder-only transformer to learn their grammar. A two-stage training regime uses synthetic data for pre-training and real flowsheet topologies for fine-tuning, with a tailored SFILES 2.0 tokenizer and a GPT-2–style architecture. Decoding strategies such as beam search and top-$p$ sampling are evaluated, and fine-tuning sharply reduces perplexity on real data (e.g., PP_te falls from 25.91 to 4.75). The results demonstrate potential for AI-assisted interactive flowsheet synthesis and motivate integration into process simulation tools for design guidance.
Abstract
We propose a novel method for the autocompletion of chemical flowsheets, inspired by the autocompletion of text. We represent flowsheets as strings using the text-based SFILES 2.0 notation and use a transformer-based language model to learn the grammatical structure of the SFILES 2.0 language and common patterns in flowsheets. We pre-train our model on synthetically generated flowsheets to learn the flowsheet language grammar. Then, we fine-tune our model in a transfer learning step on real flowsheet topologies. Finally, we use the trained model for causal language modeling to autocomplete flowsheets. Ultimately, the proposed method can provide chemical engineers with recommendations during interactive flowsheet synthesis. The results demonstrate the high potential of this approach for future AI-assisted process synthesis.
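To make the evaluation and decoding terms above concrete, the following is a minimal sketch of perplexity and top-$p$ (nucleus) sampling over a next-token distribution. It is not the authors' implementation: the function names, the toy probabilities, and the SFILES-like token strings are illustrative assumptions, and a trained model would supply the probabilities in practice.

```python
import math
import random


def perplexity(token_probs):
    """Perplexity: exp of the mean negative log-probability the model
    assigned to each token that actually occurred in the sequence."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)


def top_p_filter(probs, p=0.9):
    """Keep the smallest set of most probable tokens whose cumulative
    probability reaches p, then renormalize so the kept set sums to 1."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = {}, 0.0
    for token, prob in ranked:
        kept[token] = prob
        total += prob
        if total >= p:
            break
    return {token: prob / total for token, prob in kept.items()}


def sample_next(probs, p=0.9, rng=random):
    """Draw one next token from the renormalized nucleus."""
    nucleus = top_p_filter(probs, p)
    tokens = list(nucleus)
    weights = [nucleus[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical next-token distribution after some flowsheet prefix.
# The token strings are placeholders, not output of the paper's tokenizer.
next_token_probs = {"(hex)": 0.5, "(r)": 0.3, "(dist)": 0.15, "(pp)": 0.05}

nucleus = top_p_filter(next_token_probs, p=0.9)
suggestion = sample_next(next_token_probs, p=0.9, rng=random.Random(0))
```

With `p=0.9`, the least likely candidate is dropped before sampling, which trades some diversity for fewer implausible suggestions. Lower perplexity means the model assigns higher probability to the tokens that actually follow, which is why the drop after fine-tuning is read as an improvement.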
