Learning from flowsheets: A generative transformer model for autocompletion of flowsheets

Gabriel Vogel, Lukas Schulze Balhorn, Artur M. Schweidtmann

TL;DR

This work tackles autocompletion of chemical process flowsheets by treating flowsheets as SFILES 2.0 strings and training a decoder-only transformer to learn their grammar. A two-stage training regime uses synthetic data for pre-training and real flowsheet topologies for fine-tuning, with a tailored SFILES 2.0 tokenizer and a GPT-2-style architecture. Decoding strategies such as beam search and top-$p$ sampling are evaluated, and perplexity on real data drops sharply after fine-tuning (e.g., PP_te from 25.91 to 4.75). The results demonstrate potential for AI-assisted interactive flowsheet synthesis and motivate integration into process simulation tools for design guidance.
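
To make the perplexity figures easier to read, here is a minimal sketch assuming the standard definition: the exponential of the mean negative log-likelihood per token. The exact normalization used in the paper is not shown on this page, and the function name `perplexity` and the example probabilities are illustrative only.

```python
import math


def perplexity(token_probs):
    """Perplexity as exp of the mean negative log-likelihood per token.

    token_probs: probabilities the model assigned to each ground-truth token.
    Assumes the standard definition; the paper's exact formula may differ.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)


# A model that is always uniformly unsure between 4 tokens has perplexity ~4.
uniform_guess = perplexity([0.25] * 8)

# A more confident model gets a lower (better) perplexity.
confident_guess = perplexity([0.9, 0.8, 0.95, 0.85])
```

Under this definition, a perplexity of 4.75 roughly means the model is about as uncertain as if it were choosing uniformly among five tokens at each step, compared with about 26 before fine-tuning.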

Abstract

We propose a novel method enabling autocompletion of chemical flowsheets. This idea is inspired by the autocompletion of text. We represent flowsheets as strings using the text-based SFILES 2.0 notation and learn the grammatical structure of the SFILES 2.0 language and common patterns in flowsheets using a transformer-based language model. We pre-train our model on synthetically generated flowsheets to learn the flowsheet language grammar. Then, we fine-tune our model in a transfer learning step on real flowsheet topologies. Finally, we use the trained model for causal language modeling to autocomplete flowsheets. Eventually, the proposed method can provide chemical engineers with recommendations during interactive flowsheet synthesis. The results demonstrate a high potential of this approach for future AI-assisted process synthesis.
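
The autocompletion step described above can be sketched as an auto-regressive loop with top-$p$ (nucleus) sampling. This is a toy illustration, not the authors' implementation: `toy_model` is a hypothetical lookup table standing in for the trained transformer, and the tokens are SFILES-like placeholders that are not guaranteed to form valid SFILES 2.0 strings.

```python
import random


def top_p_filter(probs, p=0.9):
    """Keep the smallest set of most likely tokens whose total mass reaches p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, mass = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        mass += prob
        if mass >= p:
            break
    # Renormalize so the kept probabilities sum to 1.
    return {token: prob / mass for token, prob in kept}


def autocomplete(prefix, next_token_probs, p=0.9, max_new_tokens=20,
                 end_token="<eos>", rng=None):
    """Extend a token prefix one token at a time until end_token or the limit."""
    rng = rng or random.Random(0)
    tokens = list(prefix)
    for _ in range(max_new_tokens):
        nucleus = top_p_filter(next_token_probs(tokens), p)
        choices, weights = zip(*nucleus.items())
        next_token = rng.choices(choices, weights=weights, k=1)[0]
        if next_token == end_token:
            break
        tokens.append(next_token)
    return tokens


def toy_model(tokens):
    """Hypothetical stand-in for the trained model: depends on the last token only."""
    table = {
        "(raw)": {"(hex)": 0.7, "(pp)": 0.25, "(r)": 0.05},
        "(hex)": {"(r)": 0.9, "(dist)": 0.1},
        "(pp)": {"(hex)": 1.0},
        "(r)": {"(dist)": 0.8, "(flash)": 0.2},
        "(dist)": {"(prod)": 1.0},
        "(flash)": {"(prod)": 1.0},
        "(prod)": {"<eos>": 1.0},
    }
    return table[tokens[-1]]


completed = autocomplete(["(raw)"], toy_model, p=0.5)
```

With a small `p` the nucleus collapses to the single most likely token, so the loop behaves like greedy decoding; larger values of `p` admit less likely unit operations and yield more varied completions.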

Paper Structure (20 sections, 4 equations, 11 figures, 2 tables)

Figures (11)

  • Figure 1: Simplified illustration of the transformer architecture, adapted from Vaswani et al. (2017), consisting of an encoder and a decoder stack.
  • Figure 2: Simplified illustration of a decoder-only architecture for auto-regressive text generation.
  • Figure 3: Simple chemical process flowsheet with branchings, recycle stream, and different mass trains
  • Figure 4: Graph representation of the flowsheet in Figure 3.
  • Figure 5: Overview of flowsheet completion with the Generative Flowsheet Transformer. (a) The incomplete flowsheet graph is converted to an incomplete SFILES 2.0 string (1). The auto-regressive transformer model completes the string (2, 3, 4, 5). The autocompleted SFILES 2.0 string is converted to a completed flowsheet graph. (b) Example autocompletion of a flowsheet.
  • ...and 6 more figures