Graph-to-SFILES: Control structure prediction from process topologies using generative artificial intelligence
Lukas Schulze Balhorn, Kevin Degens, Artur M. Schweidtmann
TL;DR
The paper tackles the manual effort of designing control structures for piping and instrumentation diagrams (P&IDs). It does so by predicting a control-extended flowsheet (CEF) from process topologies represented as graphs. It introduces Graph-to-SFILES, a graph-to-sequence approach that encodes flowsheet graphs with a graph neural network (GNN) encoder and decodes them into SFILES 2.0 strings representing the CEF, using beam search to generate top-k predictions. Across synthetic datasets of 1k to 100k pairs of process flow diagrams (PFDs) and CEFs, the Combined graph encoder performs best, reaching up to 46.1% top-1 CEF accuracy with 100k training pairs. The approach is most data-efficient at small dataset sizes, but its gains taper as the dataset grows, and its industrial applicability still needs validation with real process data. The work highlights the potential of graph-based generative AI for process design while acknowledging two limitations: the model does not use process parameters, and it relies on synthetic data. Future directions include expanding the data, physics-infused hybrid models, and transfer learning to bridge toward industry practice.
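Beam search and top-k accuracy are the two mechanics behind the reported numbers. The Python sketch below illustrates both under stated assumptions: the trained decoder is replaced by a hypothetical lookup table over placeholder tokens (`A`, `B`, `C`, which are not real SFILES 2.0 tokens), hypotheses are scored by summed log-probabilities without length normalization, and a prediction counts as correct only if it matches the target string exactly. The paper's actual decoder, beam width, and matching criterion may differ.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical decoder interface: maps a token prefix to next-token log-probabilities.
NextTokenFn = Callable[[Tuple[str, ...]], Dict[str, float]]
Hypothesis = Tuple[Tuple[str, ...], float]

EOS = "<eos>"


def beam_search(step: NextTokenFn, beam_width: int, max_len: int) -> List[Hypothesis]:
    """Return up to beam_width hypotheses ranked by total log-probability."""
    beams: List[Hypothesis] = [((), 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        candidates: List[Hypothesis] = []
        for prefix, score in beams:
            for token, logp in step(prefix).items():
                hyp = (prefix + (token,), score + logp)
                # Hypotheses ending in EOS are complete; the rest keep expanding.
                if token == EOS:
                    finished.append(hyp)
                else:
                    candidates.append(hyp)
        # Keep only the beam_width most probable unfinished prefixes.
        candidates.sort(key=lambda h: h[1], reverse=True)
        beams = candidates[:beam_width]
        if not beams:
            break
    finished.extend(beams)  # hypotheses still open when max_len is reached
    finished.sort(key=lambda h: h[1], reverse=True)
    return finished[:beam_width]


def top_k_accuracy(predictions: Sequence[Sequence[str]], targets: Sequence[str], k: int) -> float:
    """Fraction of samples whose target string is among the first k ranked predictions."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have equal length")
    if not targets:
        return 0.0
    hits = sum(target in ranked[:k] for ranked, target in zip(predictions, targets))
    return hits / len(targets)


def toy_step(prefix: Tuple[str, ...]) -> Dict[str, float]:
    """Hypothetical stand-in for a trained decoder."""
    table = {
        (): {"A": math.log(0.6), "B": math.log(0.4)},
        ("A",): {EOS: math.log(0.45), "C": math.log(0.55)},
        ("B",): {EOS: math.log(0.9), "C": math.log(0.1)},
    }
    return table.get(prefix, {EOS: 0.0})  # any other prefix ends deterministically


hyps = beam_search(toy_step, beam_width=2, max_len=3)
ranked = ["".join(t for t in tokens if t != EOS) for tokens, _ in hyps]
print(ranked)                                  # ['B', 'AC']
print(top_k_accuracy([ranked], ["AC"], k=1))   # 0.0
print(top_k_accuracy([ranked], ["AC"], k=2))   # 1.0
```

In this toy case, greedy decoding would commit to `A` (probability 0.6) and end with `AC` (0.33), whereas the beam keeps `B` alive and ranks it first (0.36). Returning the whole beam rather than only the best hypothesis is what makes top-k evaluation possible.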
Abstract
Control structure design is an important but tedious step in the development of piping and instrumentation diagrams (P&IDs). Generative artificial intelligence (AI) promises to reduce P&ID development time by supporting engineers. Previous research on generative AI in chemical process design mainly represented processes as sequences. However, graphs offer a promising alternative because they are permutation-invariant, that is, the representation does not depend on the order in which the process units are listed. We propose the Graph-to-SFILES model, a generative AI method to predict control structures from flowsheet topologies. The Graph-to-SFILES model takes the flowsheet topology as a graph input and returns a control-extended flowsheet as a sequence in the SFILES 2.0 notation. We compare four different graph encoder architectures, one of them being a graph neural network (GNN) proposed in this work. The Graph-to-SFILES model achieves a top-5 accuracy of 73.2% when trained on 10,000 flowsheet topologies. In addition, the proposed GNN performs best among the encoder architectures. Compared to a purely sequence-based approach, the Graph-to-SFILES model improves the top-5 accuracy for a relatively small training dataset of 1,000 flowsheets from 0.9% to 28.4%. However, the sequence-based approach performs better on a large-scale dataset of 100,000 flowsheets. These results highlight the potential of graph-based AI models to accelerate P&ID development in small-data regimes, but their effectiveness on industry-relevant case studies still needs to be investigated.
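The permutation-invariance argument can be made concrete with a small, weight-free message-passing sketch in Python. It is not the GNN proposed in the paper: the unit vocabulary (`feed`, `pump`, `reactor`, ...) and the helper names `message_passing_embedding` and `relabel` are hypothetical, and a trained encoder would add learned weights and nonlinearities. The sketch shows that renumbering the nodes of the same topology changes the node order a sequence model would see, while the sum-pooled graph embedding stays the same.

```python
from typing import List, Sequence, Tuple

# Hypothetical unit-operation vocabulary used for one-hot node features.
NODE_TYPES = ["feed", "pump", "heat_exchanger", "reactor", "product"]


def one_hot(label: str) -> List[float]:
    return [1.0 if t == label else 0.0 for t in NODE_TYPES]


def message_passing_embedding(
    labels: Sequence[str], edges: Sequence[Tuple[int, int]], rounds: int = 2
) -> List[float]:
    """Toy GNN: sum aggregation along directed streams, then a sum readout."""
    h = [one_hot(label) for label in labels]
    for _ in range(rounds):
        new_h = [list(v) for v in h]  # each node keeps its own features (self-loop)
        for src, dst in edges:
            # The downstream unit receives the upstream unit's current features.
            new_h[dst] = [a + b for a, b in zip(new_h[dst], h[src])]
        h = new_h
    # Summing over nodes discards node order, so the result is permutation-invariant.
    return [sum(column) for column in zip(*h)]


def relabel(
    labels: Sequence[str], edges: Sequence[Tuple[int, int]], perm: Sequence[int]
) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Renumber nodes so that old index i becomes perm[i]; the topology is unchanged."""
    new_labels = [""] * len(labels)
    for old, new in enumerate(perm):
        new_labels[new] = labels[old]
    return new_labels, [(perm[s], perm[d]) for s, d in edges]


# Feed -> pump -> reactor -> product, listed in two different node orders.
labels = ["feed", "pump", "reactor", "product"]
edges = [(0, 1), (1, 2), (2, 3)]
labels2, edges2 = relabel(labels, edges, perm=[2, 3, 1, 0])

print(labels2)                                     # ['product', 'reactor', 'feed', 'pump']
print(message_passing_embedding(labels, edges))    # [4.0, 4.0, 0.0, 3.0, 1.0]
print(message_passing_embedding(labels2, edges2))  # [4.0, 4.0, 0.0, 3.0, 1.0]
```

A sequence model fed the label lists directly would see two different inputs for the same process; the sum readout removes that ambiguity, which is the property the abstract attributes to graph representations.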
