SFILES 2.0: An extended text-based flowsheet representation
Gabriel Vogel, Lukas Schulze Balhorn, Edwin Hirtreiter, Artur M. Schweidtmann
TL;DR
This work addresses the need for machine-readable, interoperable representations of chemical process flowsheets that go beyond static images and PDFs. It introduces SFILES 2.0, an extended text-based notation that captures complex connectivity, multi-stream heat exchangers, top and bottom product branches, and P&ID-level control structures, together with standardized unit operation naming. A reversible conversion algorithm (graph invariant computation followed by a depth-first search (DFS) traversal) and an open-source Python implementation enable bidirectional translation between flowsheet graphs and SFILES 2.0 strings, which supports FAIR data practices and scalable database construction. By providing a standardized, machine-actionable representation and the tooling to publish and reuse topology information, the approach supports data analysis and AI-enabled processing of flowsheets across research and industry.
Abstract
SFILES is a text-based notation for chemical process flowsheets. It was originally proposed by d'Anterroches (2006), who was inspired by the text-based SMILES notation for molecules. Compared to flowsheet images, the text-based format has several advantages in storage, computational accessibility, and, ultimately, data analysis and processing. However, the original SFILES version cannot describe essential flowsheet configurations unambiguously, such as the distinction between top and bottom products. Nor can it describe the control structure required for the safe and reliable operation of chemical processes. In addition, there is no publicly available software for encoding chemical process topologies as SFILES or decoding them. We propose SFILES 2.0 with a complete description of the extended notation and naming conventions. Additionally, we provide open-source software for the automated conversion between flowsheet graphs and SFILES 2.0 strings. In this way, we hope to encourage researchers and engineers to publish their flowsheet topologies as SFILES 2.0 strings. The ultimate goal is to set a standard for creating a FAIR database of chemical process flowsheets, which would be of great value for future data analysis and processing.
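
To illustrate the idea of converting between a flowsheet graph and a text string, the following is a minimal Python sketch. It is not the SFILES 2.0 grammar and not the authors' implementation. It assumes a tree-shaped flowsheet (no recycles, no units with several inlets), uses subtree size plus unit name as a simple stand-in for a graph invariant, and writes units in parentheses and side branches in square brackets. The functions `encode` and `decode`, the toy notation, and the unit names are hypothetical.

```python
from typing import Dict, List

Graph = Dict[str, List[str]]  # unit name -> names of downstream units


def encode(graph: Graph, root: str) -> str:
    """Serialize a tree-shaped flowsheet graph by depth-first traversal."""

    def size(node: str) -> int:
        # Number of units reachable from `node`, including itself.
        return 1 + sum(size(child) for child in graph.get(node, []))

    def visit(node: str) -> str:
        # Rank successors so the string does not depend on input order.
        children = sorted(graph.get(node, []), key=lambda n: (size(n), n))
        text = f"({node})"
        # All but the last successor become bracketed side branches.
        for child in children[:-1]:
            text += "[" + visit(child) + "]"
        if children:
            text += visit(children[-1])
        return text

    return visit(root)


def decode(text: str) -> Graph:
    """Rebuild the graph from a string produced by `encode`."""
    graph: Graph = {}
    branch_points: List[str] = []  # units to return to when a branch closes
    previous = None
    i = 0
    while i < len(text):
        char = text[i]
        if char == "(":
            end = text.index(")", i)
            name = text[i + 1:end]
            graph.setdefault(name, [])
            if previous is not None:
                graph[previous].append(name)
            previous = name
            i = end + 1
        elif char == "[":
            branch_points.append(previous)
            i += 1
        elif char == "]":
            previous = branch_points.pop()
            i += 1
        else:
            raise ValueError(f"unexpected character {char!r} at position {i}")
    return graph


# Hypothetical flowsheet: feed -> reactor -> column with two products.
flowsheet = {
    "feed": ["reactor"],
    "reactor": ["column"],
    "column": ["top", "bottom"],
    "top": [],
    "bottom": [],
}
text = encode(flowsheet, "feed")
print(text)  # (feed)(reactor)(column)[(bottom)](top)
```

Ranking the successors before the traversal is what makes the string independent of the order in which the edges were stored. Round-tripping with `encode(decode(text), "feed")` reproduces the same string for this toy notation.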
