Talking like Piping and Instrumentation Diagrams (P&IDs)
Achmad Anggawirya Alimin, Dominik P. Goldstein, Lukas Schulze Balhorn, Artur M. Schweidtmann
TL;DR
This work addresses the challenge of interacting with P&IDs through natural language by encoding P&IDs with the DEXPI data model and linking them to LLMs via graph-RAG. It presents a three-part pipeline: (i) converting DEXPI P&IDs into a labeled property graph (LPG) using pyDEXPI, (ii) generating a readable Neo4j LPG knowledge graph with rich component labels and properties, and (iii) retrieving information through a graph-aware LLM interface that uses high-level graph context to reduce token load and improve accuracy. The high-level condensation shrinks the context from 212 nodes, 405 relations, and ~67,000 tokens to 53 nodes, 57 relations, and ~9,000 tokens while preserving the essential structure. An evaluation across different LLMs shows that larger models perform better, and that high-level graphs improve pattern recognition and enable complete valve retrieval (11 valves) with detailed specifications, although hallucinations and reliability remain challenges. Overall, the pipeline enables intuitive, context-rich querying of P&IDs and points to future genAI-enabled benefits for design, HAZOP studies, and operations in process engineering.
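To make the idea of a high-level graph concrete, the following minimal sketch shows one plausible way to condense a graph: nodes of a chosen label are dropped and their neighbours are connected directly. This is an assumption for illustration only, not the condensation rule used in the work; the node ids, the labels, and the `condense` function are hypothetical.

```python
from collections import defaultdict

# Toy labeled property graph: node id -> (label, properties).
# All ids and labels are hypothetical and not taken from DEXPI or pyDEXPI.
nodes = {
    "T-101": ("Tank", {}),
    "seg-1": ("PipingSegment", {}),
    "seg-2": ("PipingSegment", {}),
    "V-101": ("Valve", {}),
    "seg-3": ("PipingSegment", {}),
    "P-101": ("Pump", {}),
}
# Directed edges (source id, target id) following the flow direction.
edges = [
    ("T-101", "seg-1"),
    ("seg-1", "seg-2"),
    ("seg-2", "V-101"),
    ("V-101", "seg-3"),
    ("seg-3", "P-101"),
]


def condense(nodes, edges, drop_label="PipingSegment"):
    """Drop nodes with `drop_label` and link each kept node to the next kept nodes."""
    successors = defaultdict(set)
    for source, target in edges:
        successors[source].add(target)

    keep = {n for n, (label, _) in nodes.items() if label != drop_label}

    def next_kept(start):
        # Walk through dropped nodes until kept nodes are reached.
        found, seen, stack = set(), set(), list(successors[start])
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            if current in keep:
                found.add(current)
            else:
                stack.extend(successors[current])
        return found

    new_edges = sorted((s, t) for s in keep for t in next_kept(s))
    return {n: nodes[n] for n in keep}, new_edges


high_nodes, high_edges = condense(nodes, edges)
print(len(nodes), len(edges), "->", len(high_nodes), len(high_edges))
```

In this toy case six nodes and five edges shrink to three nodes and two edges, while the tank, valve, and pump stay connected in the same order.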
Abstract
We propose a methodology that allows users to communicate with Piping and Instrumentation Diagrams (P&IDs) using natural language. In particular, we represent P&IDs through the DEXPI data model as labeled property graphs and integrate them with Large Language Models (LLMs). The approach consists of three main parts: 1) P&IDs in the DEXPI format are cast into a graph representation using our pyDEXPI Python package. 2) A tool generates P&ID knowledge graphs from pyDEXPI. 3) The P&ID knowledge graph is integrated with LLMs using graph-based retrieval-augmented generation (graph-RAG). The approach extends the LLM's ability to retrieve contextual data from P&IDs and mitigates hallucinations. Leveraging the LLM's large training corpus, the model is also able to interpret process information in P&IDs, which could help engineers in their daily tasks. In the future, this work will also open up opportunities for other generative Artificial Intelligence (genAI) solutions on P&IDs and for AI-assisted HAZOP studies.
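The retrieval step of graph-RAG can be sketched as follows: facts about the relevant nodes are pulled from a labeled property graph and placed in the prompt as context. This is a minimal in-memory illustration under stated assumptions; it uses neither pyDEXPI nor Neo4j, it does not call an LLM, and every class, function, label, relation, and property name in it is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    label: str
    props: dict = field(default_factory=dict)


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)  # (source id, relation, target id)

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = Node(node_id, label, props)

    def add_edge(self, source, relation, target):
        self.edges.append((source, relation, target))


def retrieve(graph, label):
    """Return text facts for every node with `label` and its direct relations."""
    facts = []
    for node in graph.nodes.values():
        if node.label != label:
            continue
        props = ", ".join(f"{k}={v}" for k, v in sorted(node.props.items()))
        facts.append(f"({node.node_id}:{node.label} {{{props}}})")
        for source, relation, target in graph.edges:
            if node.node_id in (source, target):
                facts.append(f"({source})-[{relation}]->({target})")
    return facts


def build_prompt(question, facts):
    """Combine retrieved graph facts and the user question into one prompt."""
    context = "\n".join(facts)
    return (
        "Answer using only this P&ID graph context:\n"
        f"{context}\n\nQuestion: {question}"
    )


# Hypothetical toy plant section: tank -> valve -> pump.
graph = Graph()
graph.add_node("T-101", "Tank")
graph.add_node("V-101", "Valve", nominal_diameter="DN50")
graph.add_node("P-101", "Pump")
graph.add_edge("T-101", "SENDS_TO", "V-101")
graph.add_edge("V-101", "SENDS_TO", "P-101")

facts = retrieve(graph, "Valve")
prompt = build_prompt("Which valves are in the plant?", facts)
print(prompt)
```

Only the facts that mention a valve reach the prompt, which keeps the context small and ties the answer to data that is actually in the graph.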
