Beyond Pixels: Vector-to-Graph Transformation for Reliable Schematic Auditing

Chengwei Ma, Zhen Tian, Zhou Zhou, Zhixian Xu, Xiaowei Zhu, Xia Hua, Si Shi, F. Richard Yu

TL;DR

Engineering schematics require precise topology and symbolic reasoning that pixel-based MLLMs struggle to provide. The authors propose a Vector-to-Graph (V2G) pipeline that converts CAD drawings into property graphs and couples an MLLM planner with deterministic Graph Signal Processing (GSP) verification to audit compliance. On a diagnostic benchmark of roughly 900 augmented schematics, V2G yields substantial accuracy gains for six MLLMs (up to +60%), while baselines remain near chance. This work demonstrates that explicit structure and graph-based verification are essential for reliable multimodal auditing in engineering, and it provides a public benchmark and implementation to foster further research.

Abstract

Multimodal Large Language Models (MLLMs) have shown remarkable progress in visual understanding, yet they suffer from a critical limitation: structural blindness. Even state-of-the-art models fail to capture topology and symbolic logic in engineering schematics, as their pixel-driven paradigm discards the explicit vector-defined relations needed for reasoning. To overcome this, we propose a Vector-to-Graph (V2G) pipeline that converts CAD diagrams into property graphs where nodes represent components and edges encode connectivity, making structural dependencies explicit and machine-auditable. On a diagnostic benchmark of electrical compliance checks, V2G yields large accuracy gains across all error categories, while leading MLLMs remain near chance level. These results highlight the systemic inadequacy of pixel-based methods and demonstrate that structure-aware representations provide a reliable path toward practical deployment of multimodal AI in engineering domains. To facilitate further research, we release our benchmark and implementation at https://github.com/gm-embodied/V2G-Audit.
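
To make the property-graph idea concrete, the following is a minimal sketch of how a compliance rule such as single-point CT grounding (the case in Figure 1) might be checked once a schematic is available as a graph. It is not the authors' implementation: the `PropertyGraph` class, the `kind` property, the node names, and the rule encoding are all hypothetical, and plain connected-component traversal stands in for the Graph Signal Processing verification described in the paper.

```python
from collections import defaultdict


class PropertyGraph:
    """Tiny undirected property graph (hypothetical schema, for illustration only)."""

    def __init__(self):
        self.nodes = {}               # node id -> dict of properties
        self.adj = defaultdict(set)   # node id -> set of neighbouring node ids

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, a, b):
        # Edges encode electrical connectivity, so they are stored in both directions.
        self.adj[a].add(b)
        self.adj[b].add(a)

    def component_of(self, start):
        """Return every node reachable from `start` (iterative depth-first search)."""
        seen = {start}
        stack = [start]
        while stack:
            cur = stack.pop()
            for nxt in self.adj[cur]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen


def audit_single_point_grounding(graph):
    """Map each CT node to True if exactly one ground node is reachable from it."""
    result = {}
    for node_id, props in graph.nodes.items():
        if props.get("kind") != "ct":
            continue
        reachable = graph.component_of(node_id)
        grounds = sum(
            1 for n in reachable if graph.nodes.get(n, {}).get("kind") == "ground"
        )
        result[node_id] = grounds == 1
    return result


def build_example():
    """Assumed toy schematic: CT1 is grounded once, CT2 is grounded twice."""
    g = PropertyGraph()
    g.add_node("CT1", kind="ct")
    g.add_node("T1", kind="terminal")
    g.add_node("GND1", kind="ground")
    g.add_edge("CT1", "T1")
    g.add_edge("T1", "GND1")

    g.add_node("CT2", kind="ct")
    g.add_node("T2", kind="terminal")
    g.add_node("T3", kind="terminal")
    g.add_node("GND2", kind="ground")
    g.add_node("GND3", kind="ground")
    g.add_edge("CT2", "T2")
    g.add_edge("T2", "GND2")
    g.add_edge("T2", "T3")
    g.add_edge("T3", "GND3")
    return g


if __name__ == "__main__":
    print(audit_single_point_grounding(build_example()))
```

In this toy graph the check passes for `CT1` and fails for `CT2`. The point of the sketch is that the outcome follows deterministically from explicit connectivity, with no reliance on how the drawing is rendered as pixels.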

Paper Structure (8 sections, 4 equations, 3 figures, 2 tables)

Figures (3)

  • Figure 1: State-of-the-art MLLM fails to ensure single-point CT grounding, exemplifying structural blindness.
  • Figure 2: Representative compliance tasks used to expose structural blindness. MLLMs fail consistently across grounding, wiring, and labeling checks, revealing an inability to reason over schematic topology.
  • Figure 3: Proposed V2G-based auditing framework.