OmniSch: A Multimodal PCB Schematic Benchmark For Structured Diagram Visual Reasoning

Taiting Lu, Kaiyuan Lin, Yuxin Tian, Yubo Wang, Muchuan Wang, Sharique Khatri, Akshit Kartik, Yixi Wang, Amey Santosh Rane, Yida Wang, Yifan Yang, Yi-Chao Chen, Yincheng Jin, Mahanth Gowda

Abstract

Recent large multimodal models (LMMs) have made rapid progress in visual grounding, document understanding, and diagram reasoning tasks. However, their ability to convert Printed Circuit Board (PCB) schematic diagrams into machine-readable, spatially weighted netlist graphs that jointly capture component attributes, connectivity, and geometry remains largely underexplored, even though such graph representations are the backbone of practical electronic design automation (EDA) workflows. To bridge this gap, we introduce OmniSch, the first comprehensive benchmark designed to assess LMMs on schematic understanding and spatial netlist graph construction. OmniSch contains 1,854 real-world schematic diagrams and includes four tasks: (1) visual grounding for schematic entities, with 109.9K grounded instances aligning 423.4K diagram semantic labels to their visual regions; (2) diagram-to-graph reasoning, understanding topological relationships among diagram elements; (3) geometric reasoning, constructing layout-dependent weights for each connection; and (4) tool-augmented agentic reasoning for visual search, invoking external tools to accomplish (1)-(3). Our results reveal substantial gaps in how current LMMs interpret schematic engineering artifacts, including unreliable fine-grained grounding, brittle layout-to-graph parsing, inconsistent global connectivity reasoning, and inefficient visual exploration.
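To make the target representation concrete, the following is a minimal sketch of what a spatially weighted netlist graph could look like in Python. It is an illustration under stated assumptions, not the OmniSch data format: the names `Pin`, `Netlist`, `connect`, `weighted_edges`, and `build_example` are hypothetical, and the edge weight is assumed to be the Euclidean distance between pin positions, which the abstract does not specify.

```python
from dataclasses import dataclass, field
from itertools import combinations
from math import hypot


@dataclass(frozen=True)
class Pin:
    component: str  # reference designator such as "R1" (hypothetical naming)
    name: str       # pin name or number
    x: float        # pin position in schematic coordinates
    y: float


@dataclass
class Netlist:
    components: dict = field(default_factory=dict)  # designator -> attribute dict
    nets: dict = field(default_factory=dict)        # net name -> list of Pin

    def connect(self, net, pin):
        """Attach a pin to a named net, creating the net if needed."""
        self.nets.setdefault(net, []).append(pin)

    def weighted_edges(self):
        """Yield (net, pin_a, pin_b, weight) for every pin pair sharing a net.

        Assumption: the layout-dependent weight is the Euclidean distance
        between the two pins; the benchmark may define it differently.
        """
        for net, pins in self.nets.items():
            for a, b in combinations(pins, 2):
                yield net, a, b, hypot(a.x - b.x, a.y - b.y)


def build_example():
    """Build a toy two-component netlist with a single shared net."""
    nl = Netlist()
    nl.components["R1"] = {"type": "resistor", "value": "10k"}
    nl.components["C1"] = {"type": "capacitor", "value": "100n"}
    nl.connect("VOUT", Pin("R1", "2", 0.0, 0.0))
    nl.connect("VOUT", Pin("C1", "1", 3.0, 4.0))
    return nl
```

In this sketch, component attributes live in `components`, connectivity lives in `nets`, and geometry enters only through the pin coordinates used to weight each edge, mirroring the three aspects the abstract says the graph should jointly capture.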

Paper Structure

This paper contains 18 sections, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Large multimodal models fail to reliably perform core visual understanding tasks on structured schematic diagrams. Errors in component detection, topological connectivity reasoning, and spatial layout interpretation highlight persistent limitations in handling the compositional and relational complexity inherent in schematic diagrams.
  • Figure 2: Overview of OmniSch benchmark with representative cases.
  • Figure 3: Comparison between different data annotation paradigms. (a) Manual annotation: relies on human experts for labeling. (b) Auto annotation: OmniSch directly renders schematic source code and automatically labels data via our EDA rendering engine.
  • Figure 4: Statistical overview of the OmniSch benchmark. The dataset encompasses a diverse range of electronic domains, with 1-440 symbols, 1-1200 pins, 1-400 nets, and 1-1600 text instances. This large-scale diversity provides a comprehensive benchmark for the automatic generation and evaluation of schematic netlists.
  • Figure 5: Analysis of net name tracing behavior across different LMM agents.