Point or Line? Using Line-based Representation for Panoptic Symbol Spotting in CAD Drawings

Xingguang Wei, Haomin Wang, Shenglong Ye, Ruifeng Luo, Yanting Zhang, Lixin Gu, Jifeng Dai, Yu Qiao, Wenhai Wang, Hongjie Zhang

TL;DR

The paper tackles panoptic symbol spotting in CAD drawings, a task that combines instance segmentation of countable things with semantic segmentation of uncountable stuff over vector graphics. It introduces VecFormer, a method built on a line-based, type-agnostic representation of primitives and a dual-branch Transformer, plus a Branch Fusion Refinement post-processing step that reconciles instance and semantic predictions. Key contributions include (1) a line-based primitive encoding with Line Sampling, Line Pooling, and Layer Feature Enhancement; (2) a six-layer Query Decoder that predicts instances and semantics jointly; and (3) state-of-the-art PQ (91.1) on FloorPlanCAD, with Stuff-PQ gains of 9.6 and 21.2 points over the second-best results with and without prior information, respectively. The line-based design preserves the geometric continuity of each primitive while remaining computation-friendly, which makes it well suited to vector graphic understanding in CAD applications.

Abstract

We study the task of panoptic symbol spotting, which involves identifying both individual instances of countable things and the semantic regions of uncountable stuff in computer-aided design (CAD) drawings composed of vector graphical primitives. Existing methods typically rely on image rasterization, graph construction, or point-based representation, but these approaches often suffer from high computational costs, limited generality, and loss of geometric structural information. In this paper, we propose VecFormer, a novel method that addresses these challenges through line-based representation of primitives. This design preserves the geometric continuity of the original primitive, enabling more accurate shape representation while maintaining a computation-friendly structure, making it well-suited for vector graphic understanding tasks. To further enhance prediction reliability, we introduce a Branch Fusion Refinement module that effectively integrates instance and semantic predictions, resolving their inconsistencies for more coherent panoptic outputs. Extensive experiments demonstrate that our method establishes a new state-of-the-art, achieving 91.1 PQ, with Stuff-PQ improved by 9.6 and 21.2 points over the second-best results under settings with and without prior information, respectively, highlighting the strong potential of line-based representation as a foundation for vector graphic understanding.
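
For readers unfamiliar with the PQ metric quoted above, the following is a minimal sketch of panoptic quality in its standard form: the sum of IoUs over matched pairs divided by |TP| + 0.5|FP| + 0.5|FN|, where a prediction matches a ground-truth symbol of the same class when their IoU exceeds 0.5. The function names and the unweighted, set-based IoU over primitive ids are illustrative assumptions, not the paper's evaluation code; the benchmark may weight primitives differently when computing IoU.

```python
def iou(a: set, b: set) -> float:
    """Unweighted IoU between two sets of primitive ids (an assumption)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def panoptic_quality(preds, gts, thresh=0.5):
    """Standard PQ over symbols given as (class_label, set_of_primitive_ids).

    A prediction is a true positive if it shares its class with an
    unmatched ground-truth symbol and their IoU exceeds `thresh`.
    """
    matched_gt = set()
    iou_sum = 0.0
    tp = 0
    for p_label, p_prims in preds:
        for j, (g_label, g_prims) in enumerate(gts):
            if j in matched_gt or p_label != g_label:
                continue
            score = iou(p_prims, g_prims)
            if score > thresh:
                matched_gt.add(j)
                iou_sum += score
                tp += 1
                break
    fp = len(preds) - tp  # predictions with no matching ground truth
    fn = len(gts) - tp    # ground-truth symbols that were missed
    denom = tp + 0.5 * fp + 0.5 * fn
    return iou_sum / denom if denom else 0.0


# Hypothetical toy drawing: one door is found, one wall is spurious,
# one window is missed.
preds = [("door", {1, 2, 3}), ("wall", {7, 8})]
gts = [("door", {1, 2, 3, 4}), ("window", {9})]
pq = panoptic_quality(preds, gts)
```

In this toy case the single match has IoU 0.75, and the one false positive and one false negative each add 0.5 to the denominator, so the score is 0.75 / 2.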

Paper Structure

This paper contains 23 sections, 13 equations, 10 figures, 2 tables.

Figures (10)

  • Figure 1: Visualization of primitive representations. Compared to the blurry visual representations of point-based methods (b, c), our line-based approach (d) more closely reflects the ground truth drawing (a). As the data is in vector format, please feel free to zoom in to observe finer differences. Additional comparisons are provided in \ref{sec:app_detail_repre} and \ref{sec:sampling_ratio_compare}.
  • Figure 2: Overview of VecFormer. Given a CAD drawing, VecFormer first applies line sampling to build a line-based representation of primitives. A Transformer backbone is then used to extract line-level features, which are subsequently aggregated into primitive-level features. Next, these primitive-level features are enhanced by a Layer Feature Enhancement module and fed into a Transformer decoder for joint instance and semantic prediction. Finally, a Branch Fusion Refinement module integrates both branches to produce the final panoptic symbol spotting result.
  • Figure 3: Qualitative comparison of primitive-level semantic quality between VecFormer and SymPoint-V2. Each row shows a representative example, with (a) Ground Truth annotations, (b) predictions from our VecFormer, and (c) predictions from SymPoint-V2. As shown, VecFormer provides more accurate and consistent semantic predictions across various graphical primitives.
  • Figure 4: Visualization of how different representations perform on different primitives.
  • Figure 5: Visual comparison of the effects of varying sampling ratios on different representations. Since the data is in vector format, zooming in allows for a detailed examination of the differences between representations.
  • ...and 5 more figures
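
The first two stages named in the Figure 2 overview (line sampling, then aggregating line-level features into primitive-level features) can be illustrated with a rough sketch. Everything here is an assumption made for illustration: the helper names `sample_line` and `line_pool`, the restriction to straight segments, the `max_len` parameter, and the use of mean pooling are not taken from the paper, whose actual sampling and pooling rules may differ.

```python
import math


def sample_line(p0, p1, max_len):
    """Split the segment p0 -> p1 into equal sub-segments of length <= max_len.

    Returns a list of (start, end) point pairs, so each sample is still a
    short line rather than an isolated point.
    """
    length = math.dist(p0, p1)
    n = max(1, math.ceil(length / max_len))
    pts = [
        (p0[0] + (p1[0] - p0[0]) * k / n, p0[1] + (p1[1] - p0[1]) * k / n)
        for k in range(n + 1)
    ]
    return list(zip(pts[:-1], pts[1:]))


def line_pool(features, primitive_ids):
    """Average line-level feature vectors that belong to the same primitive.

    Mean pooling is an assumption; it stands in for whatever aggregation
    the Line Pooling step actually uses.
    """
    sums, counts = {}, {}
    for feat, pid in zip(features, primitive_ids):
        acc = sums.setdefault(pid, [0.0] * len(feat))
        for i, value in enumerate(feat):
            acc[i] += value
        counts[pid] = counts.get(pid, 0) + 1
    return {pid: [v / counts[pid] for v in acc] for pid, acc in sums.items()}


# A hypothetical 10-unit wall segment sampled with max_len = 4 gives 3 lines.
segments = sample_line((0.0, 0.0), (10.0, 0.0), max_len=4.0)

# Three line-level features: two from primitive 0, one from primitive 1.
pooled = line_pool([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], [0, 0, 1])
```

Keeping both endpoints of every sample is what distinguishes this from point sampling: each sub-segment still carries direction and extent, which matches the geometric continuity the abstract describes.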