Point or Line? Using Line-based Representation for Panoptic Symbol Spotting in CAD Drawings
Xingguang Wei, Haomin Wang, Shenglong Ye, Ruifeng Luo, Yanting Zhang, Lixin Gu, Jifeng Dai, Yu Qiao, Wenhai Wang, Hongjie Zhang
TL;DR
The paper tackles panoptic symbol spotting in CAD drawings, a task that combines instance and semantic segmentation for vector graphics. It introduces VecFormer, which encodes primitives with a line-based, type-agnostic representation and processes them with a dual-branch Transformer, followed by a Branch Fusion Refinement post-processing step that reconciles the instance and semantic predictions. Key contributions include (1) a line-based primitive encoding built from Line Sampling, Line Pooling, and Layer Feature Enhancement; (2) a six-layer Query Decoder that produces instance and semantic predictions jointly; and (3) a state-of-the-art PQ of 91.1 on FloorPlanCAD, with Stuff-PQ gains of 9.6 and 21.2 points over the second-best results with and without prior information, respectively. According to the authors, the line-based representation preserves the geometric continuity of each primitive while remaining computationally efficient, which makes it well suited to vector graphic understanding in CAD applications.
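To make the idea of a line-based representation concrete, the sketch below splits one straight primitive into short sub-segments and derives a few geometric features per sub-segment. It is an illustration only and not the authors' Line Sampling procedure: the function names `sample_lines` and `line_features`, the `max_len` parameter, and the choice of features (center, length, angle) are all assumptions made for this example.

```python
import math

def sample_lines(start, end, max_len):
    """Split the segment start->end into equal sub-segments no longer than max_len.

    Hypothetical helper: the real sampling rule in VecFormer may differ.
    """
    (x0, y0), (x1, y1) = start, end
    length = math.hypot(x1 - x0, y1 - y0)
    n = max(1, math.ceil(length / max_len))  # at least one sub-segment
    lines = []
    for i in range(n):
        t0, t1 = i / n, (i + 1) / n
        p = (x0 + t0 * (x1 - x0), y0 + t0 * (y1 - y0))
        q = (x0 + t1 * (x1 - x0), y0 + t1 * (y1 - y0))
        lines.append((p, q))
    return lines

def line_features(line):
    """Describe one sub-segment by center, length, and orientation (assumed features)."""
    (px, py), (qx, qy) = line
    return {
        "center": ((px + qx) / 2, (py + qy) / 2),
        "length": math.hypot(qx - px, qy - py),
        "angle": math.atan2(qy - py, qx - px),
    }

# A 10-unit horizontal primitive sampled with sub-segments of at most 4 units.
segments = sample_lines((0.0, 0.0), (10.0, 0.0), max_len=4.0)
features = [line_features(s) for s in segments]
```

Unlike isolated points, consecutive sub-segments share endpoints, so the sampled sequence still traces the original primitive without gaps.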
Abstract
We study the task of panoptic symbol spotting, which involves identifying both individual instances of countable things and the semantic regions of uncountable stuff in computer-aided design (CAD) drawings composed of vector graphical primitives. Existing methods typically rely on image rasterization, graph construction, or point-based representations, but these approaches often suffer from high computational costs, limited generality, and loss of geometric structural information. In this paper, we propose VecFormer, a novel method that addresses these challenges through a line-based representation of primitives. This design preserves the geometric continuity of the original primitive, enabling more accurate shape representation while maintaining a computation-friendly structure, which makes it well suited to vector graphic understanding tasks. To further enhance prediction reliability, we introduce a Branch Fusion Refinement module that integrates instance and semantic predictions and resolves their inconsistencies, yielding more coherent panoptic outputs. Extensive experiments demonstrate that our method establishes a new state of the art, achieving 91.1 PQ, with Stuff-PQ improved by 9.6 and 21.2 points over the second-best results under settings with and without prior information, respectively. These results highlight the strong potential of line-based representation as a foundation for vector graphic understanding.
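For readers unfamiliar with the PQ metric reported above, the following is a minimal sketch of panoptic quality computed over sets of primitives rather than pixels. It uses the standard definition, PQ = (sum of IoU over matched pairs) / (TP + 0.5 FP + 0.5 FN), with a match requiring equal class labels and IoU above 0.5. The per-primitive weighting by log(1 + length) is an assumption in this example and should be checked against the benchmark's own definition; the function names and the toy data are hypothetical.

```python
import math

def weighted_iou(pred_ids, gt_ids, weight):
    """IoU between two sets of primitive ids, each primitive counted by its weight."""
    inter = sum(weight[i] for i in pred_ids & gt_ids)
    union = sum(weight[i] for i in pred_ids | gt_ids)
    return inter / union if union > 0 else 0.0

def panoptic_quality(preds, gts, weight, thresh=0.5):
    """preds, gts: lists of (class_label, set_of_primitive_ids).

    Assumes instances within preds (and within gts) do not share primitives,
    so a match with IoU > 0.5 is unique.
    """
    matched_gt = set()
    iou_sum, tp = 0.0, 0
    for p_label, p_ids in preds:
        for g_idx, (g_label, g_ids) in enumerate(gts):
            if g_idx in matched_gt or g_label != p_label:
                continue
            iou = weighted_iou(p_ids, g_ids, weight)
            if iou > thresh:
                matched_gt.add(g_idx)
                iou_sum += iou
                tp += 1
                break
    fp = len(preds) - tp  # predictions with no matching ground truth
    fn = len(gts) - tp    # ground-truth instances that were missed
    denom = tp + 0.5 * fp + 0.5 * fn
    return iou_sum / denom if denom > 0 else 0.0

# Toy drawing with four unit-length primitives (made-up data).
lengths = {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0}
weight = {i: math.log1p(l) for i, l in lengths.items()}  # assumed weighting
gts = [("door", {0, 1}), ("wall", {2, 3})]
preds = [("door", {0, 1}), ("wall", {2}), ("window", {3})]
pq = panoptic_quality(preds, gts, weight)
```

In this toy case only the door is matched: the wall prediction reaches an IoU of exactly 0.5, which does not exceed the threshold, so it counts as both a false positive and a false negative.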
