Text-Enhanced Panoptic Symbol Spotting in CAD Drawings
Xianlin Liu, Yan Gong, Bohao Li, Jiajing Huang, Bowen Du, Junchen Ye, Liyan Xu
TL;DR
The paper tackles panoptic symbol spotting in CAD drawings by addressing two gaps in prior work: the underutilization of textual annotations and the lack of explicit modeling of relations between primitives of different types. It introduces a text-enhanced framework that constructs a primitive graph incorporating both geometric and textual elements, initializes features with a CNN, and refines representations via a Transformer backbone augmented with a type-aware attention mechanism and edge features. Key contributions include the Text Primitives Integration Module, the type-aware attention design, and state-of-the-art results on FloorPlanCAD, with improvements in PQ, RQ, and SQ over strong baselines. The approach enhances semantic understanding of CAD drawings, enabling more reliable CAD automation and retrieval in real-world, text-rich floor plans.
Abstract
With the widespread adoption of Computer-Aided Design (CAD) drawings in engineering, architecture, and industrial design, the ability to accurately interpret and analyze these drawings has become increasingly critical. Among the various subtasks, panoptic symbol spotting plays a vital role in enabling downstream applications such as CAD automation and design retrieval. Existing methods primarily focus on the geometric primitives within CAD drawings, but they face two major problems: they usually overlook the rich textual annotations present in the drawings, and they lack explicit modeling of the relationships among primitives, resulting in an incomplete understanding of the drawing as a whole. To fill this gap, we propose a panoptic symbol spotting framework that incorporates textual annotations. The framework constructs unified representations by jointly modeling geometric and textual primitives. Then, using visual features extracted by a pretrained CNN as the initial representations, a Transformer-based backbone is employed, enhanced with a type-aware attention mechanism that explicitly models the different types of spatial dependencies between primitives. Extensive experiments on a real-world dataset demonstrate that the proposed method outperforms existing approaches on symbol spotting tasks involving textual annotations and exhibits superior robustness when applied to complex CAD drawings.
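
To make the idea of type-aware attention concrete, the following is a minimal single-head sketch in NumPy. It is an illustration under stated assumptions, not the formulation used in the paper: it assumes that type awareness can be approximated by adding a bias to each attention logit, looked up by the (query type, key type) pair. The function name `type_aware_attention`, the `type_bias` table, and the integer type ids (including treating text as its own primitive type) are all hypothetical, and the random weights stand in for learned parameters.

```python
import numpy as np

def type_aware_attention(x, types, w_q, w_k, w_v, type_bias):
    """Single-head attention with a per-type-pair bias on the logits.

    x:         (n, d_model) primitive features, e.g. from a CNN
    types:     (n,) integer type id of each primitive
    type_bias: (num_types, num_types) bias table (an assumed design,
               not taken from the paper)
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    logits = (q @ k.T) / np.sqrt(q.shape[-1])
    # Look up one bias per (query type, key type) pair via broadcasting.
    logits = logits + type_bias[types[:, None], types[None, :]]
    # Numerically stable softmax over the key axis.
    logits = logits - logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
n, d_model, num_types = 5, 8, 4
x = rng.normal(size=(n, d_model))
# Hypothetical type ids: 0 = line, 1 = arc, 2 = circle, 3 = text.
types = np.array([0, 0, 1, 2, 3])
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(3))
type_bias = rng.normal(size=(num_types, num_types)) * 0.1

out, weights = type_aware_attention(x, types, w_q, w_k, w_v, type_bias)
```

In this sketch, a strongly negative entry in `type_bias` suppresses attention between two primitive types, while a positive entry encourages it, so the table gives the model a direct handle on cross-type relations such as a text label attending to nearby lines.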
