DT-LSD: Deformable Transformer-based Line Segment Detection

Sebastian Janampa, Marios Pattichis

TL;DR

This work proposes a novel Deformable Transformer-based Line Segment Detector (DT-LSD) that addresses LETR's drawbacks and introduces Line Contrastive DeNoising (LCDN), a technique that stabilizes the one-to-one matching process and speeds up training by 34x.

Abstract

Line segment detection is a fundamental low-level task in computer vision, and improvements in this task can impact more advanced methods that depend on it. Most new methods developed for line segment detection are based on Convolutional Neural Networks (CNNs). Our paper seeks to address challenges that prevent the wider adoption of transformer-based methods for line segment detection. More specifically, we introduce a new model called Deformable Transformer-based Line Segment Detection (DT-LSD) that supports cross-scale interactions, can be trained quickly, and addresses LETR's drawbacks. For faster training, we introduce Line Contrastive DeNoising (LCDN), a technique that stabilizes the one-to-one matching process and speeds up training by 34$\times$. We show that DT-LSD is faster and more accurate than its predecessor transformer-based model (LETR) and outperforms all CNN-based models in terms of accuracy. On the Wireframe dataset, DT-LSD achieves 71.7 sAP$^{10}$ and 73.9 sAP$^{15}$; on the YorkUrban dataset, it achieves 33.2 sAP$^{10}$ and 35.1 sAP$^{15}$.
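
The abstract reports sAP$^{10}$ and sAP$^{15}$ without defining the metric. The sketch below illustrates the true-positive test that usually underlies structural AP in the line segment detection literature. It is an assumption about the convention, not something stated in this text. Under that convention, a prediction counts as correct when the sum of squared distances between its endpoints and those of its nearest ground-truth segment is below the threshold (10 or 15), and each ground-truth segment can be matched only once. The function names and the `(x1, y1, x2, y2)` tuple layout are hypothetical.

```python
def endpoint_sq_dist(pred, gt):
    """Sum of squared endpoint distances, minimized over both endpoint orderings."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    direct = (px1 - gx1) ** 2 + (py1 - gy1) ** 2 + (px2 - gx2) ** 2 + (py2 - gy2) ** 2
    swapped = (px1 - gx2) ** 2 + (py1 - gy2) ** 2 + (px2 - gx1) ** 2 + (py2 - gy1) ** 2
    return min(direct, swapped)


def label_predictions(preds, gts, threshold=10.0):
    """Label each prediction as true positive (True) or false positive (False).

    Assumes `preds` is already sorted by descending confidence, so the most
    confident prediction claims a ground-truth segment first.
    """
    matched = set()  # indices of ground-truth segments already claimed
    labels = []
    for pred in preds:
        # Find the nearest ground-truth segment for this prediction.
        best_idx, best_dist = None, float("inf")
        for i, gt in enumerate(gts):
            dist = endpoint_sq_dist(pred, gt)
            if dist < best_dist:
                best_idx, best_dist = i, dist
        is_tp = best_idx is not None and best_dist < threshold and best_idx not in matched
        if is_tp:
            matched.add(best_idx)
        labels.append(is_tp)
    return labels


ground_truth = [(0, 0, 10, 0)]
predictions = [(0, 1, 10, 1), (10, 0, 0, 0), (50, 50, 60, 60)]
labels = label_predictions(predictions, ground_truth, threshold=10.0)
```

Here the first prediction is close enough to match, the second is a duplicate of an already matched segment, and the third is far from any ground truth. Average precision would then be computed from these labels over the ranked predictions; that step is omitted here.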

Paper Structure

This paper contains 25 sections, 7 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: Feature map enhancing. All line segment detectors use a hierarchical backbone, but they differ in how they enhance the feature maps. (a) CNN-based models use a feature pyramid network to combine two contiguous feature maps, allowing the propagation of global information to low-dimensional feature maps. However, no intra-scale interaction is applied to any feature map. (b) LETR uses a global attention encoder for each processed feature map, promoting intra-scale interaction but not cross-scale interaction, since no information is passed between the two processed feature maps. (c) DT-LSD allows intra- and cross-scale (more than two feature maps) interactions by applying a deformable-attention encoder.
  • Figure 2: Framework of the proposed DT-LSD model. DT-LSD uses a deformable encoder and deformable decoder layers. Furthermore, it uses a set of mixed queries as a training strategy which does not influence the inference time.
  • Figure 3: Feature maps pre-processing for the encoder.
  • Figure 4: Comparison between contrastive denoising techniques applied to line segments. We present two different line segments and their positive and negative queries. We use solid and dashed lines to differentiate between line segment samples.
  • Figure 5: Precision-Recall (PR) curves. PR curve comparisons between L-CNN, LETR, and DT-LSD (ours) using sAP$^{10}$ and AP$^\text{H}$ metrics for the Wireframe and YorkUrban datasets.
  • ...and 1 more figures