
Improving Transformer Based Line Segment Detection with Matched Predicting and Re-ranking

Xin Tong, Shi Peng, Baojie Tian, Yufei Guo, Xuhui Huang, Zhe Ma

TL;DR

This work addresses two problems in Transformer-based line segment detection (LSD): miscalibrated confidence scores and slow convergence. It introduces RANK-LETR, which uses centroid-based matched predicting to eliminate online bipartite matching and geometry-informed line re-ranking so that confidence scores better reflect line quality. It also adds a line segment ranking loss that trains feature points to favor higher-quality predictions, and it combines high-resolution predictions and rotation augmentation with a Deformable Transformer encoder. On the Wireframe and YorkUrban datasets, the method is more accurate than both Transformer-based and CNN-based baselines and converges faster, reaching high accuracy after about 60 epochs. These contributions improve detection precision and training efficiency, making Transformer-based LSD more practical for real-world applications.
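The ranking loss is described above only as training feature points to favor higher-quality predictions. A minimal sketch of one plausible form follows, assuming a hinge-style pairwise margin loss over a per-prediction quality value (for example, negative endpoint distance to the matched ground truth). The function name `pairwise_ranking_loss`, the `margin` parameter and the quality definition are hypothetical and are not taken from the paper.

```python
import numpy as np

def pairwise_ranking_loss(scores, quality, margin=0.1):
    """Hinge-style pairwise ranking loss (an assumed form, not the paper's).

    scores:  (N,) predicted confidence scores
    quality: (N,) quality of each prediction, e.g. negative endpoint
             distance to its ground-truth segment (higher is better)
    For every pair (i, j) with quality[i] > quality[j], the loss is positive
    unless scores[i] exceeds scores[j] by at least `margin`.
    """
    scores = np.asarray(scores, dtype=float)
    quality = np.asarray(quality, dtype=float)
    # diff[i, j] = scores[j] - scores[i] + margin
    diff = scores[None, :] - scores[:, None] + margin
    # better[i, j] is True when prediction i should outrank prediction j
    better = quality[:, None] > quality[None, :]
    violations = np.maximum(diff, 0.0) * better
    n_pairs = better.sum()
    # Average the hinge penalty over all ordered pairs that carry a preference
    return float(violations.sum() / n_pairs) if n_pairs > 0 else 0.0

# Scores that already follow the quality order incur no loss;
# reversing the order of the best and worst predictions is penalized.
print(pairwise_ranking_loss([0.9, 0.2, 0.5], [3.0, 1.0, 2.0]))
print(pairwise_ranking_loss([0.2, 0.9, 0.5], [3.0, 1.0, 2.0]))
```

A pairwise formulation only constrains relative order, which matches the stated goal of stabilizing rankings rather than regressing absolute scores.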

Abstract

Classical Transformer-based line segment detection methods have delivered impressive results. However, we observe that some accurately detected line segments are assigned low confidence scores during prediction, causing them to be ranked lower and potentially suppressed. Additionally, these models often require prolonged training periods to achieve strong performance, largely due to the necessity of bipartite matching. In this paper, we introduce RANK-LETR, a novel Transformer-based line segment detection method. Our approach leverages learnable geometric information to refine the ranking of predicted line segments by enhancing the confidence scores of high-quality predictions in a posterior verification step. We also propose a new line segment proposal method, wherein the feature point nearest to the centroid of the line segment directly predicts the location, significantly improving training efficiency and stability. Moreover, we introduce a line segment ranking loss to stabilize rankings during training, thereby enhancing the generalization capability of the model. Experimental results demonstrate that our method outperforms other Transformer-based and CNN-based approaches in prediction accuracy while requiring fewer training epochs than previous Transformer-based models.
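The proposal method above, in which the feature point nearest to a segment's centroid predicts that segment, can be illustrated by a target-assignment sketch. It assumes a regular feature grid with stride `stride` whose points sit at cell centres, and it resolves collisions (two centroids nearest the same point) by keeping the closer centroid. That collision rule and the function name `assign_by_centroid` are hypothetical; the paper's exact handling is not given here.

```python
import numpy as np

def assign_by_centroid(segments, feat_h, feat_w, stride):
    """Map each ground-truth segment to the feature point nearest its centroid.

    segments: (M, 4) array of (x1, y1, x2, y2) in image pixels.
    Feature point (r, c) is assumed to sit at pixel ((c + 0.5) * stride,
    (r + 0.5) * stride). Returns a (feat_h, feat_w) int array holding the
    index of the assigned segment, or -1 where no segment is assigned.
    Collision rule (hypothetical): the centroid closer to the point wins.
    """
    segments = np.asarray(segments, dtype=float).reshape(-1, 4)
    target = np.full((feat_h, feat_w), -1, dtype=int)
    best_dist = np.full((feat_h, feat_w), np.inf)
    for k, (x1, y1, x2, y2) in enumerate(segments):
        cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0  # segment centroid
        # On a regular grid, the cell containing the centroid has the nearest centre.
        c = int(np.clip(np.floor(cx / stride), 0, feat_w - 1))
        r = int(np.clip(np.floor(cy / stride), 0, feat_h - 1))
        d = np.hypot(cx - (c + 0.5) * stride, cy - (r + 0.5) * stride)
        if d < best_dist[r, c]:
            best_dist[r, c] = d
            target[r, c] = k
    return target

segs = [[0, 0, 16, 0], [40, 40, 56, 56], [1, 1, 3, 3], [2, 6, 6, 2]]
target = assign_by_centroid(segs, feat_h=8, feat_w=8, stride=8)
print(target[0, 1], target[6, 6], target[0, 0])  # segment 3 beats segment 2 at (0, 0)
```

Because every target is fixed before training, the loss for each feature point can be computed directly, with no per-iteration bipartite matching between predictions and ground truth.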

Paper Structure

This paper contains 14 sections, 12 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Comparison between unmatched and matched predicting in Transformer-based line segment detection. (a) In unmatched predicting, predictions must be matched with the ground truth online using bipartite matching. (b) In matched predicting, each prediction is directly associated with a specific ground-truth line segment, resulting in higher training efficiency and stability.
  • Figure 2: Overview of the proposed RANK-LETR. The process begins by feeding an image into a CNN backbone to extract multi-level feature maps from different layers. These features are then processed by a deformable Transformer encoder to generate candidate line segments. The candidate segments are predicted using high-resolution feature maps for higher prediction accuracy and less ambiguity, with each segment represented by confidence scores and positions. Each feature point is responsible for detecting the line segment whose centroid is nearest to it. Additionally, learnable geometric information is extracted from the multi-level features using a CNN-based geometric information extractor. Finally, the line segments are re-ranked by optimizing their confidence scores with the learnable geometric information.
  • Figure 3: Visual examples of line segment detection results from two Transformer-based methods, LETR and ours, on the Wireframe dataset. Our method produces more accurate and complete detection results. For easier viewing, some examples of accurate detection are highlighted with red bounding boxes and examples of complete detection with purple bounding boxes.
  • Figure 4: Accuracy curves for line segment detection over the course of training. Our method reaches high accuracy after just 60 epochs.
  • Figure 5: Saliency maps generated from score maps (left), edge maps (middle) and endpoint maps (right).
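Figures 2 and 5 describe re-ranking candidate segments by combining their confidence scores with geometric evidence such as edge and endpoint maps. The sketch below illustrates the idea with a fixed, hypothetical fusion rule (a weighted geometric mean of the detector score and sampled map support); in RANK-LETR this geometric information is learnable, so the names `rerank`, `sample_map`, `alpha` and `n_samples` are illustrative assumptions only.

```python
import numpy as np

def sample_map(prob_map, xs, ys):
    """Nearest-neighbour lookup of prob_map at pixel coordinates (xs, ys)."""
    h, w = prob_map.shape
    cols = np.clip(np.round(xs).astype(int), 0, w - 1)
    rows = np.clip(np.round(ys).astype(int), 0, h - 1)
    return prob_map[rows, cols]

def rerank(segments, scores, edge_map, endpoint_map, n_samples=16, alpha=0.5):
    """Re-score segments with geometric evidence and return a new ranking.

    segments: (N, 4) array of (x1, y1, x2, y2); scores: (N,) detector confidences.
    edge_map, endpoint_map: (H, W) probabilities in [0, 1].
    The fusion rule (weighted geometric mean) is hypothetical.
    """
    segments = np.asarray(segments, dtype=float)
    scores = np.asarray(scores, dtype=float)
    t = np.linspace(0.0, 1.0, n_samples)
    new_scores = np.empty_like(scores)
    for i, (x1, y1, x2, y2) in enumerate(segments):
        # Average edge probability at evenly spaced points along the segment
        xs, ys = x1 + t * (x2 - x1), y1 + t * (y2 - y1)
        edge_support = sample_map(edge_map, xs, ys).mean()
        # Average endpoint probability at the two endpoints
        end_support = sample_map(endpoint_map, np.array([x1, x2]), np.array([y1, y2])).mean()
        geometric = np.sqrt(edge_support * end_support)
        new_scores[i] = scores[i] ** (1 - alpha) * geometric ** alpha
    order = np.argsort(-new_scores)  # indices sorted by descending new score
    return order, new_scores

edge = np.zeros((10, 10)); edge[2, :] = 1.0            # horizontal edge at y = 2
ends = np.zeros((10, 10)); ends[2, 1] = ends[2, 8] = 1.0
segs = [[1, 2, 8, 2], [1, 7, 8, 7]]                    # on-edge vs. off-edge segment
order, new = rerank(segs, [0.4, 0.9], edge, ends)
print(order, new)  # the low-scored but well-supported segment moves to the top
```

The example mirrors the motivation in the abstract: an accurate segment with a low initial confidence is promoted once geometric evidence is taken into account.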