Ultra-Fast Adaptive Track Detection Network

Hai Ni, Rui Wang, Scarlett Liu

TL;DR

An ultra-fast adaptive track detection network comprising a backbone network and two specialized branches, one of which selects the suitable anchor group from the preset anchor groups, thereby determining the row coordinates of the railway track.

Abstract

Railway detection is critical for the automation of railway systems. Existing models often prioritize either speed or accuracy, and achieving both remains a challenge. Preset anchor groups struggle with the varying track proportions that result from different camera angles; to address this limitation, this paper proposes an ultra-fast adaptive track detection network. The network comprises a backbone network and two specialized branches (Horizontal Coordinate Locator and Perspective Identifier). The Perspective Identifier selects the suitable anchor group from the preset anchor groups, thereby determining the row coordinates of the railway track. The Horizontal Coordinate Locator provides row classification results for each of the preset anchor groups and then, using the result from the Perspective Identifier, generates the column coordinates of the railway track. The network is evaluated on multiple datasets, with the lightweight version achieving an F1 score of 98.68% on the SRail dataset and a detection rate of up to 473 FPS. Compared to the SOTA, the proposed model is competitive in both speed and accuracy. The dataset and code are available at https://github.com/idnihai/UFATD
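
The two-branch decoding described in the abstract can be illustrated with a minimal sketch. It is not taken from the UFATD repository: the function name `decode_tracks`, the nested-list layout, the convention that the last class on each row means "no rail", and the cell-center mapping to pixels are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_tracks(perspective_logits, locator_logits, anchor_groups, image_width):
    """Hypothetical decoding step for a two-branch track detector.

    perspective_logits: n scores, one per preset anchor group.
    locator_logits[g][c][r]: (w + 1) scores for rail c at row anchor r under
        anchor group g; the last class is ASSUMED to mean "no rail here".
    anchor_groups[g]: the h row coordinates (pixels) of anchor group g.
    Returns the selected group index and, per rail, a list of (x, y) points.
    """
    # Perspective branch: pick the most probable anchor group.
    probs = softmax(perspective_logits)
    group = max(range(len(probs)), key=probs.__getitem__)
    rows = anchor_groups[group]  # these fix the row (y) coordinates

    tracks = []
    # Locator branch: only the slice for the selected group is decoded.
    for rail_scores in locator_logits[group]:
        points = []
        for y, cell_scores in zip(rows, rail_scores):
            w = len(cell_scores) - 1
            cell = max(range(w + 1), key=cell_scores.__getitem__)
            if cell == w:  # assumed "no rail" class, skip this row
                continue
            # Map the winning grid cell to the pixel at its center.
            x = (cell + 0.5) * image_width / w
            points.append((x, y))
        tracks.append(points)
    return group, tracks
```

The sketch uses a hard argmax on both branches for clarity; the actual network may combine the softmax output with the locator features differently.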

Paper Structure (19 sections, 7 equations, 5 figures, 5 tables)


Figures (5)

  • Figure 1: Railway track images captured from various camera perspectives. As the camera's perspective changes, there is a corresponding alteration in the proportion of sky and ground captured in the images.
  • Figure 2: Illustration of gridding the image by selecting a suitable group of anchors for the image. Each image will be matched with an appropriate group of anchors. The image is divided into $(w+1)$ grids in the horizontal direction based on these anchors. To better locate possible bends and other features in the upper part of the image, the distance between anchors decreases as they approach the top of the image. $d'_{j,k}$ is the anchor row distance calculated by the method proposed in the anchor generation section.
  • Figure 3: The structure of UFATD. The image is first input into the backbone network to extract high-level features, and then it is fed into two separate branch networks. The Perspective Identifier branch classifies the image category based on the camera perspective. Each category corresponds to a group of anchors. This branch is an $n$-class classification network. The Horizontal Coordinate Locator branch outputs $((w+1), h, C, n)$-dimensional data, which includes the column coordinates under $n$ groups of anchors. The classification result from the Perspective Identifier branch is passed through a softmax function and then input to the Horizontal Coordinate Locator branch, where the corresponding group of anchors for the image is selected.
  • Figure 4: Quantitative comparison of various methods on the DL-RAIL dataset. $^*$ indicates that the test set is employed as the validation set for the optimal model.
  • Figure 5: Visualization on the SRail, Rail-DB, and DL-RAIL datasets.
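
The Figure 2 caption notes that the spacing between row anchors shrinks toward the top of the image. The sketch below illustrates that property only. It uses a simple geometric spacing as a stand-in and is not the paper's $d'_{j,k}$ formula, which is not reproduced in this excerpt; the function name `anchor_rows` and the `ratio` parameter are hypothetical.

```python
def anchor_rows(top, bottom, count, ratio=0.9):
    """Return `count` row coordinates from `bottom` up to `top`.

    Stand-in scheme (an assumption, not the paper's method): each gap is
    `ratio` times the gap below it, so anchors get denser toward the top.
    """
    if count < 2:
        raise ValueError("need at least two anchors")
    if not 0 < ratio < 1:
        raise ValueError("ratio must be in (0, 1) for shrinking gaps")

    # Unnormalized gaps, largest at the bottom of the image.
    gaps = [ratio ** k for k in range(count - 1)]
    # Rescale so the gaps exactly span the interval [top, bottom].
    scale = (bottom - top) / sum(gaps)

    rows = [float(bottom)]
    for gap in gaps:
        rows.append(rows[-1] - gap * scale)
    return rows
```

Under this scheme, one preset group per camera perspective could be built by varying `top` (for example, a higher horizon row for images that contain more sky), with the Perspective Identifier choosing among the resulting groups.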