TPE-Net: Track Point Extraction and Association Network for Rail Path Proposal Generation
Jungwon Kang, Mohammadjavad Ghorbanalivakili, Gunho Sohn, David Beach, Veronica Marin
TL;DR
This work introduces TPE-Net, an end-to-end track point extraction network that generates rail path proposals for autonomous trains. It jointly segments the rail area and regresses, for each rail route, the center points and their distances to the left and right rails. The resulting pixel-level triplets are spatially clustered into track segments and then assembled into a path tree of all feasible ego-paths. On RailSem19, the method reaches a true-positive-pixel-level average precision of 0.9207 and recall of 0.8721 at about 12 frames per second, without relying on camera parameters or 3D data. Although state-of-the-art methods evaluated on private data may exceed these results, TPE-Net offers an end-to-end approach free of geometric constraints, with practical applicability to real-time rail-path reasoning and risk assessment.
Abstract
One essential feature of an autonomous train is minimizing the risk of collision with third-party objects. To estimate this risk, the control system must identify the topology of all rail routes ahead on which the train could travel, especially where rails merge or diverge. The train can then determine the status of potential obstacles with respect to its route and make a timely decision. Numerous studies have successfully extracted all rail tracks as a whole from forward-looking images, but without distinguishing individual instances. Other image-based methods have applied hard-coded prior knowledge of railway geometry to 3D data to associate left and right rails and generate rail route instances. In contrast, we propose a rail path extraction pipeline in which the left and right rail pixels of each rail route instance are extracted and associated by a fully convolutional encoder-decoder architecture called TPE-Net. Two different regression branches are proposed for TPE-Net to regress the locations of the center points of each rail route, along with their corresponding left and right rail pixels. The extracted rail pixels are then spatially clustered to generate the topology of all possible train routes (ego-paths), discarding non-ego-path ones. Experimental results on a challenging, publicly released benchmark show true-positive-pixel-level average precision and recall of 0.9207 and 0.8721, respectively, at about 12 frames per second. Although these results do not exceed the state of the art, the proposed regression pipeline extracts the left-right correspondences in a single pass over the image. It generates strong rail route hypotheses without relying on camera parameters, 3D data, or geometrical constraints.
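The triplet extraction and spatial clustering described above can be illustrated with a minimal sketch. It is not the authors' implementation. The map names (`center_score`, `left_dist`, `right_dist`, `rail_mask`), the score threshold, the row-wise non-maximum suppression, and the greedy row-to-row linking rule with `max_gap` are all assumptions made for illustration, since the abstract does not specify the network outputs or the clustering procedure.

```python
import numpy as np


def decode_triplets(center_score, left_dist, right_dist, rail_mask, thresh=0.5):
    """Turn dense maps into (row, x_left, x_center, x_right) triplets.

    All four inputs are hypothetical H x W arrays standing in for network outputs:
    a center-point score, regressed distances to the left and right rails, and a
    boolean rail-area segmentation mask.
    """
    h, w = center_score.shape
    triplets = []
    for y in range(h):
        row = center_score[y]
        for x in range(w):
            if row[x] < thresh:
                continue
            # Simple 1-D non-maximum suppression along the row.
            left_nb = row[x - 1] if x > 0 else -np.inf
            right_nb = row[x + 1] if x < w - 1 else -np.inf
            if row[x] < left_nb or row[x] <= right_nb:
                continue
            xl = int(round(x - left_dist[y, x]))
            xr = int(round(x + right_dist[y, x]))
            # Keep the triplet only if both regressed rail pixels lie on the rail mask.
            if 0 <= xl < w and 0 <= xr < w and rail_mask[y, xl] and rail_mask[y, xr]:
                triplets.append((y, xl, x, xr))
    return triplets


def cluster_segments(triplets, max_gap=3.0):
    """Greedily link triplets in adjacent rows whose centers are within max_gap pixels.

    This is an assumed stand-in for the paper's spatial clustering step.
    Each returned segment is a list of triplets ordered from the bottom row upward.
    """
    segments = []
    for t in sorted(triplets, key=lambda t: -t[0]):  # bottom image row first
        y, _, xc, _ = t
        best = None
        for seg in segments:
            ty, _, txc, _ = seg[-1]
            if ty - y == 1 and abs(txc - xc) <= max_gap:
                if best is None or abs(txc - xc) < abs(best[-1][2] - xc):
                    best = seg
        if best is None:
            segments.append([t])  # no neighbour in the row below: start a new segment
        else:
            best.append(t)
    return segments
```

In this sketch, a segment that splits at a diverging switch continues along the first matching branch and the other branch starts a new segment. Assembling such segments into a path tree of ego-paths would be a further step that is not shown here.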
