YOLinO++: Single-Shot Estimation of Generic Polylines for Mapless Automated Driving

Annika Meyer, Christoph Stiller

TL;DR

YOLinO++ introduces a YOLO-inspired, single-shot network for mapless detection of 1D line features such as lane centerlines, borders, and markings in urban driving. It discretizes the image into a grid and predicts multiple line hypotheses per cell, encoding each one with a novel Midpoint-Direction (MR) representation as an alternative to the Cart representation. This allows it to handle intersections and complex topologies in real time. The method supports both dynamic assignment of ground truth (GT) to predictors and anchor-based preassignment, with a loss that combines geometry, classification, and confidence terms. Evaluations on the Argoverse, TuSimple, and KAI datasets demonstrate real-time performance (around a few milliseconds per image) and accurate, direction-aware line detections, highlighting the approach's potential for mapless perception and localization in dynamic environments.
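To make the two per-cell representations concrete, the following is a minimal sketch and not the authors' implementation. It assumes that Cart stores a segment as its start and end points in cell coordinates, and that MR stores the midpoint plus the half-vector pointing from the midpoint to the end point. The exact encoding and normalization used in the paper are not given in this summary, and the function names `cart_to_mr` and `mr_to_cart` are hypothetical.

```python
def cart_to_mr(start, end):
    """Convert a segment from start/end points to midpoint + direction.

    Assumption: the direction is stored as the half-vector from the
    midpoint to the end point, which keeps the conversion invertible.
    """
    (x0, y0), (x1, y1) = start, end
    midpoint = ((x0 + x1) / 2.0, (y0 + y1) / 2.0)
    half_vector = ((x1 - x0) / 2.0, (y1 - y0) / 2.0)
    return midpoint, half_vector


def mr_to_cart(midpoint, half_vector):
    """Recover the start and end points from midpoint + half-vector."""
    (mx, my), (dx, dy) = midpoint, half_vector
    return (mx - dx, my - dy), (mx + dx, my + dy)


if __name__ == "__main__":
    # A segment crossing one grid cell, in cell-relative coordinates [0, 1].
    midpoint, half_vector = cart_to_mr((0.0, 0.25), (1.0, 0.75))
    print(midpoint, half_vector)
    print(mr_to_cart(midpoint, half_vector))
```

Under this assumed encoding, swapping start and end leaves the midpoint unchanged and only flips the sign of the direction, which is one way a representation can stay direction-aware.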

Abstract

In automated driving, highly accurate maps are commonly used to support and complement perception. These maps are costly to create and quickly become outdated because the traffic world is constantly changing. To support or replace the map of an automated system with detections from sensor data, a perception module must be able to detect the map features. We propose a neural network that follows the one-shot philosophy of YOLO but is designed for the detection of 1D structures in images, such as lane boundaries. We extend previous ideas by a midpoint-based line representation and anchor definitions. This representation can describe lane borders and markings, but also implicit features such as the centerlines of lanes. The broad applicability of the approach is shown by its detection performance on lane centerlines, lane borders, and markings, both on highways and in urban areas. Versatile lane boundaries are detected and can be inherently classified as dashed or solid lines, curbs, road boundaries, or implicit delimitation.

Paper Structure (21 sections, 1 equation, 9 figures, 1 table)

This paper contains 21 sections, 1 equation, 9 figures, 1 table.

Figures (9)

  • Figure 1: Estimating polylines discretized by a fixed grid supports a variety of topologies, especially in urban areas, and enables fast estimation. We show an example image from the Argoverse dataset (Wilson et al., 2021) with its map projected into the image.
  • Figure 2: Three decoder variants with different resolution per cell. Convolutional layers are visualized in blue, transposed convolutional layers in orange, and fusion layers in gray. The fusion layers additionally receive information through skip connections from the feature maps of the eighth or 13th convolutional layer of the encoder.
  • Figure 3: Cart and MR to represent a line segment within a cell.
  • Figure 4: Anchors for the MR representation. The top row shows examples of the equally distributed anchors, whereas the bottom row shows examples retrieved as cluster representatives from the Argoverse 2.0 dataset. The colors distinguish different anchors. Not all anchors are fully distinguishable here because some, for example, share the same center point and point in opposite directions, occupying the same pixels. a) visualizes eight anchors defined and distributed by their center point. b) shows eight anchors distributed along the direction. c) shows 24 anchors distributed in both center point and direction coordinates.
  • Figure 5: Comparing the $uv$-metrics for a dynamic assignment (blue) and anchors (orange). We evaluate the same models as in Table 1, marked with $\sim$ and $\star$, respectively, but apply a circular evaluation gate in $uv$ coordinates for determining true positives (TP). The results in Table 1 are presented with a rectangular matching gate limited to the cell. The horizontal lines mark the cell-based estimation for anchors.
  • ...and 4 more figures
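
The direction-distributed anchors described for Figure 4 b) can be illustrated with a short sketch. This is an assumption-laden example, not the paper's assignment rule: it assumes eight anchors whose directions are spaced evenly over the full circle, and it preassigns a ground-truth segment to the anchor with the highest cosine similarity to the segment's direction. The function names `direction_anchors` and `assign_anchor` are hypothetical.

```python
import math


def direction_anchors(n=8):
    """Return n unit direction vectors spaced evenly over 360 degrees."""
    return [
        (math.cos(2.0 * math.pi * k / n), math.sin(2.0 * math.pi * k / n))
        for k in range(n)
    ]


def assign_anchor(direction, anchors):
    """Return the index of the anchor best aligned with `direction`.

    Assumption: alignment is measured by cosine similarity, so two
    segments pointing in opposite directions get different anchors.
    """
    dx, dy = direction
    norm = math.hypot(dx, dy)
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    ux, uy = dx / norm, dy / norm
    return max(
        range(len(anchors)),
        key=lambda k: ux * anchors[k][0] + uy * anchors[k][1],
    )


if __name__ == "__main__":
    anchors = direction_anchors(8)
    # A segment heading mostly in +x and one heading in -x.
    print(assign_anchor((1.0, 0.1), anchors))
    print(assign_anchor((-1.0, 0.0), anchors))
```

Because the anchors cover the full circle rather than a half circle, a line and its reversed counterpart fall on different predictors in this sketch, which matches the caption's remark that anchors may occupy the same pixels while pointing in opposite directions.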