Automatic Image Annotation for Mapped Features Detection

Maxime Noizet, Philippe Xu, Philippe Bonnifait

TL;DR

The work tackles map-constrained pole detection for autonomous localization by automatically annotating unlabeled imagery with a multi-modal fusion of map-based, segmentation-based, and lidar-based cues. It introduces a two-step fusion process (data association and consensus-based fusion) and a mechanism to manage ambiguous labels via masking patches. Experimental results show that combining annotations improves detector training quality and that masking ambiguous regions enhances recall with limited precision loss, yielding better map-aligned pole detection. This approach enables scalable, map-specific perception without exhaustive manual labeling, with practical relevance to robust vehicle localization.

Abstract

Detecting road features is a key enabler for autonomous driving and localization. For instance, reliable detection of poles, which are widespread in road environments, can improve localization. Modern deep learning-based perception systems need a significant amount of annotated data. Automatic annotation avoids time-consuming and costly manual annotation. Because automatic methods are prone to errors, managing annotation uncertainty is crucial to ensure a proper learning process. Fusing multiple annotation sources on the same dataset can be an efficient way to reduce these errors. This improves not only the quality of the annotations but also the learning of perception models. In this paper, we consider the fusion of three automatic annotation methods in images: feature projection from a high-accuracy vector map combined with a lidar, image segmentation, and lidar segmentation. Our experimental results demonstrate the significant benefits of multi-modal automatic annotation for pole detection through a comparative evaluation on manually annotated images. Finally, the resulting multi-modal fusion is used to fine-tune an object detection model for pole base detection using unlabeled data, showing overall improvements achieved by enhancing network specialization. The dataset is publicly available.

Paper Structure

This paper contains 19 sections, 6 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Illustration of the connections between georeferenced poles on the map (left) and an image captured by a vehicle (right). The pole bases in red are not visible from the camera. Note that the map does not include the four black bollards on the sides of the crosswalk.
  • Figure 2: Steps in an automatic multi-modal labeling method. Multiple annotation sets are obtained from the images and diverse data sources, including the vector map. Thanks to a data association function $h$, annotations are grouped to derive final annotations through a fusion function $f$. The final annotations are displayed with green stars on the last image.
  • Figure 3: Examples of automatic annotations obtained using three different methods. They are depicted with blue crosses. Green circles represent reference annotations defined by humans and correctly annotated automatically. The red ones are those that are missed.
  • Figure 4: Management of ambiguous pole bases. Green crosses: annotations with unanimous agreement. Orange circles: unnecessary black patches. Red circles: black patches to mask ambiguous pole bases. Blue square: missed pole base.
  • Figure 5: Precision-recall curves after 300 epochs of training using different annotation approaches. The background color indicates the predominant curve.
  • ...and 1 more figures
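
The captions of Figures 2 and 4 outline a pipeline: an association function $h$ groups annotations from the different sources, a fusion function $f$ derives final annotations from each group, and ambiguous pole bases are hidden behind black patches. The Python sketch below is one possible reading of that pipeline, not the authors' implementation. The greedy distance-based grouping, the unanimity rule, and the names `associate`, `fuse`, `mask_ambiguous`, `max_dist`, and `half_size` are all assumptions introduced for illustration; the summary above does not specify how $h$ and $f$ are actually defined.

```python
import numpy as np


def associate(sources, max_dist=15.0):
    """Hypothetical stand-in for h: greedily group pole-base annotations
    (pixel coordinates) from different sources that lie within max_dist pixels."""
    clusters = []  # each cluster maps a source index to one (x, y) point
    for s_idx, points in enumerate(sources):
        for p in points:
            p = np.asarray(p, dtype=float)
            best, best_d = None, max_dist
            for c in clusters:
                if s_idx in c:
                    continue  # at most one annotation per source in a cluster
                centre = np.mean(list(c.values()), axis=0)
                d = np.linalg.norm(p - centre)
                if d <= best_d:
                    best, best_d = c, d
            if best is None:
                clusters.append({s_idx: p})
            else:
                best[s_idx] = p
    return clusters


def fuse(clusters, n_sources):
    """Hypothetical stand-in for f: keep a cluster as a final annotation only
    when every source agrees; otherwise flag it as ambiguous."""
    confident, ambiguous = [], []
    for c in clusters:
        centre = np.mean(list(c.values()), axis=0)
        (confident if len(c) == n_sources else ambiguous).append(centre)
    return confident, ambiguous


def mask_ambiguous(image, ambiguous, half_size=20):
    """Return a copy of the image with a black square patch drawn over each
    ambiguous pole base, so the region carries no training signal."""
    masked = image.copy()
    height, width = masked.shape[:2]
    for x, y in ambiguous:
        x0, x1 = max(0, int(x) - half_size), min(width, int(x) + half_size + 1)
        y0, y1 = max(0, int(y) - half_size), min(height, int(y) + half_size + 1)
        masked[y0:y1, x0:x1] = 0
    return masked


# Made-up pixel coordinates for the three annotation sources.
map_pts = [(100, 200), (300, 220)]    # map projection
image_pts = [(103, 198), (500, 240)]  # image segmentation
lidar_pts = [(98, 203), (302, 219)]   # lidar segmentation

clusters = associate([map_pts, image_pts, lidar_pts])
confident, ambiguous = fuse(clusters, n_sources=3)

image = np.full((480, 640, 3), 255, dtype=np.uint8)  # dummy white image
masked = mask_ambiguous(image, ambiguous)
```

In this toy input, only the pole base near (100, 200) is reported by all three sources, so it is the only one kept as a training annotation; the other two candidates are masked. Stricter or looser consensus rules (for example, majority voting) would fit the same structure by changing the condition in `fuse`.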