Monocular Localization with Semantics Map for Autonomous Vehicles
Jixiang Wan, Xudong Zhang, Shuzhou Dong, Yuwei Zhang, Yuchen Yang, Ruoxi Wu, Ye Jiang, Jijunnan Li, Jinquan Lin, Ming Yang
TL;DR
This work tackles robust monocular localization for autonomous driving by relying on stable semantic cues instead of fragile texture features. It proposes a lightweight two-stage pipeline: a global semantic map is built offline from LiDAR data, and the vehicle is localized online from a monocular camera by associating detected semantic features with map objects, aided by an enhanced inverse perspective mapping (IPM) that compensates for vehicle-induced orientation changes. The optimization fuses lane markings and pole-like objects with a prior pose using nonlinear least squares, achieving competitive accuracy with a much smaller map than dense SLAM baselines. Evaluations on the KAIST Urban dataset and a self-recorded industrial-park dataset show accurate translation and rotation estimates and practical real-time operation, which points to the method’s potential for scalable, low-cost autonomous driving localization. The work advances semantic-map-based localization by combining lightweight segmentation, bird's-eye-view (BEV) mapping, and robust feature matching with a global optimization framework.
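To make the IPM step concrete, the following is a minimal sketch of a flat-ground inverse perspective mapping with an explicit pitch parameter. It is not the enhanced IPM of the paper, whose details are not given in this summary. The function names, the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), the camera height, and the restriction to a single pitch angle are all assumptions chosen for illustration.

```python
import math

def pixel_to_ground(u, v, fx, fy, cx, cy, cam_height, pitch):
    """Back-project pixel (u, v) onto a flat ground plane (illustrative sketch).

    Camera frame: x right, y down, z forward. `pitch` is the downward tilt
    of the optical axis in radians. Returns (forward, left) in metres,
    or None if the ray does not hit the ground in front of the camera.
    """
    # Ray through the pixel in the camera frame (pinhole model, assumed).
    dx = (u - cx) / fx
    dy = (v - cy) / fy
    dz = 1.0
    # Rotate the ray into a gravity-levelled camera frame.
    c, s = math.cos(pitch), math.sin(pitch)
    ly = c * dy + s * dz
    lz = -s * dy + c * dz
    if ly <= 1e-9:  # ray is at or above the horizon
        return None
    t = cam_height / ly  # scale at which the ray reaches the ground
    return (t * lz, -t * dx)

def ground_to_pixel(forward, left, fx, fy, cx, cy, cam_height, pitch):
    """Project a ground point back into the image (inverse of the above)."""
    c, s = math.cos(pitch), math.sin(pitch)
    lx, ly, lz = -left, cam_height, forward
    # Rotate from the levelled frame into the pitched camera frame.
    xc = lx
    yc = c * ly - s * lz
    zc = s * ly + c * lz
    return (fx * xc / zc + cx, fy * yc / zc + cy)
```

In this toy model, back-projecting with the wrong pitch moves far-away ground points by metres, which is one way to see why orientation changes of the vehicle need to be compensated before lane markings are matched against a map.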
Abstract
Accurate and robust localization remains a significant challenge for autonomous vehicles. Sensor cost and limited onboard computing resources make it difficult to scale localization to large commercial applications. Traditional vision-based approaches rely on texture features that are susceptible to changes in lighting, season, perspective, and appearance. In addition, maps that store descriptors are large, and complex optimization processes hinder system performance. To balance efficiency and accuracy, we propose a novel lightweight visual semantic localization algorithm that uses stable semantic features instead of low-level texture features. First, semantic maps are constructed offline by detecting semantic objects, such as ground markers, lane lines, and poles, using cameras or LiDAR sensors. Then, online visual localization is performed by associating the semantic features detected in the image with the objects in the map. We evaluated the proposed localization framework on the publicly available KAIST Urban dataset and on scenarios we recorded ourselves. The experimental results show that our method is a reliable and practical solution for a variety of autonomous driving localization tasks.
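The online stage, which associates semantic features with map objects and refines a prior pose by nonlinear least squares, can be illustrated with a small planar example. The sketch below is a simplification under stated assumptions and not the authors' implementation: landmarks are reduced to 2D points (for example pole bases or sampled lane-marking points in the BEV plane), association is plain nearest-neighbour matching, the pose is `(x, y, heading)`, and the solver is a hand-written Gauss-Newton loop. All function names and the `prior_weight` parameter are hypothetical.

```python
import math

def transform(pose, p):
    """Map a vehicle-frame point p into the map frame using pose (x, y, theta)."""
    x, y, th = pose
    c, s = math.cos(th), math.sin(th)
    return (c * p[0] - s * p[1] + x, s * p[0] + c * p[1] + y)

def associate(pose, observations, landmarks):
    """Pair each observation with the nearest map landmark under the current pose."""
    pairs = []
    for p in observations:
        q = transform(pose, p)
        m = min(landmarks, key=lambda l: (l[0] - q[0]) ** 2 + (l[1] - q[1]) ** 2)
        pairs.append((p, m))
    return pairs

def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(3):
        piv = max(range(i, 3), key=lambda r: abs(M[r][i]))
        M[i], M[piv] = M[piv], M[i]
        for r in range(i + 1, 3):
            f = M[r][i] / M[i][i]
            for k in range(i, 4):
                M[r][k] -= f * M[i][k]
    x = [0.0, 0.0, 0.0]
    for i in reversed(range(3)):
        x[i] = (M[i][3] - sum(M[i][k] * x[k] for k in range(i + 1, 3))) / M[i][i]
    return x

def localize(prior, observations, landmarks, prior_weight=1e-3, iters=10):
    """Refine a prior pose by Gauss-Newton on point residuals plus a prior term."""
    pose = list(prior)
    for _ in range(iters):
        H = [[0.0] * 3 for _ in range(3)]  # approximate Hessian J^T J
        g = [0.0] * 3                      # gradient J^T r
        c, s = math.cos(pose[2]), math.sin(pose[2])
        for p, m in associate(pose, observations, landmarks):
            q = transform(pose, p)
            r = (q[0] - m[0], q[1] - m[1])
            # Jacobian of the residual with respect to (x, y, theta).
            J = ((1.0, 0.0, -s * p[0] - c * p[1]),
                 (0.0, 1.0, c * p[0] - s * p[1]))
            for a in range(3):
                g[a] += J[0][a] * r[0] + J[1][a] * r[1]
                for b in range(3):
                    H[a][b] += J[0][a] * J[0][b] + J[1][a] * J[1][b]
        # Quadratic penalty that keeps the estimate close to the prior pose.
        for a in range(3):
            H[a][a] += prior_weight
            g[a] += prior_weight * (pose[a] - prior[a])
        delta = solve3(H, [-v for v in g])
        pose = [pose[a] + delta[a] for a in range(3)]
    return tuple(pose)
```

Nearest-neighbour matching only works here because the prior pose is assumed to be close to the true pose and the landmarks are well separated. The sketch also leaves out heading wrap-around, outlier rejection, and per-class weighting of lane markings and poles.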
