From 2D to 3D: AISG-SLA Visual Localization Challenge

Jialin Gao; Bill Ong; Darld Lwi; Zhen Hao Ng; Xun Wei Yee; Mun-Thye Mak; Wee Siong Ng; See-Kiong Ng; Hui Ying Teo; Victor Khoo; Georg Bökman; Johan Edstedt; Kirill Brodt; Clémentin Boittiaux; Maxime Ferrera; Stepan Konev

TL;DR

The AISG-SLA Visual Localization Challenge (VLC) at IJCAI 2023 was organized to explore how accurately AI can recover a camera's pose in 3D space from 2D images.

Abstract

Research in 3D mapping is crucial for smart city applications, yet the cost of acquiring 3D data often hinders progress. Visual localization, particularly monocular camera position estimation, offers a solution by determining the camera's pose solely from visual cues. However, the task is challenging because a single camera provides limited data. To tackle these challenges, we organized the AISG-SLA Visual Localization Challenge (VLC) at IJCAI 2023 to explore how accurately AI can recover camera pose in 3D space from 2D images. The challenge attracted over 300 participants worldwide, forming more than 50 teams. Winning teams achieved high accuracy in pose estimation using low-frame-rate images from a car-mounted camera. The VLC dataset is available for research purposes upon request via vlc-dataset@aisingapore.org.

Paper Structure (9 sections, 4 figures)

Introduction
Related Work
VLC Dataset
Data Sources
Data Matching Challenge
Proposed Methods
RoMa and DeDoDe Strategy
CNN-based Strategy
Conclusion

Figures (4)

Figure 1: The camera is directed towards the rear of the vehicle, resulting in consecutive captures where the movement is not depicted within the image frame.
Figure 2: Illustration of challenging matches by LightGlue.
Figure 3: Method pipeline. 3(a) Extract DeDoDe keypoints in all images and find non-sequential image pairs to match using i) DINOv2, ii) manual inspection. 3(b) Match keypoints in sequential and non-sequential image pairs using RoMa, then filter the matches with Graph-Cut RANSAC. 3(c) Structure from motion using COLMAP; each red dot is a position where a picture was taken.
Figure 4: Visual representation of predicted sub-trajectories of the rotational error (RE).
