RGB-Event HyperGraph Prompt for Kilometer Marker Recognition based on Pre-trained Foundation Models

Xiaoyu Xian, Shiao Wang, Xiao Wang, Daxin Tian, Yan Tian

TL;DR

This work proposes a robust baseline method based on a pre-trained RGB OCR foundation model, enhanced through multi-modal adaptation for Kilometer Marker Recognition (KMR), a critical task for autonomous metro localization under GNSS-denied conditions.

Abstract

Metro trains often operate in highly complex environments, characterized by illumination variations, high-speed motion, and adverse weather conditions. These factors pose significant challenges for visual perception systems, especially those relying solely on conventional RGB cameras. To tackle these difficulties, we explore the integration of event cameras into the perception system, leveraging their advantages in low-light conditions, high-speed scenarios, and low power consumption. Specifically, we focus on Kilometer Marker Recognition (KMR), a critical task for autonomous metro localization under GNSS-denied conditions. In this context, we propose a robust baseline method based on a pre-trained RGB OCR foundation model, enhanced through multi-modal adaptation. Furthermore, we construct the first large-scale RGB-Event dataset, EvMetro5K, containing 5,599 pairs of synchronized RGB-Event samples, split into 4,479 training and 1,120 testing samples. Extensive experiments on EvMetro5K and other widely used benchmarks demonstrate the effectiveness of our approach for KMR. Both the dataset and the source code will be released at https://github.com/Event-AHU/EvMetro5K_benchmark

Paper Structure (22 sections, 8 equations, 5 figures, 5 tables)

Figures (5)

  • Figure 1: The multi-modal imaging device proposed in this paper and typical metro perception scenarios. Specifically, sub-figures (c) and (d) show the imaging results of the metro under low-illumination conditions in high-speed motion/static scenes and overexposed scenes, respectively. In each sub-figure (c, d), the four images arranged clockwise are: the RGB image, the NIR image, the stacked event stream rendered as a red-blue map, and the reconstructed grayscale image from the event stream.
  • Figure 2: An overview of our proposed RGB-Event based Hypergraph Prompt for Kilometer Marker Recognition based on foundation models.
  • Figure 3: Example samples from the EvMetro5K dataset. Each pair shows the RGB image (left) and the corresponding event-reconstructed grayscale image (right). While the RGB modality often suffers from low light, motion blur, and overexposure, the event modality provides clearer structural information for kilometer marker recognition.
  • Figure 4: Ablation studies of the Fusion layers and Permutations.
  • Figure 5: Visualization of attention maps of our method on the EvMetro5K dataset.