A Training-Free Framework for Video License Plate Tracking and Recognition with Only One-Shot

Haoxuan Ding, Qi Wang, Junyu Gao, Qiang Li

TL;DR

OneShotLP presents a training-free, video-based license plate analysis framework that starts from a single annotated LP center in the first frame. It combines a point-tracking module (CoTracker), a promptable segmentation module (EfficientSAM), and a multimodal large language model (MLLM) for recognition to achieve LP tracking and recognition without task-specific training. The approach demonstrates strong LP detection performance and competitive recognition on UFPR-ALPR and SSIG-SegPlate, highlighting the potential of leveraging foundation models for adaptable, region-agnostic LP analysis in intelligent transportation systems. The results underscore the practicality of zero-shot LP analysis with minimal supervision and indicate avenues for improving LP-specific recognition through prompt engineering and model fine-tuning.

Abstract

Traditional license plate detection and recognition models are often trained on closed datasets, which limits their ability to handle the diverse license plate formats found across different regions. Large-scale pre-trained models have shown exceptional generalization capabilities, enabling few-shot and zero-shot learning. We propose OneShotLP, a training-free framework for video-based license plate detection and recognition that leverages these models. Starting from the license plate position in the first video frame, our method tracks this position across subsequent frames with a point tracking module, creating a trajectory of prompts. These prompts are fed into a segmentation module that uses a promptable large segmentation model to generate local masks of the license plate regions. The segmented areas are then processed by multimodal large language models (MLLMs) for license plate recognition. OneShotLP offers significant advantages, including the ability to function effectively without extensive training data and adaptability to various license plate styles. Experimental results on the UFPR-ALPR and SSIG-SegPlate datasets demonstrate the superior accuracy of our approach compared to traditional methods. This highlights the potential of leveraging pre-trained models for diverse real-world applications in intelligent transportation systems. The code is available at https://github.com/Dinghaoxuan/OneShotLP.
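
The data flow described in the abstract (one annotated point, a tracked trajectory of prompts, a per-frame mask, then recognition on the masked region) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: every name here (`run_one_shot_pipeline`, `mask_to_box`, `crop`, and the three `stub_*` functions) is hypothetical, frames and masks are plain nested lists, and the stubs only stand in for the real tracking, segmentation, and MLLM modules, whose actual APIs are not shown.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]        # (x, y) in pixel coordinates
Box = Tuple[int, int, int, int]    # (x_min, y_min, x_max, y_max), inclusive
Frame = List[List[int]]            # placeholder image, indexed frame[y][x]
Mask = List[List[bool]]            # mask[y][x] is True inside the plate


@dataclass
class PlateResult:
    frame_index: int
    prompt: Point
    box: Box
    text: str


def mask_to_box(mask: Mask) -> Optional[Box]:
    """Tight bounding box around the True pixels, or None if the mask is empty."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, inside in enumerate(row) if inside]
    if not ys:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


def crop(frame: Frame, box: Box) -> Frame:
    """Cut the boxed region out of the frame (box bounds are inclusive)."""
    x0, y0, x1, y1 = box
    return [row[x0:x1 + 1] for row in frame[y0:y1 + 1]]


def run_one_shot_pipeline(
    frames: List[Frame],
    first_point: Point,
    track_points: Callable[[List[Frame], Point], List[Point]],
    segment: Callable[[Frame, Point], Mask],
    recognize: Callable[[Frame], str],
) -> List[PlateResult]:
    """Track one annotated point, segment around it per frame, then recognize."""
    trajectory = track_points(frames, first_point)   # one prompt per frame
    results = []
    for index, (frame, prompt) in enumerate(zip(frames, trajectory)):
        box = mask_to_box(segment(frame, prompt))
        if box is None:                              # nothing segmented, skip frame
            continue
        results.append(PlateResult(index, prompt, box, recognize(crop(frame, box))))
    return results


# Stand-in modules (assumptions for illustration only, not real model calls).
def stub_tracker(frames: List[Frame], start: Point) -> List[Point]:
    # Pretend the plate drifts one pixel to the right in each frame.
    return [(start[0] + i, start[1]) for i in range(len(frames))]


def stub_segmenter(frame: Frame, prompt: Point) -> Mask:
    # Pretend the plate is a 5x3 rectangle centered on the prompt point.
    cx, cy = int(prompt[0]), int(prompt[1])
    return [[abs(x - cx) <= 2 and abs(y - cy) <= 1 for x in range(len(frame[0]))]
            for y in range(len(frame))]


def stub_recognizer(plate_crop: Frame) -> str:
    # A real recognizer would query an MLLM with the cropped plate image.
    return "PLACEHOLDER"
```

Passing the three modules in as callables keeps the sketch independent of any particular model: swapping a stub for a wrapper around a real tracker, segmenter, or MLLM would not change the loop itself.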

Paper Structure

This paper contains 19 sections, 2 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: A video clip with license plate information in a transportation scene.
  • Figure 2: The pipeline of the proposed OneShotLP.
  • Figure 3: The point selection strategy used to generate query points in the tracking module.
  • Figure 4: The input strategy that lets the recognition module perform multimodal understanding and reasoning.
  • Figure 5: The visual question answering results for LP recognition.