Relaxed syntax modeling in Transformers for future-proof license plate recognition

Florent Meyer, Laurent Guichard, Denis Coquenet, Guillaume Gravier, Yann Soullard, Bertrand Coüasnon

TL;DR

The paper investigates how Transformer-based license plate recognition systems overfit to training-time syntax and fail when plate syntax evolves. It proposes SaLT, a syntax-less Transformer that combines a fully convolutional encoder with a debiased two-layer decoder and revised cross-attention to minimize positional and contextual biases. Empirical results on real and synthetic LPR data show SaLT achieves strong accuracy on both source and target syntax, with markedly lower variability, illustrating robust, future-proof recognition without retraining. Ablation studies confirm the effectiveness of the encoder and decoder modifications and demonstrate generalization to syntax shifts across positions, offering a practical path toward production-ready LPR in dynamic environments.

Abstract

Effective license plate recognition systems are required to be resilient to constant change, as new license plates are released into traffic daily. While Transformer-based networks excel at recognizing them at first sight, we observe a significant performance drop over time, which proves them unsuitable for demanding production environments. Indeed, such systems obtain state-of-the-art results on plates whose syntax is seen during training. Yet, we show they perform similarly to random guessing on future plates, where legible characters are wrongly recognized due to a shift in their syntax. After highlighting the flows of positional and contextual information in Transformer encoder-decoders, we identify several causes for their over-reliance on past syntax. We then devise architectural cut-offs and replacements, which we integrate into SaLT, an attempt at a Syntax-Less Transformer for syntax-agnostic modeling of license plate representations. Experiments on both real and synthetic datasets show that our approach reaches top accuracy on past syntax and, most importantly, nearly maintains performance on future license plates. We further demonstrate the robustness of our architecture enhancements by way of various ablations.

Paper Structure

This paper contains 36 sections, 4 equations, 3 figures, 5 tables.

Figures (3)

  • Figure 1: Cropped photographs of real test LP with predictions by TrOCR. Left: Fine-tuned TrOCR successfully decodes images of LP starting with letters from A to F (training-time syntax), despite degraded quality. Right: Yet, it fails on legible LP with a G in the leftmost position (future syntax). Errors (underlined) occur mainly in the first and second positions. LP are anonymised for GDPR compliance.
  • Figure 2: Overall debiasing framework applied to a Transformer encoder-decoder network. The proposed architectural enhancements discard positional and contextual bias through an accurate control of information flows.
  • Figure 3: Example LPR-MNIST target syntax samples with random padding.
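
The source/target syntax split described in the Figure 1 caption can be illustrated with a short Python sketch. It assumes plates are plain strings and takes only the leading-letter ranges from the caption (A to F for training-time syntax, G for future syntax). The function name `split_by_syntax` and the sample plate strings are hypothetical and are not taken from the paper or its datasets.

```python
from __future__ import annotations

from typing import Iterable

# Leading-letter ranges taken from the Figure 1 caption.
SOURCE_LEADING = set("ABCDEF")  # training-time (source) syntax
TARGET_LEADING = set("G")       # future (target) syntax


def split_by_syntax(plates: Iterable[str]) -> tuple[list[str], list[str]]:
    """Partition plate strings into (source, target) by leftmost character.

    Hypothetical helper for illustration only: plates are normalised by
    stripping whitespace and upper-casing before the check.
    """
    source: list[str] = []
    target: list[str] = []
    for plate in plates:
        text = plate.strip().upper()
        if not text:
            continue  # skip empty entries
        if text[0] in SOURCE_LEADING:
            source.append(text)
        elif text[0] in TARGET_LEADING:
            target.append(text)
        # Plates with any other leading character belong to neither set.
    return source, target


if __name__ == "__main__":
    # Made-up plate strings, used only to exercise the split.
    sample = ["AB-123-CD", "ga-001-zz", " FA-999-XY", "HZ-000-AA", ""]
    source, target = split_by_syntax(sample)
    print("source:", source)
    print("target:", target)
```

Evaluating a recognizer separately on the two lists is one simple way to expose the gap between training-time and future syntax that the abstract describes.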