Beyond the Mud: Datasets and Benchmarks for Computer Vision in Off-Road Racing

Jacob Tyo, Motolani Olarinre, Youngseog Chung, Zachary C. Lipton

TL;DR

This paper introduces two real-world datasets, RND and MUDD, to benchmark text spotting and person re-identification in off-road motorcycle racing under mud, occlusion, and motion blur. It demonstrates substantial domain gaps with off-the-shelf models and shows notable gains from domain-specific fine-tuning, yet reveals persistent challenges in extreme mud and varied poses. By providing detailed baselines, occlusion analyses, and qualitative failure modes, the work highlights key areas for methodological advances and offers a resource to drive robust computer vision in sports analytics and real-time photo search. The datasets and analyses aim to spur domain-targeted techniques that improve OCR and ReID under unconstrained, real-world conditions relevant to motorsports and beyond.

Abstract

Despite significant progress in optical character recognition (OCR) and computer vision systems, robustly recognizing text and identifying people in images taken in unconstrained in-the-wild environments remain an ongoing challenge. However, such obstacles must be overcome in practical applications of vision systems, such as identifying racers in photos taken during off-road racing events. To this end, we introduce two new challenging real-world datasets, the off-road motorcycle Racer Number Dataset (RND) and the Muddy Racer re-iDentification Dataset (MUDD), to highlight the shortcomings of current methods and drive advances in OCR and person re-identification (ReID) under extreme conditions. These two datasets feature over 6,300 images taken during off-road competitions which exhibit a variety of factors that undermine even modern vision systems, namely mud, complex poses, and motion blur. We establish benchmark performance on both datasets using state-of-the-art models. Off-the-shelf models transfer poorly, reaching only 15% end-to-end (E2E) F1 score on text spotting, and 33% rank-1 accuracy on ReID. Fine-tuning yields major improvements, bringing model performance to 53% F1 score for E2E text spotting and 79% rank-1 accuracy on ReID, but still falls short of good performance. Our analysis exposes open problems in real-world OCR and ReID that necessitate domain-targeted techniques. With these datasets and analysis of model limitations, we aim to foster innovations in handling real-world conditions like mud and complex poses to drive progress in robust computer vision. All data was sourced from PerformancePhoto.co, a website used by professional motorsports photographers, racers, and fans. The top-performing text spotting and ReID models are deployed on this platform to power real-time race photo search.
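
For readers unfamiliar with the two headline metrics, the sketch below shows how an F1 score and a rank-1 accuracy are commonly computed. It is an illustration under stated assumptions, not the authors' evaluation code: the function names, the count-based inputs, and the toy identities are all hypothetical, and the rule for deciding when a predicted number counts as correct (for example, box overlap plus an exact transcription match) is not specified in the abstract and is left outside the sketch.

```python
def f1_score(num_correct, num_predicted, num_ground_truth):
    """Harmonic mean of precision and recall, computed from raw counts.

    num_correct: predictions judged correct (matching rule is an assumption
                 and is decided by the caller, not by this function).
    """
    if num_predicted == 0 or num_ground_truth == 0:
        return 0.0
    precision = num_correct / num_predicted
    recall = num_correct / num_ground_truth
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def rank1_accuracy(query_ids, ranked_gallery_ids):
    """Fraction of queries whose top-ranked gallery image has the same identity.

    query_ids: one identity label per query image.
    ranked_gallery_ids: for each query, gallery identity labels sorted from
                        most to least similar (hypothetical toy input).
    """
    if not query_ids:
        return 0.0
    hits = sum(
        1
        for query_id, ranked in zip(query_ids, ranked_gallery_ids)
        if ranked and ranked[0] == query_id
    )
    return hits / len(query_ids)


if __name__ == "__main__":
    # Toy text-spotting counts: 6 correct out of 8 predictions, 12 ground-truth numbers.
    print(f1_score(6, 8, 12))
    # Toy ReID rankings for four queries.
    queries = ["a", "b", "c", "d"]
    rankings = [["a", "x"], ["y", "b"], ["c"], ["d", "a"]]
    print(rank1_accuracy(queries, rankings))
```

Under this definition, a rank-1 accuracy of 79% means that for 79% of query images the single most similar gallery image shows the same rider; a correct match further down the ranking does not count.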

Paper Structure (23 sections, 16 figures, 6 tables)

Figures (16)

  • Figure 1: Beyond the Mud Dataset Examples: (a) Racers can have multiple, non-matching, numbers, (b) glare renders some numbers impossible to read, (c) a crashing racer, (d) occlusions from vegetation, (e) an extreme example of mud, (f) a stereotypical amount of mud.
  • Figure 2: Analysis of model performance on mud-occluded numbers. (a) The model correctly recognizes the front number by ignoring mud. (b) The quad number is recognized but the muddy helmet number is missed. (c) The front number is read but a very muddy helmet number is missed. (d) The number is detected but misrecognized due to its odd position. (e) Two numbers are correctly read but the muddy side number is missed.
  • Figure 3: Example showcasing off-the-shelf (top image) vs fine-tuned (bottom image) model predictions in rainy conditions.
  • Figure 4: Example of successful re-id by the fine-tuned model under light mud occlusion. All top 10 ranked results correctly match the query rider despite the mud, blurring, lighting, pose, and complex backgrounds. Green boundaries signify correct matches and red incorrect.
  • Figure 5: Failure case with heavy mud occlusion on the query image. Only 1 out of the top 10 results is a correct match, despite over 20 images of the same rider appearing in the gallery set, most of which are clean. Green boundaries signify correct matches and red incorrect.
  • ...and 11 more figures