One-Shot Badminton Shuttle Detection for Mobile Robots

Florentin Dipner; William Talbot; Turcan Tuna; Andrei Cramariuc; Marco Hutter

TL;DR

A robust one-shot badminton shuttlecock detection framework for non-stationary robots, designed for the egocentric, dynamic viewpoints of mobile robots and intended as a foundational building block for downstream tasks such as tracking, trajectory estimation, and system (re-)initialization.

Abstract

This paper presents a robust one-shot badminton shuttlecock detection framework for non-stationary robots. To address the lack of egocentric shuttlecock detection datasets, we introduce a dataset of 20,510 semi-automatically annotated frames captured across 11 distinct backgrounds in diverse indoor and outdoor environments, with each frame categorized into one of three difficulty levels. For labeling, we present a novel semi-automatic annotation pipeline that enables efficient labeling from stationary camera footage. We propose a metric suited to our downstream use case and fine-tune a YOLOv8 network optimized for real-time shuttlecock detection, achieving an F1-score of 0.86 under our metric in test environments similar to those seen in training, and 0.70 in entirely unseen environments. Our analysis reveals that detection performance depends critically on shuttlecock size and background texture complexity. Qualitative experiments confirm the detector's applicability to robots with moving cameras. Unlike prior work with stationary camera setups, our detector is specifically designed for the egocentric, dynamic viewpoints of mobile robots, providing a foundational building block for downstream tasks, including tracking, trajectory estimation, and system (re-)initialization.
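For reference, the F1-scores quoted above combine precision $P$ and recall $R$ in the standard way, where $TP$, $FP$, and $FN$ count true positives, false positives, and false negatives. Which detections count as true positives is set by the paper's own metric, whose matching criterion is not restated on this page, so only the generic definition is given here:

$$
P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2PR}{P + R} = \frac{2\,TP}{2\,TP + FP + FN}
$$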

Paper Structure (13 sections, 6 figures, 2 tables)

INTRODUCTION
RELATED WORK
Small Ball Sport Multi-Frame Trackers
Small Ball Sport Tracking-by-Detection
METHODOLOGY
Dataset
Metric
Training Setup
EVALUATION
Quantitative Results with Stationary Camera
Error Analysis
Qualitative Results with Moving Camera
CONCLUSIONS

Figures (6)

Figure 1: Example detections of the fine-tuned model. Our method reliably detects shuttlecocks, even under challenging conditions.
Figure 2: Dataset Overview and Difficulty Distribution: Representative frames from 11 backgrounds across 5 locations. Each frame shows the per-background difficulty distribution (top-right inset) with cropped shuttlecock examples (lower-right inset), both color-coded by difficulty: green (easy), orange (medium), red (hard). Sample sizes range from 642 to 3,407 frames per background. Bottom-right: Overall distribution across all 20,510 frames.
Figure 3: Automated Shuttlecock Labeling Pipeline: Left: GMM-based background segmentation identifies foreground regions (red). Center: YOLOv8-seg detects and segments the opponent player (blue), whose region is subsequently excluded from shuttlecock candidates. Right: Final detection result after applying morphological operations, person removal, and spatial constraints (the lower region shown in green is the excluded zone), with the detected shuttlecock marked by a green bounding box.
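The candidate-filtering stage in Figure 3 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The GMM foreground mask is taken as given (in practice it could come from a background subtractor such as OpenCV's `createBackgroundSubtractorMOG2`). The opponent's YOLOv8-seg mask is simplified to a bounding box, and morphological cleanup is omitted. The helper names and thresholds (`label_shuttle`, `min_area`, `max_area`, `exclude_from_row`) are hypothetical. Frames that do not yield exactly one candidate return `None`; treating these as frames left for manual review is an assumed reading of "semi-automatic".

```python
from collections import deque
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), inclusive pixel coordinates


def connected_components(mask: List[List[int]]) -> List[List[Tuple[int, int]]]:
    """Group nonzero mask pixels into 4-connected components (lists of (x, y))."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(x, y)]), []
            while queue:  # breadth-first flood fill
                cx, cy = queue.popleft()
                pixels.append((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            components.append(pixels)
    return components


def inside(x: int, y: int, box: Box) -> bool:
    x0, y0, x1, y1 = box
    return x0 <= x <= x1 and y0 <= y <= y1


def label_shuttle(mask: List[List[int]], person_boxes: List[Box], exclude_from_row: int,
                  min_area: int = 1, max_area: int = 50) -> Optional[Box]:
    """Return the single shuttlecock box, or None if the frame is ambiguous."""
    candidates = []
    for pixels in connected_components(mask):
        # Person removal: drop blobs overlapping a (simplified) player box.
        if any(inside(x, y, box) for x, y in pixels for box in person_boxes):
            continue
        # Spatial constraint: drop blobs reaching into the excluded lower zone.
        if any(y >= exclude_from_row for _, y in pixels):
            continue
        # Size constraint (hypothetical thresholds): drop implausibly small/large blobs.
        if not min_area <= len(pixels) <= max_area:
            continue
        xs = [x for x, _ in pixels]
        ys = [y for _, y in pixels]
        candidates.append((min(xs), min(ys), max(xs), max(ys)))
    # Accept only unambiguous frames; anything else is left for manual review.
    return candidates[0] if len(candidates) == 1 else None


if __name__ == "__main__":
    mask = [[0] * 8 for _ in range(8)]
    mask[1][5] = mask[1][6] = mask[2][5] = 1  # small blob: the shuttlecock
    for row in range(2, 6):                   # larger blob: the opponent
        mask[row][1] = mask[row][2] = 1
    mask[7][3] = 1                            # noise near the bottom edge
    print(label_shuttle(mask, person_boxes=[(0, 2, 3, 6)], exclude_from_row=6))  # (5, 1, 6, 2)
```

Requiring exactly one surviving candidate favors label precision over labeling throughput in this sketch: an ambiguous frame costs annotator time rather than producing a wrong label.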
Figure 4: Cross-Validation Results Distribution: Performance metrics across individual backgrounds (left/green) and locations (right/blue). Each circle represents model performance when trained on all subsets except one and evaluated on the held-out subset. Background-based validation indicates how well the model generalizes to environments similar to those in the training set, while location-based validation indicates generalization to previously unseen environments. Diamonds indicate mean performance across all folds.
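The two cross-validation schemes in Figure 4 differ only in the grouping key used to build leave-one-out folds. The following is a minimal sketch, assuming each frame carries a background and a location identifier. The `Frame` type, the identifiers, and the helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Frame(NamedTuple):
    background: str  # hypothetical background identifier
    location: str    # hypothetical location identifier; one location hosts several backgrounds


Fold = Tuple[str, List[int], List[int]]  # (held-out group, train indices, test indices)


def leave_one_group_out(frames: List[Frame], key: str) -> List[Fold]:
    """Build one fold per distinct value of `key` ("background" or "location")."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for i, frame in enumerate(frames):
        groups[getattr(frame, key)].append(i)
    folds = []
    for held_out in sorted(groups):
        test_idx = groups[held_out]
        held = set(test_idx)
        train_idx = [i for i in range(len(frames)) if i not in held]
        folds.append((held_out, train_idx, test_idx))
    return folds


def mean_over_folds(scores: List[float]) -> float:
    """Average the per-fold scores (the diamonds in Figure 4)."""
    return sum(scores) / len(scores)


if __name__ == "__main__":
    frames = [Frame("court_a", "gym"), Frame("court_b", "gym"), Frame("park", "outdoor")]
    for held_out, train, test in leave_one_group_out(frames, "background"):
        print("background", held_out, train, test)
    for held_out, train, test in leave_one_group_out(frames, "location"):
        print("location", held_out, train, test)
```

With these example frames, holding out `court_a` keeps `court_b` from the same gym in training. Holding out `gym` leaves only `park` for training. This is why the location folds are the stricter test of generalization to unseen environments.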
Figure 5: Distribution of Bounding Box Sizes and Corresponding Model Performance: The histogram shows the distribution of correct (green, n=15,561) and incorrect (red, n=4,915) predictions across bounding box side lengths, measured in input pixels and defined as the geometric mean $\sqrt{w \cdot h}$. The blue and violet lines show model precision and recall, respectively (each computed only for bins with at least 50 samples), plotted on the secondary y-axis on the right.
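The per-bin curves in Figure 5 can be approximated with the sketch below. It assumes each evaluated sample is reduced to a box width, height, and outcome label (`"tp"`, `"fp"`, or `"fn"`). This encoding, the 8-pixel bin width, and the function names are hypothetical. The geometric-mean side length and the 50-sample cutoff come from the caption.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Sample = Tuple[float, float, str]  # (width, height, outcome), outcome in {"tp", "fp", "fn"}


def side_length(w: float, h: float) -> float:
    """Box side length as the geometric mean sqrt(w * h), in input pixels."""
    return math.sqrt(w * h)


def binned_precision_recall(samples: List[Sample], bin_width: float = 8.0, min_samples: int = 50
                            ) -> Dict[float, Tuple[Optional[float], Optional[float]]]:
    """Map each bin's lower edge to (precision, recall), skipping sparse bins."""
    counts: Dict[int, Dict[str, int]] = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for w, h, outcome in samples:
        counts[int(side_length(w, h) // bin_width)][outcome] += 1
    curves = {}
    for b, c in sorted(counts.items()):
        if sum(c.values()) < min_samples:
            continue  # too few samples for a stable estimate, as in Figure 5
        tp, fp, fn = c["tp"], c["fp"], c["fn"]
        precision = tp / (tp + fp) if tp + fp else None  # undefined with no predictions
        recall = tp / (tp + fn) if tp + fn else None      # undefined with no ground truth
        curves[b * bin_width] = (precision, recall)
    return curves


if __name__ == "__main__":
    demo = [(4, 4, "tp")] * 30 + [(4, 4, "fp")] * 20 + [(4, 4, "fn")] * 10 + [(10, 10, "tp")] * 10
    print(binned_precision_recall(demo))  # {0.0: (0.6, 0.75)}
```

In the demo, the 10-pixel boxes fall into a bin with only 10 samples. That bin is therefore dropped, mirroring how sparse size ranges are left out of the plotted curves.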
