Annotation Techniques for Judo Combat Phase Classification from Tournament Footage

Anthony Miyaguchi, Jed Moutahir, Tanmay Sutar

TL;DR

This work tackles automated annotation and combat-phase classification in fixed-angle judo tournament footage under limited labeled data. It introduces a semi-supervised pipeline that transfers knowledge from a fine-tuned object detector to classify match presence, activity, and standing state, combining frame-level labeling, OCR timer cues, and embedding-based classifiers with and without temporal context. The approach is validated on a dataset of 19 thirty-second clips, reaching F1 scores of 0.66, 0.78, and 0.87 on a 20% held-out test split and demonstrating the feasibility of automated match segmentation and phase analysis in semi-supervised settings. The study lays groundwork for scalable, automated retrieval of highlights, statistics, and strategic insights from judo broadcasts, with multiple avenues for future multimodal and technique-level extensions.

Abstract

This paper presents a semi-supervised approach to extracting and analyzing combat phases in judo tournaments using live-streamed footage. The objective is to automate the annotation and summarization of live-streamed judo matches. We train models that extract relevant entities and classify combat phases from fixed-perspective judo recordings. We employ semi-supervised methods to address limited labeled data in the domain. We build a model of combat phases via transfer learning from a fine-tuned object detector to classify the presence, activity, and standing state of the match. We evaluate our approach on a dataset of 19 thirty-second judo clips; on a $20\%$ held-out test split, it achieves F1 scores of 0.66, 0.78, and 0.87 for presence, activity, and standing state, respectively. Our results show initial promise for automating more complex information retrieval tasks using rigorous methods with limited labeled data.

Paper Structure

This paper contains 19 sections, 5 figures, 8 tables.

Figures (5)

  • Figure 1: A state diagram of the active portions of judo combat. During these portions the timer is running; they are delineated by calls from the referee. Most active combat occurs during the standing portion of the match (tachiwaza), where players attempt to unbalance and throw each other. The match may continue on the ground (newaza) if a throw is not decisive. The state diagram omits matches that end by disqualification, for example for executing a banned or dangerous technique.
  • Figure 2: The bounding box of each player and referee is pre-annotated using rules derived from an object detector in a Label Studio project. Human annotators manually correct and validate the results.
  • Figure 3: Match timing information from the overlay is extracted using Tesseract OCR (Smith, 2007). Gaps from invalid readings are interpolated. The derivative of the timer is computed and plotted. When the derivative is 0, the timer is paused; when it is -1, it is running. In this match, the timer was paused nine times: once at the beginning, seven times by the referee, and once at the end of the match.
  • Figure 4: The distribution of full-scene match classes in the inference on the training set. Mats 2 and 8 do not have a video overlay.
  • Figure 5: The results of applying fine-tuned YOLOv8 to two different mats sampled at the same ten-minute interval at 1 fps. Note that Mats 2 and 10 are raw camera footage due to technical issues with the stream.
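
The timer analysis described in the Figure 3 caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the OCR output has already been parsed into seconds remaining, sampled once per second, with `None` marking an invalid reading. The function names (`interpolate_gaps`, `timer_derivative`, `count_pauses`) and the tolerance `tol` are hypothetical.

```python
def interpolate_gaps(readings):
    """Fill invalid readings (None) by linear interpolation between the
    nearest valid neighbours; gaps at the edges copy the nearest valid value."""
    known = [(i, float(r)) for i, r in enumerate(readings) if r is not None]
    if not known:
        raise ValueError("no valid timer readings")
    filled = []
    for i, r in enumerate(readings):
        if r is not None:
            filled.append(float(r))
            continue
        before = [(j, v) for j, v in known if j < i]
        after = [(j, v) for j, v in known if j > i]
        if before and after:
            (j0, v0), (j1, v1) = before[-1], after[0]
            filled.append(v0 + (v1 - v0) * (i - j0) / (j1 - j0))
        else:
            filled.append(before[-1][1] if before else after[0][1])
    return filled


def timer_derivative(seconds_remaining):
    """First difference of the timer: about 0 when paused, about -1 when
    running (assuming one sample per second)."""
    return [b - a for a, b in zip(seconds_remaining, seconds_remaining[1:])]


def count_pauses(derivative, tol=0.5):
    """Count contiguous runs where the derivative is approximately zero."""
    pauses = 0
    previously_paused = False
    for d in derivative:
        paused = abs(d) < tol
        if paused and not previously_paused:
            pauses += 1
        previously_paused = paused
    return pauses


# Hypothetical readings: one invalid OCR frame and three paused stretches.
readings = [240, 240, 239, None, 237, 237, 237, 236, 235, 235]
filled = interpolate_gaps(readings)
derivative = timer_derivative(filled)
print(count_pauses(derivative))
```

Counting runs of zero derivative, rather than individual zero samples, is what lets a pause of any length register as a single event, which matches how the caption tallies pauses per match.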