A Simple and Effective Temporal Grounding Pipeline for Basketball Broadcast Footage

Levi Harris

TL;DR

This work aims to speed up the construction of large, multi-modal video datasets for training data-hungry models in the sports action recognition domain. It aligns a pre-labeled corpus of play-by-play records, which contain dense event annotations, to video frames, enabling quick retrieval of labeled video segments.

Abstract

We present a reliable temporal grounding pipeline for video-to-analytic alignment of basketball broadcast footage. Given a series of frames as input, our method quickly and accurately extracts the time-remaining and quarter values from basketball broadcast scenes. This work aims to expedite the development of large, multi-modal video datasets for training data-hungry video models in the sports action recognition domain. Our method aligns a pre-labeled corpus of play-by-play records containing dense event annotations to video frames, enabling quick retrieval of labeled video segments. Unlike previous methods, we do not first localize the game clock; instead, we fine-tune an out-of-the-box object detector to find semantic text regions directly. This end-to-end approach improves the generality of our work. Additionally, interpolation and parallelization techniques prepare our pipeline for deployment on a large computing cluster. All code is publicly available.
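To make the alignment step concrete, the following is a minimal sketch and not the released implementation. It assumes that each frame already has a quarter value and a time-remaining reading in seconds (with `None` where no reading was obtained), that gaps are filled by linear interpolation between the nearest known readings, and that each play-by-play event is matched to the frame in the same quarter whose clock value is closest. The function names `fill_missing` and `align`, the tuple layout of `events`, and the sample values are all hypothetical.

```python
from typing import Optional


def fill_missing(clock: list[Optional[float]]) -> list[Optional[float]]:
    """Linearly interpolate missing readings between known neighbours.

    Assumption: readings are seconds remaining in the quarter, one per frame.
    Gaps before the first or after the last known reading are left as None.
    """
    known = [i for i, value in enumerate(clock) if value is not None]
    filled = list(clock)
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            weight = (i - a) / (b - a)
            filled[i] = clock[a] + weight * (clock[b] - clock[a])
    return filled


def align(events, quarters, clock):
    """Map each event label to the index of the closest frame.

    events: iterable of (label, quarter, seconds_remaining) tuples (hypothetical layout).
    quarters, clock: per-frame quarter values and time-remaining values.
    """
    aligned = {}
    for label, quarter, seconds in events:
        candidates = [
            i
            for i, (q, t) in enumerate(zip(quarters, clock))
            if q == quarter and t is not None
        ]
        if candidates:
            aligned[label] = min(candidates, key=lambda i: abs(clock[i] - seconds))
    return aligned


# Toy example: readings exist only for every other frame.
quarters = [1, 1, 1, 1, 1]
clock = fill_missing([720.0, None, 718.0, None, 716.0])
events = [("made 3pt jump shot", 1, 717.0)]
result = align(events, quarters, clock)
```

In this toy example the interpolated clock gives frame 3 a value of 717.0 seconds, so the event is matched to that frame. A real broadcast would also need handling for clock stoppages and replays, which this sketch does not attempt.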
