SkateboardAI: The Coolest Video Action Recognition for Skateboarding
Hanxiao Chen
TL;DR
The paper tackles automatic recognition of skateboarding tricks from real-world videos by introducing SkateboardAI, a dataset with 15 trick classes collected from diverse sources. It systematically compares uni-modal CNN-LSTM variants, attention-enhanced and Transformer-based pipelines, and a multi-modal I3D architecture for trick classification. The key finding is that a ResNet50-Attention-BiLSTM pipeline achieves about $84\%$ validation accuracy, while the Transformer and I3D approaches underperform and require longer training times. The work demonstrates the feasibility of an AI sports referee for skateboarding and sets the stage for dataset expansion and for future work on semi-supervised and unsupervised learning.
Abstract
Impressed by the skateboarding program at the 2021 Tokyo Olympic Games, we are the first to curate an original real-world, in-the-wild video dataset, "SkateboardAI", and we design and implement diverse uni-modal and multi-modal video action recognition approaches to recognize different tricks accurately. For the uni-modal methods, we separately apply (1) a CNN with an LSTM; (2) a CNN with a BiLSTM; (3) a CNN with a BiLSTM and attention mechanisms; and (4) a Transformer-based action recognition pipeline. For the multi-modal setting, we investigate the two-stream Inflated-3D (I3D) architecture on the "SkateboardAI" dataset and compare its performance with the uni-modal cases. In sum, our objective is to develop an accurate AI sports referee for skateboarding competitions.
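
To make the attention variant in (3) more concrete, the following is a minimal sketch of temporal attention pooling over per-frame feature vectors, written in plain Python. It is an illustration under stated assumptions, not the implementation used in this work: the abstract does not specify the form of the attention mechanism, so dot-product scoring against a single query vector is assumed here, and the names `softmax`, `attention_pool`, `frame_features`, and `query` are hypothetical. In a full pipeline the frame features would come from a CNN backbone and a BiLSTM, and the query would be learned.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a sequence of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    frame_features: Sequence[Sequence[float]],
    query: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Pool per-frame features into one clip-level vector.

    Assumption: each frame is scored by its dot product with `query`
    (a learned parameter in practice, a fixed vector here).
    Returns the pooled vector and the per-frame attention weights.
    """
    # One scalar relevance score per frame.
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frame_features]
    # Normalize scores so the weights are positive and sum to 1.
    weights = softmax(scores)
    # Weighted sum of frame features, one output value per feature dimension.
    dim = len(frame_features[0])
    pooled = [
        sum(w * frame[d] for w, frame in zip(weights, frame_features))
        for d in range(dim)
    ]
    return pooled, weights
```

With an all-zero query every frame receives the same weight, so the pooled vector reduces to the plain average of the frame features; a non-zero query shifts weight toward the frames that align with it, which is the behavior an attention layer is meant to add on top of the recurrent features.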
