Cutup and Detect: Human Fall Detection on Cutup Untrimmed Videos Using a Large Foundational Video Understanding Model
Till Grutschus, Ola Karrar, Emir Esenov, Ekta Vats
TL;DR
This paper tackles human fall detection in untrimmed videos by fine-tuning a large video understanding foundation model (VideoMAEv2 ViT-B) instead of designing a bespoke architecture. It introduces a simple cutup-based temporal action localization pipeline that converts videos with timestamp annotations into labeled short clips for training, using a priority labeling scheme over the classes Fall, Lying, and ADL. Clips are additionally drawn with Gaussian sampling: seeds $t_i$ are sampled as $t_i \sim \mathcal{N}(t_{Fall}, \frac{1}{3}\min\{t_{Fall}, T - t_{Fall}\})$, where $t_{Fall}$ is the fall midpoint and $T$ is the video length, and each seed is combined with a clip length parameter $T_{clip}$. On HQFSD, the approach achieves a state-of-the-art video-level F1 score of $0.96$ under the given experimental settings, and the results are promising for real-time application. Code and pretrained models will be released on GitHub.
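The Gaussian sampling step can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that the second parameter of $\mathcal{N}(\cdot,\cdot)$ is a standard deviation, that each clip is centered on its seed $t_i$, and that clips are clamped to the video boundaries. The function name `sample_clip_starts` and its parameters are hypothetical.

```python
import random


def sample_clip_starts(t_fall, video_len, clip_len, n_clips, rng=None):
    """Draw clip start times (seconds) concentrated around the fall midpoint.

    Assumptions (not confirmed by the source): sigma is a standard
    deviation, clips are centered on the seed, and out-of-range clips
    are clamped so they stay inside [0, video_len].
    """
    if clip_len > video_len:
        raise ValueError("clip_len must not exceed video_len")
    rng = rng or random.Random()
    # sigma = (1/3) * min{t_Fall, T - t_Fall}: about 99.7% of the seeds
    # fall within the distance to the nearer end of the video.
    sigma = min(t_fall, video_len - t_fall) / 3.0
    starts = []
    for _ in range(n_clips):
        seed = rng.gauss(t_fall, sigma)          # t_i ~ N(t_Fall, sigma)
        start = seed - clip_len / 2.0            # center the clip on the seed
        start = min(max(start, 0.0), video_len - clip_len)  # clamp to video
        starts.append(start)
    return starts


# Example: 5 clips of 4 s from a 120 s video with the fall midpoint at 30 s.
example_starts = sample_clip_starts(30.0, 120.0, 4.0, 5, random.Random(0))
```

Choosing the spread as one third of the distance to the nearer video boundary keeps almost all seeds inside the video, so clamping is rarely needed.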
Abstract
This work explores the performance of a large video understanding foundation model on the downstream task of human fall detection in untrimmed video. A pretrained vision transformer is leveraged for multi-class action detection with the classes "Fall", "Lying", and "Other/Activities of daily living (ADL)". A method for temporal action localization that relies on a simple cutup of untrimmed videos is demonstrated. The methodology includes a preprocessing pipeline that converts datasets with timestamp action annotations into labeled datasets of short action clips, and introduces simple and effective clip-sampling strategies. The proposed method is empirically evaluated on the publicly available High-Quality Fall Simulation Dataset (HQFSD). The results are promising for real-time application: falls are detected at the video level with a state-of-the-art F1 score of 0.96 on HQFSD under the given experimental settings. The source code will be made available on GitHub.
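The cutup and labeling idea described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the released pipeline. It assumes fixed-length clips taken at a fixed stride, that a clip receives a class if it overlaps any annotated interval of that class, and that the priority order is Fall over Lying over ADL. The names `cutup`, `label_clip`, and `PRIORITY` are hypothetical.

```python
from typing import List, Tuple

Annotation = Tuple[str, float, float]  # (label, start_s, end_s)

# Assumed priority order, highest first; clips matching neither are ADL.
PRIORITY = ("Fall", "Lying")


def cutup(video_len: float, clip_len: float, stride: float) -> List[Tuple[float, float]]:
    """Cut a video of video_len seconds into (start, end) clips."""
    if video_len < clip_len:
        return []
    n_clips = int((video_len - clip_len) // stride) + 1
    # Multiply by the index instead of accumulating to avoid float drift.
    return [(i * stride, i * stride + clip_len) for i in range(n_clips)]


def label_clip(clip: Tuple[float, float], annotations: List[Annotation]) -> str:
    """Assign the highest-priority label whose interval overlaps the clip."""
    clip_start, clip_end = clip
    for label in PRIORITY:
        for ann_label, ann_start, ann_end in annotations:
            overlaps = ann_start < clip_end and ann_end > clip_start
            if ann_label == label and overlaps:
                return label
    return "ADL"


# Example: a 12 s video with a fall at 5.0-6.5 s followed by lying until 12 s.
annotations = [("Fall", 5.0, 6.5), ("Lying", 6.5, 12.0)]
clips = cutup(12.0, 4.0, 2.0)
labels = [label_clip(c, annotations) for c in clips]
```

With priority labeling, a clip that contains both the end of a fall and the start of lying is labeled "Fall", so the rarer and more safety-critical class is not overwritten by the longer-lasting one.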
