Fight Scene Detection for Movie Highlight Generation System

Aryan Mathur

TL;DR

The paper tackles automatic Fight Scene Detection (FSD) to streamline Movie Highlight Generation by identifying violent sequences in cinematic footage. It proposes a BiLSTM-based model that processes 16-frame sequences with TimeDistributed MobileNet features, capturing temporal dependencies to predict per-frame fight labels via a softmax output. The underlying formulation concatenates the forward and backward hidden states, $h_t = [h_t^f; h_t^b]$, and computes the output as $y_t = \mathrm{softmax}(W_o h_t + b_o)$. On a violence/non-violence video dataset, the model achieves 93.5% accuracy, outperforming a 2D-CNN+Hough Forests baseline (92%) and a 3D-CNN baseline (65%). This enables efficient, scalable generation of movie highlights and could extend to security and content filtering, with potential for GUI-based deployment via Gradio.
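To make the output formulation in the TL;DR concrete, the following is a minimal pure-Python sketch of the last step only: concatenating a forward and a backward hidden state per frame and applying a linear layer followed by softmax. It is not the paper's implementation. The function names, the hidden size of 4, and the randomly generated hidden states and weights are all assumptions chosen for illustration; in the actual model the hidden states would come from the BiLSTM running over MobileNet features.

```python
import math
import random


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def frame_probabilities(h_forward, h_backward, W_o, b_o):
    """Compute y_t = softmax(W_o h_t + b_o) with h_t = [h_t^f; h_t^b] per frame."""
    probs = []
    for h_f, h_b in zip(h_forward, h_backward):
        h_t = h_f + h_b  # list concatenation: [h_t^f; h_t^b]
        logits = [
            sum(w * x for w, x in zip(row, h_t)) + b
            for row, b in zip(W_o, b_o)
        ]
        probs.append(softmax(logits))
    return probs


# Hypothetical sizes: 16 frames (from the source), hidden size 4 and
# 2 classes (fight / no fight) are assumptions for this sketch.
random.seed(0)
T, H, C = 16, 4, 2

# Placeholder hidden states standing in for real BiLSTM outputs.
h_forward = [[random.uniform(-1, 1) for _ in range(H)] for _ in range(T)]
h_backward = [[random.uniform(-1, 1) for _ in range(H)] for _ in range(T)]

# Placeholder output-layer parameters W_o (C x 2H) and b_o (C).
W_o = [[random.uniform(-1, 1) for _ in range(2 * H)] for _ in range(C)]
b_o = [0.0] * C

probs = frame_probabilities(h_forward, h_backward, W_o, b_o)
```

Each entry of `probs` is a two-element probability distribution for one frame of the 16-frame sequence. Concatenation is what lets the output layer see context from both earlier and later frames at every time step.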

Abstract

In this research project, we present a novel Fight Scene Detection (FSD) model based on Bidirectional Long Short-Term Memory (BiLSTM) networks, intended for use in deep-learning-based Movie Highlight Generation Systems (MHGS). Movies commonly include fight scenes to keep the audience engaged. For trailer generation, or any other highlight-generation application, it is tedious to first identify all such scenes manually and then compile them into a highlight. The proposed FSD system exploits the temporal characteristics of movie scenes and can therefore identify fight scenes automatically, which helps in producing captivating movie highlights efficiently. The proposed solution achieves 93.5% accuracy, which is higher than a 2D CNN with Hough Forests (92%) and significantly higher than a 3D CNN (65%).

Paper Structure (21 sections, 4 equations, 7 figures, 3 tables)