Fight Scene Detection for Movie Highlight Generation System

Aryan Mathur

TL;DR

The paper tackles automatic Fight Scene Detection (FSD) to streamline Movie Highlight Generation by identifying violent sequences in cinematic footage. It proposes a BiLSTM-based model that processes 16-frame sequences with TimeDistributed MobileNet features, capturing temporal dependencies to predict per-frame fight labels via a softmax output; the underlying formulation includes $h_t = [h_t^f; h_t^b]$ and $y_t = \mathrm{softmax}(W_o h_t + b_o)$. On a violence/non-violence video dataset, the model achieves 93.5% accuracy, outperforming a 2D-CNN+Hough Forests baseline (92%) and a 3D-CNN baseline (65%). This enables efficient, scalable generation of movie highlights and could extend to security and content filtering, with potential for GUI-based deployment via Gradio.
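To make the output formulation concrete, the following is a minimal pure-Python sketch of the two expressions quoted above: concatenating the forward and backward hidden states, then applying a softmax over the projected result. The function names, the toy dimensions, and all numeric values are assumptions made for illustration; they are not taken from the paper, and a real model would compute the hidden states with trained BiLSTM layers.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def bilstm_output(h_forward, h_backward, W_o, b_o):
    """Compute y_t = softmax(W_o h_t + b_o) with h_t = [h_t^f; h_t^b]."""
    # List concatenation plays the role of [h_t^f; h_t^b].
    h_t = h_forward + h_backward
    # One logit per class: dot product of a weight row with h_t, plus bias.
    logits = [
        sum(w * h for w, h in zip(row, h_t)) + b
        for row, b in zip(W_o, b_o)
    ]
    return softmax(logits)

# Hypothetical toy values: 2-dimensional hidden state per direction, 2 classes.
h_f = [0.5, -0.2]
h_b = [0.1, 0.3]
W_o = [
    [1.0, 0.0, 0.0, 1.0],  # weights for the "non-fight" class (assumed order)
    [0.0, 1.0, 1.0, 0.0],  # weights for the "fight" class (assumed order)
]
b_o = [0.0, 0.0]

probs = bilstm_output(h_f, h_b, W_o, b_o)
print(probs)
```

The returned list is a probability distribution over the classes, so its entries sum to one; the predicted label is the index of the largest entry.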

Abstract

In this research project, we present a novel deep-learning Fight Scene Detection (FSD) model, built on Bidirectional Long Short-Term Memory (BiLSTM) networks, for use in Movie Highlight Generation Systems (MHGS). Movies often include fight scenes to keep the audience engaged. For trailer generation, or any other highlight-generation application, it is tedious to first identify all such scenes manually and then compile them into a highlight. The proposed FSD system exploits the temporal characteristics of movie scenes and can therefore identify fight scenes automatically, which helps in the effective production of captivating movie highlights. The proposed solution achieves 93.5% accuracy, which is higher than a 2D CNN with Hough Forests (92%) and significantly higher than a 3D CNN (65%).
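As an illustration of how detected fight scenes might be compiled into a highlight, the sketch below merges consecutive fight-labelled clips into time segments. It is a hypothetical post-processing step, not the paper's implementation: it assumes non-overlapping 16-frame clips with one binary label each, and the function name `merge_fight_clips` and the `fps` value are illustrative choices.

```python
def merge_fight_clips(clip_labels, clip_len=16, fps=25.0):
    """Merge runs of fight-labelled clips into (start_sec, end_sec) segments.

    clip_labels: one 0/1 label per non-overlapping clip (assumed layout).
    clip_len:    frames per clip (16, matching the sequence length in the text).
    fps:         frame rate of the video (hypothetical default).
    """
    segments = []
    start = None  # index of the first clip in the current fight run
    for i, is_fight in enumerate(clip_labels):
        if is_fight and start is None:
            start = i
        elif not is_fight and start is not None:
            segments.append((start * clip_len / fps, i * clip_len / fps))
            start = None
    # Close a run that extends to the end of the video.
    if start is not None:
        end = len(clip_labels)
        segments.append((start * clip_len / fps, end * clip_len / fps))
    return segments

# Toy labels: clips 1-2 and clip 5 are flagged as fights.
labels = [0, 1, 1, 0, 0, 1]
segments = merge_fight_clips(labels, clip_len=16, fps=16.0)
print(segments)
```

With 16 frames per clip at 16 frames per second, each clip spans one second, so the segments map directly onto clip indices. A real system would likely also pad or smooth the segments before cutting the video.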

Paper Structure (21 sections, 4 equations, 7 figures, 3 tables)

Introduction
Significance of the Project
Survey of Existing methods and technologies
Problems and Statements
Motivation
Major Objectives with Work Plan
Materials, Methods and Modules
System Architecture with Description
Training
System Specification Table
Description of Sensors or Other Modules Related to the Project
Block Diagram or Flowchart with Description
Mathematical Expressions Related to the Project Tasks
User Interface Related to Project Tasks
Results and Discussions
...and 6 more sections

Figures (7)

Figure 1: System Architecture
Figure 2: FlowChart
Figure 3: GUI- Dark Mode
Figure 4: GUI- Light Mode
Figure 5: Confusion Matrix
...and 2 more figures
