A Computer Vision Based Approach for Stalking Detection Using a CNN-LSTM-MLP Hybrid Fusion Model

Murad Hasan; Shahriar Iqbal; Md. Billal Hossain Faisal; Md. Musnad Hossin Neloy; Md. Tonmoy Kabir; Md. Tanzim Reza; Md. Golam Rabiul Alam; Md Zia Uddin

TL;DR

The paper addresses the detection of physical stalking in public spaces through video analysis. It introduces a CNN-LSTM-MLP hybrid fusion model that combines ConvLSTM-based spatiotemporal features with the output of an MLP that processes numerical facial features (facial landmarks, head pose, and relative distance) to classify a scene as stalking or non-stalking from a small number of frames. A new dataset for single-video stalking detection is introduced, comprising 238 videos (117 stalking, 121 non-stalking) sourced from feature films and TV, with clips trimmed to 3–8 seconds and annotated by five human raters. The proposed fusion approach achieves 89.58% testing accuracy, outperforming CNN- and ConvLSTM-based baselines, which indicates that adding facial-feature cues improves video-based stalking detection for potential surveillance applications.

Abstract

Detecting criminal and suspicious activity has become a popular research topic in recent years, and the rapid growth of computer vision technologies has contributed substantially to it. However, physical stalking detection remains comparatively unexplored. Stalking in public places has become a common occurrence, with women being the most affected. It is a visible behavior that usually precedes other criminal activity: the stalker follows, loiters near, and stares at the victim before committing an offense such as assault, kidnapping, or rape. Detecting stalking early therefore offers a chance to prevent these crimes. In this research, we propose a novel deep learning-based hybrid fusion model to detect potential stalkers from a single video with a minimal number of frames. We extract multiple relevant features, such as facial landmarks, head pose estimation, and relative distance, as numerical values from video frames. These values are fed into a multilayer perceptron (MLP) to classify a scenario as stalking or non-stalking. Simultaneously, the video frames are fed into a combination of convolutional and LSTM models to extract spatio-temporal features. We fuse the numerical and spatio-temporal features to build a classifier that detects stalking incidents. Additionally, we introduce a dataset of stalking and non-stalking videos gathered from various feature films and television series, which is used to train the model. The experimental results show the effectiveness of the proposed stalker detection system, which achieves 89.58% testing accuracy, a significant improvement over state-of-the-art approaches.
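The fusion step described in the abstract can be illustrated with a minimal sketch. It assumes feature-level fusion by concatenation followed by a single sigmoid output; the abstract does not specify the fusion mechanism, so this is one plausible reading rather than the authors' architecture. The dimensions, the random weights, and the names `fuse_and_classify`, `init_layer`, `video_embedding`, and `numeric_features` are all hypothetical, and the ConvLSTM branch is replaced by a stand-in vector.

```python
import math
import random


def dense(x, weights, bias):
    """Affine layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_in, n_out, rng):
    """Small random weights (assumption: a trained model would learn these)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def fuse_and_classify(video_embedding, numeric_features, params):
    """Hypothetical late fusion: MLP branch + video branch -> one probability."""
    # MLP branch over the numerical features (landmarks, head pose, distance).
    hidden = relu(dense(numeric_features, *params["mlp"]))
    # Fusion by concatenation (an assumption, not confirmed by the source).
    fused = list(video_embedding) + hidden
    # Single-unit classification head with a sigmoid output.
    logit = dense(fused, *params["head"])[0]
    return sigmoid(logit)


rng = random.Random(0)
VIDEO_DIM, NUMERIC_DIM, HIDDEN_DIM = 8, 5, 4  # illustrative sizes only

params = {
    "mlp": init_layer(NUMERIC_DIM, HIDDEN_DIM, rng),
    "head": init_layer(VIDEO_DIM + HIDDEN_DIM, 1, rng),
}

# Stand-ins: a real system would take these from a ConvLSTM and a face pipeline.
video_embedding = [rng.random() for _ in range(VIDEO_DIM)]
numeric_features = [rng.random() for _ in range(NUMERIC_DIM)]

p_stalking = fuse_and_classify(video_embedding, numeric_features, params)
print(f"P(stalking) = {p_stalking:.3f}")
```

With untrained weights the printed probability carries no meaning; the sketch only shows how the two feature streams could meet in a single classifier.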
