Table of Contents
Fetching ...

VARS: Vision-based Assessment of Risk in Security Systems

Pranav Gupta, Pratham Gohil, Sridhar S

TL;DR

A comparative analysis of various machine learning and deep learning models to predict danger ratings in a custom dataset of 100 videos, each containing 50 frames, annotated with human-rated danger scores ranging from 0 to 10, to identify the most robust approach.

Abstract

The accurate prediction of danger levels in video content is critical for enhancing safety and security systems, particularly in environments where quick and reliable assessments are essential. In this study, we perform a comparative analysis of various machine learning and deep learning models to predict danger ratings in a custom dataset of 100 videos, each containing 50 frames, annotated with human-rated danger scores ranging from 0 to 10. The danger ratings are further classified into three categories: no alert (less than 7)and high alert (greater than equal to 7). Our evaluation covers classical machine learning models, such as Support Vector Machines, as well as Neural Networks, and transformer-based models. Model performance is assessed using standard metrics such as accuracy, F1-score, and mean absolute error (MAE), and the results are compared to identify the most robust approach. This research contributes to developing a more accurate and generalizable danger assessment framework for video-based risk detection.

VARS: Vision-based Assessment of Risk in Security Systems

TL;DR

A comparative analysis of various machine learning and deep learning models to predict danger ratings in a custom dataset of 100 videos, each containing 50 frames, annotated with human-rated danger scores ranging from 0 to 10, to identify the most robust approach.

Abstract

The accurate prediction of danger levels in video content is critical for enhancing safety and security systems, particularly in environments where quick and reliable assessments are essential. In this study, we perform a comparative analysis of various machine learning and deep learning models to predict danger ratings in a custom dataset of 100 videos, each containing 50 frames, annotated with human-rated danger scores ranging from 0 to 10. The danger ratings are further classified into three categories: no alert (less than 7)and high alert (greater than equal to 7). Our evaluation covers classical machine learning models, such as Support Vector Machines, as well as Neural Networks, and transformer-based models. Model performance is assessed using standard metrics such as accuracy, F1-score, and mean absolute error (MAE), and the results are compared to identify the most robust approach. This research contributes to developing a more accurate and generalizable danger assessment framework for video-based risk detection.

Paper Structure

This paper contains 14 sections, 4 figures.

Figures (4)

  • Figure 1: Concatenating CLIP and GPT Embeddings to classify a video if its high alert or not
  • Figure 2: Perfoming classification using video summaries' embeddings using SVMs and 10-fold cross-validation strategy
  • Figure 3: Concatenating CLIP and GPT Embeddings to classify a video if its high alert or not
  • Figure 4: Confusion matrix depicting the performance of this framework.