A Comprehensive Survey on Rare Event Prediction

Chathurangi Shyalika, Ruwan Wickramarachchi, Amit Sheth

TL;DR

This article reviews current approaches to rare event prediction along four dimensions (rare event data, data processing, algorithmic approaches, and evaluation approaches) and suggests potential research directions to help guide practitioners and researchers.

Abstract

Rare event prediction involves identifying and forecasting low-probability events using machine learning (ML) and data analysis. Because the data distributions are imbalanced, with common events vastly outnumbering rare ones, it requires specialized methods at each step of the ML pipeline, from data processing to algorithms to evaluation protocols. Predicting the occurrence of rare events is important for real-world applications, such as Industry 4.0, and is an active research area in statistics and ML. This paper comprehensively reviews the current approaches for rare event prediction along four dimensions: rare event data, data processing, algorithmic approaches, and evaluation approaches. Specifically, we consider 73 datasets from different modalities (numerical, image, text, and audio), four major categories of data processing, five major algorithmic groupings, and two broad evaluation approaches. This paper aims to identify gaps in the current literature and highlight the challenges of predicting rare events. It also suggests potential research directions, which can help guide practitioners and researchers.

Paper Structure (62 sections, 11 figures, 11 tables)

This paper contains 62 sections, 11 figures, 11 tables.

Figures (11)

  • Figure 1: Approaches to learning from rare event data
  • Figure 2: Levels of rarity
  • Figure 3: Relationship between rare event data, acquisition methods, rarity factors, characteristics, and challenges of rare event datasets
  • Figure 4: Data processing approaches in rare event research
  • Figure 5: Association between data cleaning approaches, data modalities, rarity groups, and downstream tasks (coloring of data cleaning approaches corresponds to the data modalities)
  • ...and 6 more figures