Importance of Small Probability Events in Big Data: Information Measures, Applications, and Challenges

Rui She, Shanyun Liu, Shuo Wan, Ke Xiong, Pingyi Fan

TL;DR

The paper tackles the problem of extracting and leveraging information from rare, small-probability events in big IoT data, arguing that minority subsets can carry disproportionate value for anomaly detection, security, and safety. It introduces a family of Message Importance Measures (MIM, fixed-parameter MIM, NMIM) and a unifying form $\mathcal{L}(\mathbf{p})=\log\sum_i \mathcal{V}(p_i)$ to quantify the importance of rare events, supplemented by practical information-processing architectures for compression, transmission, and preprocessing. It then discusses applications to data analytics in IoTs, including efficient estimation of information measures, dimension reduction via information coupling, directed information for causal analysis, and probability-derivation-based rare-event detection, with formulas such as $L(\mathbf{p},\varpi)$, $L_j(\mathbf{p},\varpi_j)$, and $L_{non}(\mathbf{p})$ guiding the methodology. The paper further outlines future challenges across smart cities, autonomous driving, and IoT detection, highlighting data-storage constraints, latency, feature extraction, estimation efficiency, and decision-making strategies for rare-event mining, aiming to inform both theory and practice. Overall, the work lays a theoretical and architectural foundation for prioritizing and exploiting rare events in large-scale IoT data, with potential practical impact on security, transportation, and urban management.
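
As an illustration of the unifying form $\mathcal{L}(\mathbf{p})=\log\sum_i \mathcal{V}(p_i)$ mentioned above, the following is a minimal Python sketch. The summary does not state the kernel $\mathcal{V}$, so the choice $\mathcal{V}(p)=p\,e^{\varpi(1-p)}$ used here is an assumption (it is the form usually associated with the parametric MIM), and the function names `importance` and `mim_kernel` are hypothetical.

```python
import math
from typing import Callable, Sequence


def importance(p: Sequence[float], v: Callable[[float], float]) -> float:
    """Evaluate the general form L(p) = log(sum_i V(p_i)) for a kernel V."""
    if any(x < 0 for x in p) or not math.isclose(sum(p), 1.0, abs_tol=1e-9):
        raise ValueError("p must be a probability distribution")
    return math.log(sum(v(x) for x in p))


def mim_kernel(varpi: float) -> Callable[[float], float]:
    """Assumed kernel (not given in the summary): V(p) = p * exp(varpi * (1 - p))."""
    return lambda x: x * math.exp(varpi * (1.0 - x))


# Compare a balanced and a highly skewed Bernoulli distribution.
varpi = 2.0
for dist in ([0.5, 0.5], [0.01, 0.99]):
    print(dist, importance(dist, mim_kernel(varpi)))
```

Under this assumed kernel, a uniform distribution over $n$ outcomes gives $\varpi(1-1/n)$, and $\varpi=0$ gives zero for any distribution, which makes the sketch easy to sanity-check. Other kernels can be passed as `v` without changing `importance`.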

Abstract

In many smart-city applications (e.g., anomaly detection and security systems), rare events dominate the importance of the total information in big data collected by the Internet of Things (IoTs). It is therefore crucial to explore the valuable information associated with rare events that lie in minority subsets of voluminous data. A fundamental question is how to effectively measure, from the perspective of information theory, the importance of the information carried by small-probability events. This paper first surveys theories and models of importance measures and investigates the relationship between subjective or semantic importance and rare events in big data. It then discusses applications to message processing and data analysis from the viewpoint of information measures. Finally, based on rare-event detection, it introduces open challenges related to information measures in areas such as smart cities, autonomous driving, and anomaly detection in IoTs, which can be considered future research directions.

Paper Structure

This paper contains 18 sections, 15 equations, 10 figures, 5 tables.

Figures (10)

  • Figure 1: The interpretation of information importance measures focusing on rare events.
  • Figure 2: The comparison of different message importance measures with respect to the Bernoulli distribution ($p$, $1-p$).
  • Figure 3: Message processing architecture from the viewpoint of rare events.
  • Figure 4: Comparison for different information divergences (between the probability distributions $P$ and $Q$ where $P=(p,1-p)$ and $Q=(0.4,0.6)$) including the MI divergence (with the parameter $\varpi = 2,1,0.8$), KL divergence and squared Euclidean distance.
  • Figure 5: Architecture of data analytics based on message importance of rare events.
  • ...and 5 more figures