Importance of Small Probability Events in Big Data: Information Measures, Applications, and Challenges
Rui She, Shanyun Liu, Shuo Wan, Ke Xiong, Pingyi Fan
TL;DR
The paper addresses how to extract and exploit the information carried by rare, small-probability events in big IoT data, arguing that minority subsets can carry disproportionate value for anomaly detection, security, and safety. It reviews a family of message importance measures (MIM, fixed-parameter MIM, and NMIM) together with a unifying form $\mathcal{L}(\mathbf{p})=\log\sum_i \mathcal{V}(p_i)$, in which the choice of $\mathcal{V}$ determines how much weight each probability $p_i$ receives. It also describes information-processing architectures for data compression, transmission, and preprocessing. It then discusses data-analytics applications in IoTs, including efficient estimation of information measures, dimension reduction via information coupling, directed information for causal analysis, and probability-derivation-based rare-event detection, using the measures $L(\mathbf{p},\varpi)$, $L_j(\mathbf{p},\varpi_j)$, and $L_{non}(\mathbf{p})$ as its main analytical tools. Finally, it outlines open challenges in smart cities, autonomous driving, and anomaly detection in IoTs, such as data-storage constraints, latency, feature extraction, estimation efficiency, and decision-making strategies for rare-event mining. Overall, the work lays a theoretical and architectural foundation for prioritizing and exploiting rare events in large-scale IoT data, with potential practical impact on security, transportation, and urban management.
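To make the unifying form $\mathcal{L}(\mathbf{p})=\log\sum_i \mathcal{V}(p_i)$ concrete, the sketch below evaluates it for a pluggable $\mathcal{V}$. As an assumption, it instantiates $\mathcal{V}$ with the exponential kernel $\mathcal{V}(p_i)=p_i\,e^{\varpi(1-p_i)}$, $\varpi \ge 0$, as the MIM instance $L(\mathbf{p},\varpi)$; the summary names $L(\mathbf{p},\varpi)$ but does not spell out its form. The function names (`unified_measure`, `mim_kernel`, `term_shares`, `empirical_distribution`) are hypothetical, and the plug-in frequency estimate is a generic stand-in, not the paper's estimation method.

```python
import math
from collections import Counter
from typing import Callable, Dict, Hashable, Iterable, List


def unified_measure(p: List[float], v: Callable[[float], float]) -> float:
    """Evaluate L(p) = log sum_i V(p_i) for a chosen importance function V."""
    return math.log(sum(v(q) for q in p))


def mim_kernel(varpi: float) -> Callable[[float], float]:
    """ASSUMED MIM kernel V(p) = p * exp(varpi * (1 - p)), with varpi >= 0."""
    return lambda q: q * math.exp(varpi * (1.0 - q))


def term_shares(p: List[float], v: Callable[[float], float]) -> List[float]:
    """Fraction of sum_i V(p_i) contributed by each element of p."""
    terms = [v(q) for q in p]
    total = sum(terms)
    return [t / total for t in terms]


def empirical_distribution(samples: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Plug-in estimate of p: relative frequency of each observed symbol."""
    counts = Counter(samples)
    n = sum(counts.values())
    return {symbol: c / n for symbol, c in counts.items()}


if __name__ == "__main__":
    p = [0.90, 0.09, 0.01]  # one dominant symbol and two rare ones
    for varpi in (0.0, 1.0, 10.0, 100.0):
        kernel = mim_kernel(varpi)
        shares = term_shares(p, kernel)
        print(varpi, round(unified_measure(p, kernel), 3),
              [round(s, 3) for s in shares])
```

With $\varpi = 0$ the measure is $\log\sum_i p_i = 0$ and each element's share equals its probability. As $\varpi$ grows, the weight moves toward rarer elements: at $\varpi = 10$ the symbol with $p = 0.09$ dominates the sum, and at $\varpi = 100$ the symbol with $p = 0.01$ contributes almost all of it. This follows from the assumed kernel, since $p\,e^{\varpi(1-p)}$ is maximized at $p = 1/\varpi$; it is an observation about this instantiation, not a claim about how the paper chooses $\varpi_j$.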
Abstract
In many smart-city applications (e.g., anomaly detection and security systems), rare events account for most of the important information in the big data collected by the Internet of Things (IoTs). It is therefore crucial to extract the valuable information associated with rare events, which are contained in minority subsets of these voluminous data. A fundamental question is how to measure, from an information-theoretic perspective, the importance of the information carried by small-probability events. This paper first surveys theories and models of importance measures and investigates the relationship between subjective or semantic importance and rare events in big data. It then discusses applications in message processing and data analysis from the viewpoint of information measures. Finally, it introduces open challenges related to information measures for rare-event detection, in areas such as smart cities, autonomous driving, and anomaly detection in IoTs, as directions for future research.
