Focusing on a Probability Element: Parameter Selection of Message Importance Measure in Big Data
Rui She, Shanyun Liu, Yunquan Dong, Pingyi Fan
TL;DR
The paper addresses emphasizing rare events in big data by introducing a parametric Message Importance Measure (MIM) that can be tuned to focus on a specific probability element. It defines $L(\textbf{p},\varpi)=\log\left(\sum_{i=1}^n p_i e^{\varpi(1-p_i)}\right)$ and proposes $\varpi_j=1/p_j$ to obtain $L_j(\textbf{p},\varpi_j)$, along with key properties such as a principal component effect, a chain rule ordering, and bounds. It extends the framework to incorporate prior probability, deriving solutions for $\varpi^*$ via $g(p,\varpi)=0$ and bounds $2/p_{\max} \le \varpi \le 2/p_{\min}$, enabling adaptive focusing under uncertainty. The authors additionally analyze availability in a binary minority-subset model, provide convergence analyses for empirical MIM, and present numerical results illustrating the method’s behavior for common distributions, highlighting its potential for minority-subset and anomaly detection in big data.
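As a rough illustration of the definition above, the following minimal sketch evaluates $L(\textbf{p},\varpi)$ for a discrete distribution and applies the selection $\varpi_j=1/p_j$. The function names `mim` and `focused_mim` are hypothetical and not taken from the paper, and the log-sum-exp rearrangement is only an assumed numerical convenience, not part of the authors' method.

```python
import math


def mim(p, varpi):
    """Evaluate L(p, varpi) = log(sum_i p_i * exp(varpi * (1 - p_i))).

    Hypothetical helper: the name and the validation are illustrative.
    """
    if any(x < 0 for x in p) or not math.isclose(sum(p), 1.0, abs_tol=1e-9):
        raise ValueError("p must be a probability distribution")
    # Work in the log domain (log-sum-exp) so that large varpi values,
    # such as 1/p_j for a very rare event, do not overflow exp().
    terms = [math.log(x) + varpi * (1.0 - x) for x in p if x > 0]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


def focused_mim(p, j):
    """Evaluate L_j(p, varpi_j) with the selection varpi_j = 1 / p_j."""
    if p[j] <= 0:
        raise ValueError("p[j] must be positive to define varpi_j = 1 / p_j")
    return mim(p, 1.0 / p[j])
```

For the uniform binary distribution `[0.5, 0.5]`, the selection gives $\varpi_j = 2$ and `focused_mim([0.5, 0.5], 0)` evaluates to $\log(e^{1}) = 1$, which matches the closed form $\varpi(1-1/n)$ for a uniform distribution over $n$ outcomes.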
Abstract
Message importance measure (MIM) can characterize the importance of information in big-data scenarios, much as entropy does in information theory. The variable parameter of MIM affects how a distribution is characterized. Furthermore, by choosing an appropriate parameter, it is possible to emphasize the message importance of a particular probability element in a distribution. Parametric MIM can therefore play a vital role in anomaly detection in big data by focusing on the probability of an anomalous event. In this paper, we propose a parameter selection method for MIM that focuses on a probability element, and we present its major properties. In addition, we discuss parameter selection with prior probability and investigate the availability of the method in a statistical processing model of big data for the anomaly detection problem.
