Focusing on a Probability Element: Parameter Selection of Message Importance Measure in Big Data

Rui She; Shanyun Liu; Yunquan Dong; Pingyi Fan

TL;DR

The paper addresses how to emphasize rare events in big data with a parametric Message Importance Measure (MIM) whose parameter can be tuned to focus on a specific probability element. It defines $L(\textbf{p},\varpi)=\log\left(\sum_{i=1}^n p_i e^{\varpi(1-p_i)}\right)$ and proposes the selection $\varpi_j=1/p_j$, which yields $L_j(\textbf{p},\varpi_j)$, together with key properties: a principal component effect, a chain rule ordering, and bounds. It then extends the framework to incorporate prior probability, deriving solutions $\varpi^*$ of $g(p,\varpi)=0$ and the bounds $2/p_{\max} \le \varpi \le 2/p_{\min}$, which enables adaptive focusing under uncertainty. The authors also analyze availability in a binary minority-subset model, provide convergence analyses for the empirical MIM, and present numerical results for common distributions that illustrate the method's behavior and its potential for minority-subset and anomaly detection in big data.
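The definition above is compact enough to evaluate directly. The following is a minimal sketch, not the authors' implementation: the function names `mim`, `focus_parameter`, and `parameter_interval` are hypothetical, the natural logarithm is assumed for $\log$, and the log-sum-exp rearrangement is an implementation choice made here to avoid overflow when $\varpi$ is large (as it is for a rare element, since $\varpi_j=1/p_j$).

```python
import math


def mim(p, varpi):
    """Evaluate L(p, varpi) = log(sum_i p_i * exp(varpi * (1 - p_i))).

    Assumes the natural logarithm; zero-probability elements contribute
    nothing to the sum and are skipped.
    """
    if any(x < 0 for x in p) or abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("p must be a probability distribution")
    # Work in the log domain: log(p_i) + varpi * (1 - p_i) for each element.
    terms = [math.log(x) + varpi * (1.0 - x) for x in p if x > 0]
    # Log-sum-exp: factor out the largest term so exp() cannot overflow.
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


def focus_parameter(p, j):
    """Parameter choice varpi_j = 1 / p_j for focusing on element j."""
    return 1.0 / p[j]


def parameter_interval(p):
    """Endpoints (2 / p_max, 2 / p_min) of the interval quoted for varpi.

    Only the endpoints are computed; the conditions under which the
    interval applies are those of the prior-probability analysis.
    """
    positive = [x for x in p if x > 0]
    return 2.0 / max(positive), 2.0 / min(positive)


if __name__ == "__main__":
    p = [0.9, 0.1]                   # illustrative distribution, rare element at index 1
    varpi_rare = focus_parameter(p, 1)
    print(varpi_rare, mim(p, varpi_rare), parameter_interval(p))
```

For a uniform distribution over $n$ elements the sum collapses and $L(\textbf{p},\varpi)=\varpi(1-1/n)$, which gives a quick sanity check for any implementation.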

Abstract

Message importance measure (MIM) characterizes the importance of information in big data scenarios, much as entropy does in information theory. An MIM with a variable parameter changes how a distribution is characterized, and by choosing the parameter appropriately it is possible to emphasize the message importance of a particular probability element in the distribution. Parametric MIM can therefore play a vital role in anomaly detection for big data by focusing on the probability of an anomalous event. In this paper, we propose a parameter selection method for MIM that focuses on a probability element and present its major properties. In addition, we discuss parameter selection with prior probability and investigate the availability of the method in a statistical processing model of big data for the anomaly detection problem.
