Amplifying Inter-message Distance: On Information Divergence Measures in Big Data

Rui She, Shanyun Liu, Pingyi Fan

TL;DR

The paper introduces Message Identification Divergence (M-I divergence), a parametric information-distance measure $D_{\varpi}(P\parallel Q)$ that amplifies the distance between similar distributions while preserving gaps between distinct ones. It establishes fundamental properties (non-negativity, monotonicity in $\varpi$, convexity, and inequality relations with K-L and Rényi divergence) and develops practical estimation methods: a multidimensional discrete kernel estimator with a weight window, and a weighted ensemble estimator whose mean squared error converges at the rate $O(\varGamma^{-1})$. The methodology is applied to big-data contexts, notably outlier detection, where M-I divergence, estimated via the ensemble method, outperforms classical divergences such as K-L and Rényi in distinguishing adjacent distributions. The work offers a scalable framework for distribution comparison in large-scale data analysis and highlights directions for parameter selection and broader applications.

Abstract

Message identification (M-I) divergence is an important measure of the information distance between probability distributions, similar to Kullback-Leibler (K-L) and Rényi divergence. The variable parameter of M-I divergence affects how the distinction between two distributions is characterized. By choosing an appropriate parameter, it is possible to amplify the information distance between adjacent distributions while maintaining a sufficient gap between two nonadjacent ones. Therefore, M-I divergence can play a vital role in distinguishing distributions more clearly. In this paper, we first define a parametric M-I divergence from an information-theoretic viewpoint and then present its major properties. In addition, we design an M-I divergence estimation algorithm based on an ensemble estimator of the proposed weight kernel estimators, which improves the convergence rate of the mean squared error from ${O(\varGamma^{-j/d})}$ to ${O(\varGamma^{-1})}$ $({j\in (0,d]})$. We also discuss decision-making with M-I divergence for clustering or classification, and investigate its performance in a statistical sequence model of big data for the outlier detection problem.
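To make the "adjacent versus nonadjacent" distinction concrete, the following is a minimal sketch of the two classical baselines named in the abstract, K-L and Rényi divergence, for discrete distributions. It does not implement M-I divergence, whose definition is not reproduced on this page. The example distributions `p`, `q_adjacent`, and `q_distant` are hypothetical, and the code assumes both distributions share a support on which `q` is strictly positive.

```python
import math


def kl_divergence(p, q):
    """K-L divergence D(P||Q) in nats; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def renyi_divergence(p, q, alpha):
    """Renyi divergence of order alpha in nats, for alpha > 0 and alpha != 1."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    total = sum(pi ** alpha * qi ** (1 - alpha) for pi, qi in zip(p, q) if pi > 0)
    return math.log(total) / (alpha - 1)


# Hypothetical distributions, chosen only for illustration.
p = [0.5, 0.3, 0.2]
q_adjacent = [0.48, 0.31, 0.21]  # close to p
q_distant = [0.1, 0.2, 0.7]      # far from p

for name, q in [("adjacent", q_adjacent), ("distant", q_distant)]:
    print(name, kl_divergence(p, q), renyi_divergence(p, q, alpha=2.0))
```

For the adjacent pair both baselines return values close to zero, which is the regime where the abstract argues that a tunable parameter can help separate distributions more clearly.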

Paper Structure

This paper contains 23 sections, 14 theorems, 96 equations, 2 figures, 2 algorithms.

Key Result

Proposition 1

The M-I divergence $D_{\varpi}(P\parallel Q)$ with $\varpi > 0$ is non-negative for any probability distributions $P$ and $Q$, namely $D_{\varpi}(P\parallel Q) \ge 0$.

Figures (2)

  • Figure 1: Performance of different divergences in the example with sequence size $\varGamma_0=6000$, number of sequences $T_0=200$, number of outlier sequences $k_0=20$, and number of experiments $N_{T_0}=100$.
  • Figure 2: Means and variances of AUC for different sample sizes in the example with number of sequences $T_0=200$, number of outlier sequences $k_0=20$, and number of experiments $N_{T_0}=100$.
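Figure 2 reports AUC for the outlier detection task. As a hedged illustration of how such a score can be computed, the sketch below evaluates AUC with the rank-based (Mann-Whitney) formulation, treating a per-sequence divergence value as the outlier score. The function name `auc_from_scores` and the Gaussian scores are hypothetical placeholders; only the counts $T_0=200$ and $k_0=20$ are taken from the captions, and none of this reproduces the paper's experiment.

```python
import random


def auc_from_scores(scores, labels):
    """AUC as the probability that a random outlier (label 1) outscores a
    random typical sequence (label 0); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((sp > sn) + 0.5 * (sp == sn) for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


# Synthetic stand-in scores (assumption): outliers tend to score higher.
random.seed(0)
T0, k0 = 200, 20
labels = [1] * k0 + [0] * (T0 - k0)
scores = [random.gauss(1.0, 1.0) if y == 1 else random.gauss(0.0, 1.0)
          for y in labels]

auc = auc_from_scores(scores, labels)
print(f"AUC on synthetic scores: {auc:.3f}")
```

Repeating this over independent trials and taking the sample mean and variance of the resulting AUC values would give summary statistics of the kind plotted in Figure 2.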

Theorems & Definitions (32)

  • Definition 1
  • Proposition 1
  • Proof
  • Proposition 2
  • Proof
  • Remark 1
  • Proposition 3
  • Proof
  • Corollary 1
  • Proof
  • ...and 22 more