State Variation Mining: On Information Divergence with Message Importance in Big Data
Rui She, Shanyun Liu, Pingyi Fan
TL;DR
This work introduces the Message Importance Transfer Measure (MITM) to quantify information transfer in big data with a focus on rare events. MITM defines a transfer capacity $C = \sum_{\delta_0} p(\delta_0) \tilde{C}(\delta_0)$ with $\tilde{C}(\delta_0)= \max_{p(x)} \{ L(\tilde{Y}) - L(\tilde{Y}|X) \}$ under a Lipschitz constraint $|L(\tilde{Y})-L(\tilde{Y}|X)| \le \lambda \| p(\tilde{y})-p(\tilde{y}|x) \|_1$. The framework is extended to continuous distributions via $L(f(x))=\int f(x) e^{-f(x)} dx$ and $D_I(g||f)= \int [ g(x) e^{-g(x)} - f(x) e^{-f(x)} ] dx$, with a perturbation analysis showing $D_I(g_0||f_0) = O(\epsilon)$ for $g_0(x)= f_0(x) + \epsilon f_0^{\alpha}(x) u(x)$. Finally, the MITM is applied to Mobile Edge Computing using the $M/M/s/k$ queue to guide cache sizing, and simulations suggest MITM converges faster than KL divergence for state-variation assessment.
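The continuous-case quantities above can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes a standard normal baseline $f_0$, $\alpha = 1$, and $u(x) = x^2 - 1$ (all illustrative choices, picked so that $g_0$ stays a valid density for $0 \le \epsilon < 1$), and it estimates $D_I(g_0||f_0)$ with a plain Riemann sum. The helper names `gaussian_pdf`, `perturbed`, and `divergence` are hypothetical.

```python
import math

def gaussian_pdf(x):
    """Standard normal density, used here as the baseline f_0 (an assumption)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def perturbed(f0, eps, alpha=1.0, u=lambda x: x * x - 1.0):
    """Return g_0(x) = f_0(x) + eps * f_0(x)**alpha * u(x).

    alpha = 1 and u(x) = x^2 - 1 are illustrative choices: for the standard
    normal they keep g_0 normalized and nonnegative when 0 <= eps < 1.
    """
    return lambda x: f0(x) + eps * f0(x) ** alpha * u(x)

def divergence(g, f, grid, dx):
    """Riemann-sum estimate of D_I(g||f) = integral of [g e^{-g} - f e^{-f}] dx."""
    total = 0.0
    for x in grid:
        gx, fx = g(x), f(x)
        total += gx * math.exp(-gx) - fx * math.exp(-fx)
    return total * dx

# Uniform grid on [-10, 10]; the Gaussian tails are negligible beyond it.
DX = 0.005
GRID = [-10.0 + i * DX for i in range(4001)]

for eps in (1e-1, 1e-2, 1e-3):
    d = divergence(perturbed(gaussian_pdf, eps), gaussian_pdf, GRID, DX)
    print(f"eps={eps:g}  D_I={d:.6e}  D_I/eps={d / eps:.6f}")
```

If the $O(\epsilon)$ scaling holds for this choice of $f_0$ and $u$, the printed ratio $D_I/\epsilon$ should stay roughly constant as $\epsilon$ shrinks. Other perturbation shapes may make the first-order term vanish, in which case the divergence decays faster than linearly.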
Abstract
Information transfer, which reveals the state variation of variables, plays a vital role in big data analytics and processing. Measures of information transfer can reflect system change through the distributions of the variables, in a manner similar to KL divergence and Rényi divergence. Furthermore, in big data, small-probability events often dominate the importance of the total message. It is therefore worthwhile to design an information transfer measure based on message importance that emphasizes small-probability events. In this paper, we propose a message importance transfer measure (MITM) and investigate its characteristics and applications in three respects. First, we present the message importance transfer capacity based on MITM, which provides an upper bound for the information transfer process with disturbance. Then, we extend the MITM to the continuous case and discuss its robustness when it is used to measure information distance. Finally, we use the MITM to guide queue length selection in the caching operation of mobile edge computing.
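To make the queueing side of the caching application concrete, the sketch below computes the textbook stationary distribution of the $M/M/s/k$ queue named in the TL;DR and shows how the blocking probability changes with $k$. It does not reproduce the MITM-based selection rule itself. It assumes $k$ denotes the total system capacity (customers in service plus those waiting, so $k \ge s$); the function name `mmsk_stationary` and the rates used are hypothetical.

```python
import math

def mmsk_stationary(lam, mu, s, k):
    """Stationary probabilities p_0..p_k of an M/M/s/k queue.

    lam: arrival rate, mu: per-server service rate, s: number of servers,
    k: total system capacity (assumed convention, requires k >= s).
    """
    if k < s:
        raise ValueError("capacity k must be at least the number of servers s")
    a = lam / mu  # offered load
    weights = []
    for n in range(k + 1):
        if n <= s:
            # Fewer customers than servers: all customers are in service.
            weights.append(a ** n / math.factorial(n))
        else:
            # All s servers busy: departures occur at the constant rate s * mu.
            weights.append(a ** n / (math.factorial(s) * s ** (n - s)))
    total = sum(weights)
    return [w / total for w in weights]

# Blocking probability p_k for several capacities (illustrative rates).
for k in (3, 4, 6, 8):
    p = mmsk_stationary(lam=2.0, mu=1.0, s=3, k=k)
    print(f"k={k}  blocking probability={p[-1]:.6f}")
```

Consecutive distributions obtained for $k$ and $k+1$ are the kind of state variation that a divergence measure could compare when choosing a queue length.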
