Sequential Classification of Misinformation

Daniel Toma, Wasim Huleihel

TL;DR

This paper proposes a probabilistic information flow model over graphs, together with two detection algorithms: the first is based on the well-known multiple sequential probability ratio test, and the second is a novel graph-neural-network-based sequential decision algorithm.

Abstract

In recent years there has been growing interest in online auditing of information flow over social networks, with the goal of monitoring undesirable effects such as misinformation and fake news. Most previous work on the subject focuses on the binary problem of classifying information as fake or genuine. Nonetheless, in many practical scenarios the multi-class/label setting is of particular importance. For example, a social media platform may want to distinguish between "true", "partly-true", and "false" information. Accordingly, in this paper we consider the problem of online multiclass classification of information flow. To that end, driven by empirical studies of information flow over real-world social media networks, we propose a probabilistic information flow model over graphs. The learning task is then to detect the label of the information flow, with the goal of minimizing a combination of the classification error and the detection time. For this problem we propose two detection algorithms: the first is based on the well-known multiple sequential probability ratio test, while the second is a novel graph-neural-network-based sequential decision algorithm. For both algorithms we prove several strong statistical guarantees. We also construct a data-driven algorithm for learning the proposed probabilistic model. Finally, we test our algorithms on two real-world datasets and show that they outperform other state-of-the-art misinformation detection algorithms in terms of detection time and classification error.
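To make the sequential decision idea concrete, the following is a minimal sketch of a posterior-threshold multiple sequential probability ratio test. It is not the paper's algorithm: it assumes i.i.d. categorical observations and a uniform prior rather than the paper's graph-based model, and the names `msprt`, `HYPOTHESES`, and `threshold`, as well as the probabilities, are hypothetical and chosen only for illustration.

```python
import math


def msprt(observations, dists, threshold=0.99):
    """Posterior-threshold MSPRT sketch (assumes i.i.d. symbols, uniform prior).

    observations: iterable of symbols.
    dists: list of dicts, dists[k][x] = P(x | hypothesis k), all entries > 0.
    Returns (decision, stopping_time).
    """
    log_lik = [0.0] * len(dists)  # uniform prior: all hypotheses start equal
    n = 0
    for n, x in enumerate(observations, start=1):
        # Accumulate the log-likelihood of the new symbol under each hypothesis.
        for k, p in enumerate(dists):
            log_lik[k] += math.log(p[x])
        # Normalise in the log domain to obtain posteriors without underflow.
        m = max(log_lik)
        weights = [math.exp(value - m) for value in log_lik]
        total = sum(weights)
        posterior = [w / total for w in weights]
        best = max(range(len(dists)), key=posterior.__getitem__)
        # Stop as soon as one hypothesis is sufficiently likely.
        if posterior[best] >= threshold:
            return best, n
    # Data exhausted before the threshold was reached: pick the most likely label.
    best = max(range(len(dists)), key=log_lik.__getitem__)
    return best, n


# Hypothetical symbol distributions for three labels (illustrative numbers only).
HYPOTHESES = [
    {"a": 0.7, "b": 0.2, "c": 0.1},  # label 0, e.g. "true"
    {"a": 0.2, "b": 0.6, "c": 0.2},  # label 1, e.g. "partly-true"
    {"a": 0.1, "b": 0.2, "c": 0.7},  # label 2, e.g. "false"
]

decision, stopping_time = msprt(["c"] * 50, HYPOTHESES)
```

Raising `threshold` lowers the chance of a wrong label at the cost of a later stopping time, which is the trade-off between classification error and detection time that the abstract describes.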

Paper Structure

This paper contains 38 sections, 12 theorems, 114 equations, 6 figures, 2 tables, and 4 algorithms.

Key Result

Theorem 1

Fix $k\in[M]$, and assume that Then, for any $t\in\mathbb{R}_+$, we have, for some $\mathsf{C}_1,\mathsf{C}_2\in\mathbb{R}_+$.

Figures (6)

  • Figure 1: An illustration of the input-output information flow detection problem.
  • Figure 2: Distribution of edge types across different hypotheses.
  • Figure 3: A partial social media graph with a single information source at $s = 1$. Each weighted path in the graph corresponds to a different Markov chain.
  • Figure 4: The msprtGNN architecture. Dense operates locally on node features. GINConv embeds each node with its ancestor. The add pooling layer aggregates the embeddings of all nodes to a graph level vector.
  • Figure 5: Accuracy as a function of time step.
  • ...and 1 more figure
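The three stages named in the Figure 4 caption (a node-local dense layer, a GIN-style step that combines each node with its ancestor, and add pooling) can be sketched in plain Python as follows. This is an illustration under assumptions, not the msprtGNN implementation: the layer sizes, the single-layer ReLU update, the weights, and the names `gin_tree_embedding`, `dense`, and `relu` are all hypothetical. The update uses the standard GIN form, in which a node's own embedding scaled by (1 + eps) is added to the sum of its incoming neighbours' embeddings before a learned transformation.

```python
def relu(vector):
    return [max(0.0, x) for x in vector]


def dense(vector, weights, bias):
    # weights is a list of rows, one row per output unit.
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def gin_tree_embedding(features, parent, w1, b1, w2, b2, eps=0.0):
    """Return one graph-level vector for a propagation tree.

    features: per-node feature vectors.
    parent: parent[i] is the index of node i's ancestor, or None for the source.
    """
    # 1) Dense layer applied to every node independently.
    hidden = [relu(dense(x, w1, b1)) for x in features]
    # 2) GIN-style update: (1 + eps) * own embedding + ancestor embedding.
    updated = []
    for i, h in enumerate(hidden):
        combined = [(1.0 + eps) * x for x in h]
        if parent[i] is not None:
            combined = [c + a for c, a in zip(combined, hidden[parent[i]])]
        updated.append(relu(dense(combined, w2, b2)))
    # 3) Add pooling: sum node embeddings into a single graph-level vector.
    return [sum(node[d] for node in updated) for d in range(len(updated[0]))]


# Toy tree: node 0 is the source, nodes 1 and 2 are its children.
graph_vector = gin_tree_embedding(
    features=[[1.0], [2.0], [3.0]],
    parent=[None, 0, 0],
    w1=[[1.0]], b1=[0.0],
    w2=[[1.0]], b2=[0.0],
)
```

Because the pooling step is a sum, the graph-level vector does not depend on the order in which nodes are listed.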

Theorems & Definitions (20)

  • Theorem 1: Exponentially bounded stopping time
  • Theorem 2: Error guarantees
  • Theorem 3: Asymptotic stopping time
  • Theorem 4: Exponentially bounded stopping time
  • Theorem 5: Error guarantees
  • Theorem 6: Asymptotic stopping time
  • Lemma 1: AEP for Markov edges
  • Proof of Lemma 1
  • Lemma 2
  • Proof of Lemma 2
  • ...and 10 more