Interactive Event Sifting using Bayesian Graph Neural Networks

José Nascimento, Nathan Jacobs, Anderson Rocha

TL;DR

This work introduces an interactive process for training an event-centric, learning-based multimodal classification model that automates sanitization. It proposes a method based on Bayesian Graph Neural Networks (BGNNs) and evaluates active learning and pseudo-labeling formulations to reduce the number of posts the analyst must manually annotate.

Abstract

Forensic analysts often use social media imagery and texts to understand important events. A primary challenge is the initial sifting of irrelevant posts. This work introduces an interactive process for training an event-centric, learning-based multimodal classification model that automates sanitization. We propose a method based on Bayesian Graph Neural Networks (BGNNs) and evaluate active learning and pseudo-labeling formulations to reduce the number of posts the analyst must manually annotate. Our results indicate that BGNNs are useful for sifting social media data in forensic investigations of events of interest, that the value of active learning and pseudo-labeling varies with the setting, and that incorporating unlabelled data from other events improves performance.

Paper Structure (16 sections, 2 equations, 2 figures, 3 tables)

This paper contains 16 sections, 2 equations, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Proposed Method. In (1), we combine the dataset for the event of interest with the augmentation dataset, which gathers data from other events. All augmentation dataset instances are treated as negative samples during training. In (2), we extract features from the image and text attached to each post using a CLIP model and concatenate them to form a post representation. In (3), we use k-NN to support graph creation. In (4), we run KMeans and select the point closest to each centroid to be annotated by the user (the analyst in a hypothetical event analysis scenario). With these points annotated in (5), we run the Bayesian Graph Neural Network in (6) to obtain predictions for step (7), where BALD-KMeans identifies the most certain and least certain points in each cluster. In (8), the analyst provides labels for the most uncertain points. In certain cases, the method also applies pseudo-labeling, treating the model prediction as ground truth. The method then returns to step (6) for another iteration.
  • Figure 2: t-SNE Visualization. 2D visualization for the Mexico Earthquake dataset. The data distribution of the augmentation dataset is closer to the sparse non-informative set, which we infer to be the reason for the improvement caused by its addition to the graph.
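
Steps (7) and (8) of Figure 1 can be illustrated with a small sketch. It assumes the standard BALD score (entropy of the mean prediction minus the mean entropy of the individual stochastic predictions) and a binary relevant/irrelevant output. The function names, the toy probabilities, and the rule of taking exactly one highest-scoring and one lowest-scoring post per cluster are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import defaultdict


def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli distribution with parameter p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def bald_scores(mc_probs):
    """BALD score per post.

    mc_probs[t][i] is P(relevant) for post i under stochastic forward
    pass t (hypothetical layout; e.g. samples from a Bayesian GNN).
    """
    n_passes = len(mc_probs)
    n_posts = len(mc_probs[0])
    scores = []
    for i in range(n_posts):
        samples = [mc_probs[t][i] for t in range(n_passes)]
        mean_p = sum(samples) / n_passes
        expected_entropy = sum(binary_entropy(p) for p in samples) / n_passes
        # Mutual information between the prediction and the model weights.
        scores.append(binary_entropy(mean_p) - expected_entropy)
    return scores


def select_per_cluster(scores, cluster_ids):
    """Pick the highest- and lowest-BALD post in each cluster.

    Assumption: the highest-scoring post goes to the analyst and the
    lowest-scoring post is a pseudo-labeling candidate.
    """
    members = defaultdict(list)
    for idx, cluster in enumerate(cluster_ids):
        members[cluster].append(idx)
    to_annotate, to_pseudo_label = {}, {}
    for cluster, idxs in members.items():
        to_annotate[cluster] = max(idxs, key=lambda i: scores[i])
        to_pseudo_label[cluster] = min(idxs, key=lambda i: scores[i])
    return to_annotate, to_pseudo_label


# Toy data: 2 stochastic passes over 4 posts in 2 clusters.
mc_probs = [
    [0.9, 0.1, 0.95, 0.4],
    [0.9, 0.9, 0.95, 0.6],
]
cluster_ids = [0, 0, 1, 1]

scores = bald_scores(mc_probs)
to_annotate, to_pseudo_label = select_per_cluster(scores, cluster_ids)
```

In this toy input, the posts whose passes disagree (indices 1 and 3) receive the highest scores in their clusters, while posts whose passes agree receive a score of zero. BALD rewards disagreement between passes rather than a mean prediction near 0.5, so a post that every pass scores at exactly 0.5 would also get a score of zero.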