Detection of Cyberbullying Incidents on the Instagram Social Network
Homa Hosseinmardi, Sabrina Arredondo Mattson, Rahat Ibn Rafiq, Richard Han, Qin Lv, Shivakant Mishra
TL;DR
This work tackles cyberbullying detection on Instagram by distinguishing cyberbullying from cyberaggression and assembling a multi-modal dataset of images and comments labeled via crowdsourcing. It establishes a formal definition of cyberbullying that emphasizes online repetition and power imbalance, and analyzes the labeled data to uncover correlations with textual and temporal features as well as image content. A multi-modal detector that combines text, image categories, and metadata, using dimensionality reduction and a linear SVM, achieves up to 0.87 accuracy, which demonstrates the value of fusing modalities beyond text alone. Key findings include that nearly half of highly negative sessions are not cyberbullying and that cyberaggression can occur without cyberbullying. These findings underscore the need for nuanced detectors that use temporal and contextual cues to improve practical detection in social networks.
Abstract
Cyberbullying is a growing problem affecting more than half of all American teens. The main goal of this paper is to investigate fundamentally new approaches for understanding and automatically detecting incidents of cyberbullying over images in Instagram, a media-based mobile social network. To this end, we have collected a sample Instagram data set consisting of images and their associated comments, and designed a labeling study for cyberbullying as well as image content using human labelers on the CrowdFlower crowdsourcing website. An analysis of the labeled data is then presented, including a study of correlations between different features and both cyberbullying and cyberaggression. Using the labeled data, we further design and evaluate the accuracy of a classifier to automatically detect incidents of cyberbullying.
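As a rough illustration of the kind of multi-modal classifier summarized in the TL;DR, the following minimal sketch concatenates text, image-category, and metadata features and trains a linear SVM. It is not the authors' implementation. The toy sessions, the follower-count metadata field, and all function names are hypothetical. The dimensionality-reduction step is omitted for brevity, and a simple Pegasos-style subgradient trainer stands in for a library SVM.

```python
import random

# Hypothetical toy media sessions: (comments, image category, followers, label).
# Label is +1 for cyberbullying and -1 otherwise; all values are invented.
SESSIONS = [
    ("you are ugly and stupid", "person", 120, 1),
    ("nobody likes you loser", "person", 80, 1),
    ("stupid ugly loser go away", "text", 95, 1),
    ("great shot love it", "nature", 300, -1),
    ("beautiful photo so nice", "nature", 250, -1),
    ("love this nice car", "car", 400, -1),
]


def build_index(items):
    """Map each distinct item to a column index."""
    return {item: i for i, item in enumerate(sorted(set(items)))}


def featurize(text, category, followers, vocab, cats):
    """Fuse modalities by concatenating their feature vectors."""
    text_vec = [0.0] * len(vocab)          # bag-of-words counts
    for word in text.split():
        if word in vocab:
            text_vec[vocab[word]] += 1.0
    cat_vec = [0.0] * len(cats)            # one-hot image category
    if category in cats:
        cat_vec[cats[category]] = 1.0
    meta_vec = [followers / 1000.0]        # scaled metadata (assumed field)
    return text_vec + cat_vec + meta_vec + [1.0]  # trailing 1.0 acts as bias


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style stochastic subgradient descent on the hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]   # regularization shrink
            if margin < 1.0:                            # hinge loss is active
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, x):
    """Return +1 (cyberbullying) or -1 from the sign of the linear score."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


vocab = build_index(word for text, _, _, _ in SESSIONS for word in text.split())
cats = build_index(cat for _, cat, _, _ in SESSIONS)
X = [featurize(text, cat, fol, vocab, cats) for text, cat, fol, _ in SESSIONS]
y = [label for _, _, _, label in SESSIONS]
w = train_linear_svm(X, y)
```

Calling `predict(w, featurize(comments, category, followers, vocab, cats))` scores a new session. In a realistic setting the fused vector would be far wider, which is where a dimensionality-reduction step before the SVM would come in.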
