Detection of Cyberbullying Incidents on the Instagram Social Network

Homa Hosseinmardi, Sabrina Arredondo Mattson, Rahat Ibn Rafiq, Richard Han, Qin Lv, Shivakant Mishra

TL;DR

This work tackles cyberbullying detection on Instagram by differentiating it from cyberaggression and assembling a multi-modal dataset of images and comments labeled via crowdsourcing. It establishes a formal definition emphasizing online repetition and power imbalance, and analyzes labeled data to uncover correlations with textual and temporal features, as well as image content. A multi-modal detector combining text, image categories, and meta-data using dimensionality reduction and a linear SVM achieves up to 0.87 accuracy, demonstrating the value of fusing modalities beyond text alone. Key findings include that nearly half of highly negative sessions are not cyberbullying and that cyberaggression can occur without cyberbullying, underscoring the need for nuanced detectors with temporal and contextual cues to improve practical detection in social networks.

Abstract

Cyberbullying is a growing problem affecting more than half of all American teens. The main goal of this paper is to investigate fundamentally new approaches to understand and automatically detect incidents of cyberbullying over images in Instagram, a media-based mobile social network. To this end, we have collected a sample Instagram data set consisting of images and their associated comments, and designed a labeling study for cyberbullying as well as image content using human labelers at the crowd-sourced Crowdflower Web site. An analysis of the labeled data is then presented, including a study of correlations between different features and cyberbullying as well as cyberaggression. Using the labeled data, we further design and evaluate the accuracy of a classifier to automatically detect incidents of cyberbullying.

Paper Structure

This paper contains 8 sections, 10 figures, 2 tables.

Figures (10)

  • Figure 1: An example of comments posted on Instagram. To give more room for the text, we have moved the associated image to overlay some of the text.
  • Figure 2: Comparison of the distribution of the number of comments per collected Instagram media session. Blue is for the complete set of media sessions, and red is for the selected subset of 998 media sessions with more than 15 comments and a high degree of negativity.
  • Figure 3: Complementary cumulative distribution function (CCDF) of the "followed by" and "follows" counts for users in the complete set and in the highly negative subset of media sessions.
  • Figure 4: An example of the labeling survey, which shows an image and its corresponding comments, and the survey questions.
  • Figure 5: Fraction of media sessions that have been voted $k$ times as cyberaggression (left) or cyberbullying (right).
  • ...and 5 more figures