Conversation-Based Multimodal Abuse Detection Through Text and Graph Embeddings

Noé Cecillon, Vincent Labatut, Richard Dufour

TL;DR

This work tackles abuse detection in online conversations by integrating textual content with conversational structure through representation learning. It introduces two novel whole-graph embedding methods (WDA-SG2V and WDA-WSGCN) that incorporate edge weights, directions, signs, and vertex attributes, and systematically compares them with a broad suite of text and graph embeddings. Fusion experiments show that combining the text and graph modalities yields the best performance, with an $F$-measure of up to 87.06, which highlights the complementarity of content and context signals. The study also analyzes which discriminative features the embeddings capture, which aids interpretability, and points to future multimodal and dynamic-graph extensions relevant to scalable abuse detection.

Abstract

Abusive behavior is common on online social networks, and forces the hosts of such platforms to find new solutions to address this problem. Various methods have been proposed to automate this task in the past decade. Most of them rely on the exchanged content, but ignore the structure and dynamics of the conversation, which could provide some relevant information. In this article, we propose to use representation learning methods to automatically produce embeddings of this textual content and of the conversational graphs depicting message exchanges. While the latter could be enhanced by including additional information on top of the raw conversational structure, no method currently exists to learn whole-graph representations using simultaneously edge directions, weights, signs, and vertex attributes. We propose two such methods to fill this gap in the literature. We experiment with 5 textual and 13 graph embedding methods, and apply them to a dataset of online messages annotated for abuse detection. Our best results achieve an $F$-measure of 81.02 using text alone and 80.61 using graphs alone. We also combine both modalities of information (text and graphs) through three fusion strategies, and show that this strongly improves abuse detection performance, increasing the $F$-measure to 87.06. Finally, we identify which specific engineered features are captured by the embedding methods under consideration. These features have clear interpretations and help explain what information the representation learning methods deem discriminative.

Paper Structure

This paper contains 27 sections, 2 equations, 5 figures, 10 tables.

Figures (5)

  • Figure 1: Examples of the three vertex attributes: (a) author; (b) distance; and (c) target. The targeted vertex is represented in red.
  • Figure 2: Processing steps performed for each representation method considered in the experiments. The top part focuses on text-based methods, which use only the classified message as input, whereas the bottom part focuses on graph-based methods, which use the whole conversation, and include an additional graph extraction step. In both cases, the baseline relies on feature engineering, whereas all the other methods considered here use representation learning. The SVM-based classification phase is the same for text- and graph-based methods.
  • Figure 3: Illustration of the three fusion strategies used to combine pairs of representations. On the left, the red and green blocks correspond to two representation methods of interest, each one outputting a vector representation fed to an SVM classifier, similarly to what is shown in Figure 2. The new part is the Fusion phase, displayed on the right, which involves three SVMs using different inputs (see text).
  • Figure 4: Text measures captured (green), partially captured (orange), or not captured (red) by the word embedding approaches. Each value is the difference between the $F$-measure score obtained by the embedding method on its own, and the score obtained by the embedding method complemented by the corresponding Best Feature.
  • Figure 5: Topological measures captured (green), partially captured (orange), or not captured (red) by the embedding approaches. The first 4 topological measures are computed at the Graph level and the last 5 topological measures are computed at the Vertex level. Each value is the difference between the $F$-measure score obtained by the embedding method on its own and the score obtained by the embedding method complemented by the corresponding Best Feature.
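
The values reported in Figures 4 and 5 are differences between two $F$-measure scores. As a minimal sketch of that arithmetic, the Python snippet below assumes the standard definition of the $F$-measure (harmonic mean of precision and recall, expressed in percent) and assumes the difference is taken as "complemented minus alone". Neither the averaging scheme nor the sign convention is stated in the captions, and the function names `f_measure` and `feature_gain` are hypothetical.

```python
def f_measure(tp: int, fp: int, fn: int) -> float:
    """F-measure in percent from confusion counts.

    Assumption: standard F1, i.e. the harmonic mean of precision and recall.
    """
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 100.0 * 2.0 * precision * recall / (precision + recall)


def feature_gain(f_alone: float, f_with_feature: float) -> float:
    """Change in F-measure when an embedding is complemented by a feature.

    Assumption: sign convention is (complemented - alone), so a value near
    zero suggests the embedding already captures the feature.
    """
    return f_with_feature - f_alone


if __name__ == "__main__":
    # Illustrative counts only, not taken from the paper.
    alone = f_measure(tp=8, fp=2, fn=2)
    complemented = f_measure(tp=9, fp=1, fn=1)
    print(round(alone, 2), round(complemented, 2),
          round(feature_gain(alone, complemented), 2))
```

Under this convention, a small gain means the engineered feature adds little beyond what the embedding already encodes, which is how the "captured" category in the figures can be read.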