Research on Violent Text Detection System Based on BERT-fasttext Model

Yongsheng Yang, Xiaoying Wang

TL;DR

The paper tackles violent text detection in online environments by proposing a BERT-fasttext fusion that combines BERT's contextual language understanding with FastText's efficient text classification. It introduces a keyword extraction component using a $\chi^{2}$-FPN algorithm, and a hybrid rule-language model that leverages $n$-gram context to constrain rules. Feature selection relies on multiple statistical criteria, including MI, IG, and $\chi^{2}$, to improve discriminative power. Experimental results on a hate speech dataset show that the BERT-fasttext model achieves top performance (e.g., Acc≈87.6%, F1≈86.6%), outperforming individual baselines and suggesting practical benefits for scalable content moderation and the development of domain-specific Chinese cyber-violence corpora.
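As a rough illustration of the $\chi^{2}$ criterion named above, the sketch below scores terms by how strongly their presence is associated with a target class. It covers only the standard $\chi^{2}$ statistic on a 2x2 term/class table; the $\chi^{2}$-FPN variant is not specified in this summary and is not reproduced here. The function names, the toy documents, and the labels are all assumptions made for the example.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/class contingency table.

    a: target-class documents containing the term
    b: other documents containing the term
    c: target-class documents without the term
    d: other documents without the term
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: the term or the class never varies
    return n * (a * d - b * c) ** 2 / denom


def rank_terms(docs, labels, target=1):
    """Return (term, score) pairs sorted by descending chi-square score."""
    n_target = sum(1 for y in labels if y == target)
    n_other = len(labels) - n_target
    in_target, in_other = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        counter = in_target if y == target else in_other
        counter.update(set(tokens))  # count document frequency, not term frequency
    scores = {}
    for term in set(in_target) | set(in_other):
        a = in_target[term]
        b = in_other[term]
        scores[term] = chi_square(a, b, n_target - a, n_other - b)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy corpus: label 1 marks the target (violent) class.
docs = [["kill", "you"], ["kill", "them"], ["nice", "day"], ["good", "day"]]
labels = [1, 1, 0, 0]
ranking = rank_terms(docs, labels)
```

Note that the statistic is two-sided: a term that appears only in non-target documents scores as high as one that appears only in target documents, so a keyword extractor would typically also check the direction of the association.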

Abstract

The internet has become an indispensable platform for daily life, work, and information exchange. At the same time, violent text has proliferated online and has had many negative effects, which makes an effective system for intercepting such text particularly important. This paper studies violent text interception based on the BERT-fasttext model. BERT is a pre-trained language model with strong natural language understanding that can mine and analyze the semantic information of text in depth; fasttext is an efficient, low-complexity text classification tool that can quickly provide a baseline judgment during text processing. Combining the two in a violent text interception system allows violent text to be identified accurately and intercepted efficiently, preventing harmful information from spreading freely online. Compared with the standalone BERT and fasttext models, accuracy improved by 0.7% and 0.8%, respectively. The model helps to purify the online environment, keep online information healthy, and create a positive, civilized, and harmonious communication space for netizens, steering social networking and information dissemination in a healthier direction.
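The abstract does not say how the two models are combined. One common option, shown here purely as an assumption and not as the authors' method, is late fusion: each model outputs class probabilities for the same text, and the two distributions are mixed with a weight. The function names, the weight `weight_bert`, and the example probabilities are hypothetical.

```python
def fuse_probabilities(p_bert, p_fasttext, weight_bert=0.5):
    """Weighted average of two class-probability vectors (late fusion).

    This is an illustrative assumption; the paper's actual fusion
    mechanism is not described in the abstract.
    """
    if len(p_bert) != len(p_fasttext):
        raise ValueError("both models must score the same set of classes")
    if not 0.0 <= weight_bert <= 1.0:
        raise ValueError("weight_bert must lie in [0, 1]")
    return [
        weight_bert * pb + (1.0 - weight_bert) * pf
        for pb, pf in zip(p_bert, p_fasttext)
    ]


def predict_label(probabilities):
    """Index of the highest-probability class."""
    return max(range(len(probabilities)), key=probabilities.__getitem__)


# Hypothetical outputs for one text: index 0 = non-violent, index 1 = violent.
fused = fuse_probabilities([0.6, 0.4], [0.2, 0.8], weight_bert=0.5)
label = predict_label(fused)
```

In such a scheme the weight would normally be tuned on a validation set, since a lightweight classifier and a large pre-trained model rarely deserve equal trust.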

Paper Structure

This paper contains 14 sections, 20 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Violent Text Detection System
  • Figure 2: BERT input representation. The input embeddings are the sum of the token embeddings, the segmentation embeddings and the position embeddings.
  • Figure 3: The structure of the fasttext model. The inputs are combined in the hidden layer and then passed to the label output.
  • Figure 4: Fasttext text sentiment analysis process. The input texts are preprocessed first, and then features are selected. The features are sent to the hidden layer and finally to the output.
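The captions for Figures 2 and 3 can be sketched in a few lines, assuming plain Python lists as vectors. The first function sums the three embeddings that make up one BERT input position; the second follows the standard fastText classifier design, in which token vectors are averaged into a hidden representation that feeds a linear layer and a softmax. All names, dimensions, and values are hypothetical, and real models learn these vectors and weights during training.

```python
import math


def bert_input_embedding(token_emb, segment_emb, position_emb):
    """Element-wise sum of token, segment, and position embeddings."""
    return [t + s + p for t, s, p in zip(token_emb, segment_emb, position_emb)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fasttext_forward(token_vectors, weights, bias):
    """Average token vectors into a hidden vector, then classify linearly.

    token_vectors: one embedding per input token (or n-gram)
    weights: one row of hidden-size weights per label
    bias: one bias value per label
    """
    dim = len(token_vectors[0])
    hidden = [
        sum(vec[i] for vec in token_vectors) / len(token_vectors)
        for i in range(dim)
    ]
    logits = [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


# Hypothetical 2-dimensional embeddings and a 2-label classifier.
position_input = bert_input_embedding([1.0, 2.0], [0.5, 0.5], [0.1, 0.2])
probs = fasttext_forward(
    token_vectors=[[1.0, 0.0], [0.0, 1.0]],
    weights=[[2.0, 0.0], [0.0, 0.0]],
    bias=[0.0, 0.0],
)
```

Averaging discards word order, which is why fastText is fast but context-insensitive, whereas the position embeddings in the BERT input let the model distinguish the same token at different positions.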