Spot the bot: Coarse-Grained Partition of Semantic Paths for Bots and Humans

Vasilii A. Gromov, Alexandra S. Kogan

TL;DR

This paper compares the structures of coarse-grained partitions of semantic paths in human-written and bot-generated texts, using clusterizations of n-gram datasets drawn from literary texts and from texts generated by several bots.

Abstract

Technology is advancing rapidly: bots now write comments, articles, and reviews. It is therefore crucial to know whether a text was written by a human or by a bot. This paper compares the structures of the coarse-grained partitions of semantic paths for human-written and bot-generated texts. We compare the clusterizations of datasets of n-grams from literary texts and from texts generated by several bots. Our hypothesis is that the structures and clusterizations differ, and our results support it. Because semantic structure may vary across languages, we investigate Russian, English, German, and Vietnamese.

Paper Structure

This paper contains 13 sections, 2 figures, and 4 tables.

Figures (2)

  • Figure 1: t-SNE visualization of clusters. Language: Russian. Embedding: SVD
  • Figure 2: t-SNE visualization of clusters. Language: Russian. Embedding: CBOW