Deciphering Textual Authenticity: A Generalized Strategy through the Lens of Large Language Semantics for Detecting Human vs. Machine-Generated Text

Mazal Bethany; Brandon Wherry; Emet Bethany; Nishant Vishwamitra; Anthony Rios; Peyman Najafirad

TL;DR

The paper tackles the challenge of detecting machine-generated text in real-world settings where a wide range of generators and domains are used. It proposes T5LLMCipher, a generalized detector that uses a frozen LLM encoder to produce dense embeddings and trains lightweight classifiers (MLP, KNN, and a contrastive variant) to distinguish human from machine text and to attribute text to its generator. Across nine generators and nine domains, the approach achieves state-of-the-art generalization, raising F1 on machine-generated text by an average of 19.6% on unseen generators and domains over the top-performing existing approaches, and attributes the generator with 93.6% accuracy. It is also reported to be robust to adversarial perturbations. The findings underscore the value of LLM embedding spaces for generalized, interpretable detection, while highlighting ethical considerations and avenues for further improvement such as adversarial training and broader attribution capabilities.

Abstract

With the recent proliferation of Large Language Models (LLMs), there has been an increasing demand for tools to detect machine-generated text. Effective detection of machine-generated text faces two pertinent problems. First, existing detectors are severely limited in generalizing to real-world scenarios, where machine-generated text is produced by a variety of generators, including but not limited to GPT-4 and Dolly, and spans diverse domains, ranging from academic manuscripts to social media posts. Second, existing detection methodologies treat texts produced by LLMs through a restrictive binary classification lens, neglecting the nuanced diversity of artifacts generated by different LLMs. In this work, we undertake a systematic study on the detection of machine-generated text in real-world scenarios. We first study the effectiveness of state-of-the-art approaches and find that they are severely limited against text produced by diverse generators and domains in the real world. Furthermore, t-SNE visualizations of the embeddings from a pretrained LLM's encoder show that these embeddings alone cannot reliably distinguish between human and machine-generated text. Based on our findings, we introduce a novel system, T5LLMCipher, for detecting machine-generated text using a pretrained T5 encoder combined with LLM embedding sub-clustering to address the text produced by diverse generators and domains in the real world. We evaluate our approach across 9 machine-generated text systems and 9 domains and find that it provides state-of-the-art generalization ability, with an average increase in F1 score on machine-generated text of 19.6% on unseen generators and domains compared to the top-performing existing approaches, and that it correctly attributes the generator of text with an accuracy of 93.6%.

Paper Structure (27 sections, 7 equations, 5 figures, 9 tables)

Introduction
Background
Threat Model and Problem Scope
LLM Text Generation Objective
Threat Model
Data
Motivation
Approach
LLM Text Encoder
Human vs. LLM Classifiers
Multilayer Perceptron Classification
K-Nearest Neighbors Classification
Contrastive K-Nearest Neighbors
System Implementation and Evaluation
Implementation
...and 12 more sections

Figures (5)

Figure 1: t-SNE visualization of T5 embeddings illustrating the distribution of Human (blue), and Machine (red) generated text. While there is some degree of separation, considerable overlap is still evident.
Figure 2: t-SNE visualization of T5 embeddings illustrating the distribution of Human (blue), Bloomz (orange), ChatGPT (green), Cohere (teal), Davinci (purple), and Dolly (brown) generated texts across different domains. It is even more difficult to distinguish the source generator in the multiclass setting.
Figure 3: System Architecture of T5LLMCipher for distinguishing between human and machine-generated texts. The architecture consists of an LLM encoder, embedding databases to store the text embeddings extracted from the LLM encoder, and a classifier to map the text embedding to a classification decision.
Figure 4: t-SNE visualization of T5LLMCipher-MC classifier embeddings illustrating the distribution of Human (blue), and Machine (red) generated text. There exists a clear separation between Human and Machine generated text.
Figure 5: t-SNE visualization of T5LLMCipher-MC classifier embeddings illustrating the distribution of Human (blue), Bloomz (orange), ChatGPT (green), Cohere (teal), Davinci (purple), and Dolly (brown) generated texts across different domains. Our T5LLMCipher-MC classifier shows a strong ability to distinguish the source generator of text.
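
The Figure 3 caption describes three components: an encoder, an embedding database, and a classifier that maps an embedding to a decision. The following is a minimal sketch of that flow, not the authors' implementation. The `embed` function is a hypothetical stand-in (hashed character trigrams) used only so the example runs without model weights; in the paper this role is played by a frozen pretrained T5 encoder. The class name `EmbeddingDatabase`, the cosine-similarity KNN vote, and all parameters are likewise illustrative assumptions.

```python
import numpy as np


def embed(text: str, dim: int = 64) -> np.ndarray:
    """Hypothetical stand-in for a frozen LLM encoder (assumption, not T5).

    Counts hashed character trigrams and L2-normalises the result so that
    a dot product between two embeddings equals their cosine similarity.
    """
    vec = np.zeros(dim)
    for i in range(len(text) - 2):
        bucket = 0
        for ch in text[i:i + 3]:
            # Simple deterministic rolling hash (Python's hash() is salted).
            bucket = (bucket * 31 + ord(ch)) % dim
        vec[bucket] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


class EmbeddingDatabase:
    """Stores labelled embeddings and classifies queries by KNN vote."""

    def __init__(self) -> None:
        self.vectors: list[np.ndarray] = []
        self.labels: list[str] = []

    def add(self, text: str, label: str) -> None:
        # Embed once at insertion time; the encoder itself is never trained.
        self.vectors.append(embed(text))
        self.labels.append(label)

    def knn_predict(self, text: str, k: int = 3) -> str:
        query = embed(text)
        sims = np.stack(self.vectors) @ query  # cosine similarities
        nearest = np.argsort(-sims)[:k]        # indices of the k most similar
        votes: dict[str, int] = {}
        for idx in nearest:
            label = self.labels[idx]
            votes[label] = votes.get(label, 0) + 1
        return max(votes, key=votes.get)       # majority label


# Toy usage: labels could equally be generator names for attribution.
db = EmbeddingDatabase()
db.add("the quick brown fox jumps over the lazy dog", "human")
db.add("as an ai language model i cannot provide that", "machine")
prediction = db.knn_predict("the quick brown fox jumps over the lazy dog", k=1)
```

Because the labels are arbitrary strings, the same structure covers both the binary human-vs-machine setting and the multiclass generator-attribution setting; only the contents of the database change.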
