
SEFD: Semantic-Enhanced Framework for Detecting LLM-Generated Text

Weiqing He, Bojian Hou, Tianqi Shang, Davoud Ataee Tarzanagh, Qi Long, Li Shen

TL;DR

This work presents SEFD, a novel semantic-enhanced framework for detecting LLM-generated text that uses a retrieval-based mechanism to fully exploit text semantics. It shows that the framework substantially improves detection accuracy in paraphrasing scenarios while remaining robust on standard LLM-generated content.

Abstract

The widespread adoption of large language models (LLMs) has created an urgent need for robust tools to detect LLM-generated text, especially in light of paraphrasing techniques that often evade existing detection methods. To address this challenge, we present a novel semantic-enhanced framework for detecting LLM-generated text (SEFD) that leverages a retrieval-based mechanism to fully utilize text semantics. Our framework improves upon existing detection methods by systematically integrating retrieval-based techniques with traditional detectors, employing a carefully curated retrieval mechanism that strikes a balance between comprehensive coverage and computational efficiency. We showcase the effectiveness of our approach in sequential text scenarios common in real-world applications, such as online forums and Q&A platforms. Through comprehensive experiments across various LLM-generated texts and detection methods, we demonstrate that our framework substantially enhances detection accuracy in paraphrasing scenarios while maintaining robustness for standard LLM-generated content.
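The retrieval side of such a framework can be pictured as a bounded pool of stored texts that is queried by semantic similarity. The sketch below only illustrates that idea under stated assumptions and is not the authors' method: `embed` is a hypothetical bag-of-words stand-in for a real semantic encoder, `RetrievalPool` and `capacity` are hypothetical names, and evicting the oldest entry once the pool is full is one assumed way to trade coverage against computational cost.

```python
import math
from collections import Counter, deque


def embed(text: str) -> Counter:
    """Hypothetical stand-in for a semantic encoder: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())  # missing words count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class RetrievalPool:
    """Hypothetical bounded pool: once full, the oldest embedding is evicted first."""

    def __init__(self, capacity: int = 1000):
        self.entries = deque(maxlen=capacity)

    def add(self, text: str) -> None:
        self.entries.append(embed(text))

    def similarity(self, text: str) -> float:
        """Highest cosine similarity between `text` and any pooled entry (0.0 if empty)."""
        query = embed(text)
        return max((cosine(query, e) for e in self.entries), default=0.0)
```

A larger `capacity` covers more previously seen texts but makes every query scan more entries; a real system would likely swap the linear scan for an approximate nearest-neighbor index.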

Paper Structure

This paper contains 17 sections, 5 equations, 7 figures, 3 tables, and 1 algorithm.

Figures (7)

  • Figure 1: A brief version of the SEFD structure. SEFD comprises a retrieval database/pool (colored purple) and three detection steps (colored green).
  • Figure 2: The detailed structure of our framework. The input sequence on the left consists of three texts: the first is generated by an LLM, the second is human-written, and the third is another LLM's paraphrase of the first. These texts are processed in order. For text $x_i$, we perform initial detection and semantic similarity computation in parallel to obtain a detection score and a similarity score. These scores are then combined by the fusion function to produce the semantic-enhanced detection score, which is used to classify $x_i$. Finally, an updating rule decides whether to add $x_i$ to the retrieval pool before we proceed to the next text, $x_{i+1}$.
  • Figure 3: Detection score distributions for four different detectors: Log-Likelihood solaiman2019release, DetectGPT mitchell2023detectgpt, Intrinsic Dimension tulchinskii2024intrinsic, and Soft Watermarking kirchenbauer2023watermark. The text data consists of answers to questions from the r/explainlikeimfive subreddit. There are three groups of answers: Human for human-written answers, Model for answers generated by the GPT-2 XL model radford2019language, and Paraphrased for paraphrased versions of the GPT-2 XL-generated answers produced with DIPPER krishna2023paraphrasing.
  • Figure 4: Similarity score distributions on four LLM-generated datasets: GPT-2 XL, OPT-13B, GPT-3.5, and GPT-4o-mini. For each dataset, the answers come from three sources: Human for human-written answers, Model for answers generated by the LLM, and Paraphrased for paraphrased versions of the LLM-generated answers produced with DIPPER. The scores for LLM-generated answers are concentrated around 1, and the scores for paraphrased answers are clearly higher than those for human-written text.
  • Figure 5: Recursive paraphrasing
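The per-text loop described for Figure 2 (initial detection, similarity scoring, fusion, classification, pool update) can be sketched roughly as follows. This is an illustration, not the authors' implementation: `detect` and `similarity` are placeholders for any base detector and any retrieval-based similarity function, and the fusion rule, the weight `alpha`, the decision `threshold`, and the updating rule (pool a text only when it is classified as LLM-generated) are all hypothetical choices.

```python
from typing import Callable, List, Sequence, Tuple

Detector = Callable[[str], float]               # base detector score, assumed in [0, 1]
Similarity = Callable[[str, List[str]], float]  # similarity of a text to the pool, in [0, 1]


def fuse(detection: float, similarity: float, alpha: float = 0.5) -> float:
    """Hypothetical fusion: blend the scores, never dropping below the base detector score."""
    return max(detection, alpha * detection + (1.0 - alpha) * similarity)


def detect_sequence(
    texts: Sequence[str],
    detect: Detector,
    similarity: Similarity,
    alpha: float = 0.5,
    threshold: float = 0.5,
) -> List[Tuple[float, bool]]:
    """Process texts in order, returning (semantic-enhanced score, is_llm) for each."""
    pool: List[str] = []
    results: List[Tuple[float, bool]] = []
    for text in texts:
        d = detect(text)            # initial detection score
        s = similarity(text, pool)  # similarity to previously pooled texts
        score = fuse(d, s, alpha)   # semantic-enhanced detection score
        is_llm = score >= threshold
        results.append((score, is_llm))
        if is_llm:                  # hypothetical updating rule
            pool.append(text)
    return results
```

With `alpha=0.5`, a paraphrase that the base detector scores at only 0.4, but that has similarity 0.9 to a pooled LLM text, receives max(0.4, 0.5 · 0.4 + 0.5 · 0.9) = 0.65 and crosses the 0.5 threshold. Taking the maximum with the base score is an assumed design choice: while the pool is empty the base detector's decisions are left unchanged, and similarity can only raise a score.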