AIDetx: a compression-based method for identification of machine-learning generated text

Leonardo Almeida, Pedro Rodrigues, Diogo Magalhães, Armando J. Pinho, Diogo Pratas

TL;DR

AIDetx is a framework that detects AI-generated text using finite-context models (FCMs), with one model built from human-written text and another from AI-generated text. It shows strong performance while being computationally efficient, requiring no GPUs, and offering greater interpretability than deep learning models.

Abstract

This paper introduces AIDetx, a novel method for detecting machine-generated text using data compression techniques. Traditional approaches, such as deep learning classifiers, often suffer from high computational costs and limited interpretability. To address these limitations, we propose a compression-based classification framework that leverages finite-context models (FCMs). AIDetx constructs distinct compression models for human-written and AI-generated text, classifying new inputs based on which model achieves a higher compression ratio. We evaluated AIDetx on two benchmark datasets, achieving F1 scores exceeding 97% and 99%, respectively, highlighting its high accuracy. Compared to current methods, such as large language models (LLMs), AIDetx offers a more interpretable and computationally efficient solution, significantly reducing both training time and hardware requirements (e.g., no GPUs needed). The full implementation is publicly available at https://github.com/AIDetx/AIDetx.
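The classification rule described in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: the class name `FiniteContextModel`, the function `classify`, the character-level alphabet, the default values of `k` and `alpha`, and the toy training strings are all assumptions made for this example. It uses the common additive-smoothing estimator for an order-k context model, which may differ from the estimator used in the paper, and it treats "higher compression ratio" as "fewer estimated bits to encode the target".

```python
import math
from collections import defaultdict


class FiniteContextModel:
    """Hypothetical order-k character model with additive smoothing (alpha)."""

    def __init__(self, k=3, alpha=0.5):
        self.k = k
        self.alpha = alpha
        self.counts = defaultdict(lambda: defaultdict(int))  # context -> symbol -> count
        self.totals = defaultdict(int)                       # context -> total count
        self.alphabet = set()

    def train(self, text):
        """Count how often each symbol follows each length-k context."""
        self.alphabet.update(text)
        for i in range(self.k, len(text)):
            ctx, sym = text[i - self.k:i], text[i]
            self.counts[ctx][sym] += 1
            self.totals[ctx] += 1

    def bits(self, text, alphabet_size):
        """Estimated number of bits needed to encode `text` with this model."""
        total = 0.0
        for i in range(self.k, len(text)):
            ctx, sym = text[i - self.k:i], text[i]
            n_sym = self.counts.get(ctx, {}).get(sym, 0)
            n_ctx = self.totals.get(ctx, 0)
            p = (n_sym + self.alpha) / (n_ctx + self.alpha * alphabet_size)
            total -= math.log2(p)
        return total


def classify(target, human_model, ai_model):
    """Assign the label of the model that encodes the target in fewer bits."""
    # A shared alphabet size keeps both probability estimates comparable.
    size = len(human_model.alphabet | ai_model.alphabet | set(target))
    human_bits = human_model.bits(target, size)
    ai_bits = ai_model.bits(target, size)
    return "human" if human_bits < ai_bits else "ai"


# Toy reference texts (assumed for illustration only, not from the paper's datasets).
human_model = FiniteContextModel(k=3, alpha=0.5)
human_model.train("the quick brown fox jumps over the lazy dog " * 20)

ai_model = FiniteContextModel(k=3, alpha=0.5)
ai_model.train("as an ai language model i cannot do that " * 20)

print(classify("the lazy dog jumps over the quick brown fox", human_model, ai_model))
print(classify("i cannot do that as an ai language model", human_model, ai_model))
```

In this sketch, training amounts to counting context-symbol pairs, which is why such a classifier needs no GPU. The per-symbol bit costs can also be inspected directly, which is one way to read the interpretability claim.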

Paper Structure

This paper contains 14 sections, 2 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Overview of the classifier based on finite-context models (FCMs).
  • Figure 2: F1 score for the grid search over the hyperparameters $k$ and $\alpha$ on the HC3 (left) and AI-human-text (right) datasets.
  • Figure 3: Time performance for the grid search over the hyperparameters $k$ and $\alpha$ on the HC3 (left) and AI-human-text (right) datasets.
  • Figure 4: Classifier performance evolution as reference text length increases.
  • Figure 5: Accuracy as a function of target text length for the HC3 (left) and AI-human-text (right) datasets.
  • ...and 1 more figure