Simplex-Optimized Hybrid Ensemble for Large Language Model Text Detection Under Generative Distribution Drift

Sepyan Purnama Kristanto, Lutfi Hakim, Dianni Yusuf

TL;DR

The paper addresses the instability of LLM-generated text detectors under generative distribution drift by introducing a simplex-constrained hybrid ensemble that combines a RoBERTa-based semantic detector, a curvature-based likelihood perturbation score, and a stylometric classifier. The authors formalize risk under generator mixtures and justify convex simplex fusion to reduce worst-case error while remaining lightweight to deploy. Empirically, on GenDrift-30K, the ensemble achieves 94.2% accuracy and AUC 0.978, with notably lower false positives on academic text and strong cross-generator generalization, including paraphrase attacks. The work demonstrates the practical viability of interpretable, modular ensembles for robust AI-text detection in educational and research contexts, and outlines future directions in distillation, dynamic fusion, and multilingual evaluation.

Abstract

The widespread adoption of large language models (LLMs) has made it difficult to distinguish human writing from machine-produced text in many real applications. Detectors that were effective for one generation of models tend to degrade when newer models or modified decoding strategies are introduced. In this work, we study this lack of stability and propose a hybrid ensemble that is explicitly designed to cope with changing generator distributions. The ensemble combines three complementary components: a RoBERTa-based classifier fine-tuned for supervised detection, a curvature-inspired score based on perturbing the input and measuring changes in model likelihood, and a compact stylometric model built on hand-crafted linguistic features. The outputs of these components are fused on the probability simplex, and the weights are chosen via validation-based search. We frame this approach in terms of variance reduction and risk under mixtures of generators, and show that the simplex constraint provides a simple way to trade off the strengths and weaknesses of each branch. Experiments on a 30,000-document corpus drawn from several LLM families, including models unseen during training and paraphrased attack variants, show that the proposed method achieves 94.2% accuracy and an AUC of 0.978. The ensemble also lowers false positives on scientific articles compared to strong baselines, which is critical in educational and research settings where wrongly flagging human work is costly.
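The fusion step described in the abstract can be sketched as a small grid search over the probability simplex. This is a minimal illustration, not the authors' implementation: the function names `simplex_grid` and `fit_simplex_weights`, the grid resolution, the 0.5 decision threshold, the use of validation accuracy as the search objective, and the synthetic detector scores are all assumptions made for the example.

```python
import itertools
import numpy as np


def simplex_grid(n_components, steps):
    """Yield weight vectors w >= 0 with sum(w) == 1, on a grid of spacing 1/steps."""
    for combo in itertools.product(range(steps + 1), repeat=n_components - 1):
        if sum(combo) <= steps:
            last = steps - sum(combo)  # remaining mass goes to the final detector
            yield np.array(combo + (last,), dtype=float) / steps


def fit_simplex_weights(val_scores, val_labels, steps=20, threshold=0.5):
    """Pick the simplex weights that maximize validation accuracy.

    val_scores: array of shape (n_docs, n_detectors), each entry a probability
                that the document is machine-generated (hypothetical inputs).
    val_labels: array of shape (n_docs,), 1 = machine-generated, 0 = human.
    """
    best_w, best_acc = None, -1.0
    for w in simplex_grid(val_scores.shape[1], steps):
        fused = val_scores @ w  # convex combination of detector outputs
        acc = np.mean((fused >= threshold) == val_labels)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc


# Synthetic stand-ins for the semantic, curvature, and stylometric branches:
# each score is a shared label signal plus independent noise of different size.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=500)
noise = rng.normal(0.0, [0.25, 0.35, 0.45], size=(500, 3))
scores = np.clip(0.3 + 0.4 * labels[:, None] + noise, 0.0, 1.0)

weights, acc = fit_simplex_weights(scores, labels)
print("weights:", weights, "validation accuracy:", acc)
```

Because the grid contains the simplex vertices, the selected weights can never score worse on the validation set than the best single detector; whether they generalize to unseen generators is an empirical question the search itself does not answer.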

Paper Structure

This paper contains 34 sections, 2 theorems, 16 equations, 2 figures, 5 tables, 1 algorithm.

Key Result

Proposition 1

If the component detectors $f_m$ are not perfectly correlated and rely on different types of information (for instance, semantic, probabilistic, and stylometric cues), then there exist weights $\mathbf{w} \in \Delta_M$ such that $\mathrm{Var}(\bar{f})$ is strictly smaller than the variance of each c
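The variance argument can be illustrated with a two-detector special case. The setup here is an assumption made for simplicity, not the proposition's general setting: the fused score is taken to be $\bar{f} = w f_1 + (1-w) f_2$ with $w \in [0, 1]$, both detectors have the same variance $\sigma^2$, and their correlation is $\rho$.

$$
\mathrm{Var}(\bar{f}) = w^2 \sigma^2 + (1-w)^2 \sigma^2 + 2 w (1-w) \rho \sigma^2,
\qquad
\mathrm{Var}(\bar{f})\Big|_{w = 1/2} = \frac{1+\rho}{2}\,\sigma^2 < \sigma^2 \quad \text{whenever } \rho < 1 .
$$

Under these assumptions, equal weighting already lowers the variance below that of either detector, and the gain grows as the correlation between the two detectors decreases.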

Figures (2)

  • Figure 1: Overview of the proposed Simplex-Optimized Hybrid Ensemble. The document is processed by three detectors (semantic, curvature-based, and stylometric). Their outputs are combined using weights constrained to the probability simplex and tuned on a validation set.
  • Figure 2: (a) ROC curves for single detectors and the ensemble. The ensemble dominates across thresholds. (b) Reliability diagram on the test set, showing that the ensemble produces better-calibrated probabilities than the RoBERTa-only detector.

Theorems & Definitions (2)

  • Proposition 1
  • Proposition 2