Detecting Machine-Generated Texts by Multi-Population Aware Optimization for Maximum Mean Discrepancy

Shuhai Zhang, Yiliao Song, Jiahao Yang, Yuanqing Li, Bo Han, Mingkui Tan

TL;DR

This work tackles the challenge of detecting machine-generated texts (MGTs) with maximum mean discrepancy (MMD), while addressing the high variance that arises when training on multiple text populations from different LLMs. It introduces MMD-MP, a multi-population aware optimization that replaces the intra-population aggregation in the MMD objective with a proxy (MPP), yielding much more stable discrepancy estimates. Using the trained deep kernel, the authors develop a paragraph-based two-sample test (2ST) and a single-instance detection approach, and show superior performance and transferability to unknown LLMs on the HC3 and XSum datasets. The approach achieves higher test power and AUROC than strong baselines and remains robust in unbalanced and unknown-population settings, which suggests it is practical for robust MGT detection in diverse real-world scenarios.
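To make the MMD/MPP distinction concrete, the sketch below compares the unbiased MMD estimator with a proxy that drops the MGT-to-MGT kernel term. This is an assumption about what "replacing the intra-population aggregation" means: the TL;DR does not spell out the MPP formula, so the form $H^*_{ij} = k(x_i,x_j) - k(x_i,y_j) - k(y_i,x_j)$ (with $x$ as human-written and $y$ as machine-generated features) is our reading, not the authors' code. A fixed Gaussian kernel stands in for the trained deep kernel, and all function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def gaussian_kernel(bandwidth: float = 1.0) -> Kernel:
    """Fixed-bandwidth Gaussian kernel (a stand-in for the trained deep kernel)."""
    def k(a: Vector, b: Vector) -> float:
        sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        return math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return k


def h_matrix(xs: List[Vector], ys: List[Vector], k: Kernel, keep_yy: bool) -> List[List[float]]:
    """Pairwise terms H_ij for paired samples (x_i, y_i).

    keep_yy=True  -> MMD term:            k(x_i,x_j) + k(y_i,y_j) - k(x_i,y_j) - k(y_i,x_j)
    keep_yy=False -> assumed MPP term H*: k(x_i,x_j)              - k(x_i,y_j) - k(y_i,x_j)
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two paired samples of equal size >= 2")
    return [
        [
            k(xs[i], xs[j])
            + (k(ys[i], ys[j]) if keep_yy else 0.0)  # MGT-to-MGT term, dropped for MPP
            - k(xs[i], ys[j])
            - k(ys[i], xs[j])
            for j in range(n)
        ]
        for i in range(n)
    ]


def u_statistic(h: List[List[float]]) -> float:
    """Unbiased estimate: average of H_ij over all ordered pairs i != j."""
    n = len(h)
    return sum(h[i][j] for i in range(n) for j in range(n) if i != j) / (n * (n - 1))


def mmd_u(xs: List[Vector], ys: List[Vector], k: Kernel) -> float:
    return u_statistic(h_matrix(xs, ys, k, keep_yy=True))


def mpp_u(xs: List[Vector], ys: List[Vector], k: Kernel) -> float:
    return u_statistic(h_matrix(xs, ys, k, keep_yy=False))


if __name__ == "__main__":
    k = gaussian_kernel(bandwidth=1.0)
    hwt = [[0.1 * i, 0.0] for i in range(8)]        # toy "human-written" features
    mgt = [[3.0 + 0.1 * i, 1.0] for i in range(8)]  # toy "machine-generated" features
    print("MMD_u:", mmd_u(hwt, mgt, k))
    print("MPP_u:", mpp_u(hwt, mgt, k))
```

The two estimators differ by exactly the average MGT-to-MGT kernel value, which is the part of the objective that depends on similarities within the MGT sample, where several LLM populations are mixed together.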

Abstract

Large language models (LLMs) such as ChatGPT have exhibited remarkable performance in generating human-like texts. However, machine-generated texts (MGTs) may carry critical risks, such as plagiarism, misleading information, or hallucination. Therefore, detecting MGTs is urgent and important in many situations. Unfortunately, it is challenging to distinguish MGTs from human-written texts because the distributional discrepancy between them is often very subtle, owing to the remarkable performance of LLMs. In this paper, we seek to exploit maximum mean discrepancy (MMD) to address this issue, since MMD is well suited to identifying distributional discrepancies. However, directly training a detector with MMD on diverse MGTs incurs a significantly increased variance of MMD, because MGTs may contain multiple text populations produced by various LLMs. This severely impairs MMD's ability to measure the difference between two samples. To tackle this, we propose a novel multi-population aware optimization method for MMD, called MMD-MP, which can avoid variance increases and thus improve the stability of measuring the distributional discrepancy. Relying on MMD-MP, we develop two methods, for paragraph-based and sentence-based detection, respectively. Extensive experiments on various LLMs, e.g., GPT2 and ChatGPT, show the superior detection performance of MMD-MP. The source code is available at https://github.com/ZSHsh98/MMD-MP.
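Paragraph-based detection can be framed as a two-sample test: given a set of human-written paragraphs and a set of test paragraphs, decide whether both come from the same distribution. The following is a minimal sketch, assuming a permutation test on the unbiased squared MMD with a fixed Gaussian kernel over precomputed feature vectors (the paper instead uses its trained deep kernel). The function names and the `bandwidth`, `n_perms`, and `alpha` values are illustrative choices, not the paper's settings.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def gaussian_kernel(a: Vector, b: Vector, bandwidth: float) -> float:
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2_unbiased(xs: List[Vector], ys: List[Vector], bandwidth: float) -> float:
    """Unbiased squared-MMD estimate for samples of sizes m, n >= 2."""
    m, n = len(xs), len(ys)
    k_xx = sum(gaussian_kernel(xs[i], xs[j], bandwidth)
               for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    k_yy = sum(gaussian_kernel(ys[i], ys[j], bandwidth)
               for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    k_xy = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys) / (m * n)
    return k_xx + k_yy - 2.0 * k_xy


def permutation_two_sample_test(
    xs: List[Vector],
    ys: List[Vector],
    bandwidth: float = 1.0,
    n_perms: int = 200,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, bool]:
    """Return (observed squared MMD, permutation p-value, whether H0: P == Q is rejected)."""
    rng = random.Random(seed)
    observed = mmd2_unbiased(xs, ys, bandwidth)
    pooled = list(xs) + list(ys)
    m = len(xs)
    exceed = 0
    for _ in range(n_perms):
        rng.shuffle(pooled)  # random relabeling, valid under the null hypothesis
        if mmd2_unbiased(pooled[:m], pooled[m:], bandwidth) >= observed:
            exceed += 1
    p_value = (exceed + 1) / (n_perms + 1)
    return observed, p_value, p_value <= alpha


if __name__ == "__main__":
    hwt = [[0.1 * i] for i in range(10)]          # toy human-written features
    test = [[10.0 + 0.1 * i] for i in range(10)]  # toy test paragraphs
    print(permutation_two_sample_test(hwt, test))
```

With features drawn from two well-separated clusters, the test should reject. In practice, test power depends heavily on the kernel, which motivates training it rather than fixing a bandwidth.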

Paper Structure

This paper contains 42 sections, 8 theorems, 58 equations, 17 figures, 17 tables, and 3 algorithms.

Key Result

Proposition 1

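(Asymptotics of $\widehat{\mathrm{MPP}}_{u}$) Under the alternative $\mathfrak{H}_1: {\mathbb P} \neq {\mathbb Q}$, based on a standard central limit theorem, we have $\sqrt{n}\,\big(\widehat{\mathrm{MPP}}_{u} - \mathrm{MPP}\big) \xrightarrow{d} \mathcal{N}\big(0, \sigma_{\mathfrak{H}_1^*}^2\big)$, where $n$ is the sample size, $\sigma_{\mathfrak{H}_1^*}^2:=4\left(\mathbb{E}[H^*_{12}H^*_{13}]-\mathbb{E}[H^*_{12}]^2\right)$, and $H^*_{12}$ and $H^*_{13}$ are two terms $H^*_{ij}$ that share the first index.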
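Proposition 1 suggests a plug-in estimate of $\sigma_{\mathfrak{H}_1^*}^2$. Writing $Z_i=(x_i,y_i)$ for the $i$-th pair, $H^*_{12}$ and $H^*_{13}$ are conditionally independent given $Z_1$, so $\mathbb{E}[H^*_{12}H^*_{13}] = \mathbb{E}\big[(\mathbb{E}[H^*_{12}\mid Z_1])^2\big]$, which can be approximated by averaging the squared row means of the $H^*$ matrix. The sketch below assumes $H^*_{ij} = k(x_i,x_j) - k(x_i,y_j) - k(y_i,x_j)$ and a fixed Gaussian kernel; both are assumptions, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def gaussian_kernel(a: Vector, b: Vector, bandwidth: float = 1.0) -> float:
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def h_star(xs: List[Vector], ys: List[Vector], bandwidth: float = 1.0) -> List[List[float]]:
    """Assumed H*_ij = k(x_i, x_j) - k(x_i, y_j) - k(y_i, x_j) (no MGT-to-MGT term)."""
    n = len(xs)
    return [[gaussian_kernel(xs[i], xs[j], bandwidth)
             - gaussian_kernel(xs[i], ys[j], bandwidth)
             - gaussian_kernel(ys[i], xs[j], bandwidth)
             for j in range(n)] for i in range(n)]


def mpp_with_variance(xs: List[Vector], ys: List[Vector],
                      bandwidth: float = 1.0) -> Tuple[float, float]:
    """Return (MPP_u, plug-in estimate of sigma^2_{H1*})."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two paired samples of equal size >= 2")
    h = h_star(xs, ys, bandwidth)
    # Row i averages H*_ij over j != i: an estimate of E[H*_ij | Z_i].
    row_means = [sum(h[i][j] for j in range(n) if j != i) / (n - 1) for i in range(n)]
    # The mean of the row means equals the off-diagonal average, i.e. MPP_u.
    mpp_u = sum(row_means) / n
    # E[H*_12 H*_13] = E[(E[H*_12 | Z_1])^2], approximated by the mean squared row mean.
    e_h12_h13 = sum(r * r for r in row_means) / n
    sigma2 = 4.0 * (e_h12_h13 - mpp_u ** 2)
    return mpp_u, sigma2


if __name__ == "__main__":
    hwt = [[0.1 * i, 0.0] for i in range(10)]
    mgt = [[1.0 + 0.1 * i, 0.5] for i in range(10)]
    est, var = mpp_with_variance(hwt, mgt)
    # Normal approximation from Proposition 1: standard error is about sqrt(var / n).
    print(est, math.sqrt(max(var, 0.0) / len(hwt)))
```

The squared row means include the $j=l$ terms, so this estimate is slightly biased upward for small $n$. It is non-negative up to rounding, because the mean of the squared row means is at least the square of their mean.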

Figures (17)

  • Figure 1: Illustration of MMD values, MMD variances, and the test power of MMD-D and our MMD-MP during the optimization process. As the number of $S_{\mathbb Q}^{tr}$ populations (i.e., $q$) increases, MMD-D shows an increase in MMD, accompanied by a sharp rise in variance, resulting in unstable test power during testing. In contrast, our MMD-MP exhibits minimal variance in MMD values, leading to higher and more stable test power during testing.
  • Figure 2: ${\mathbb E}(k)$ in MMD and their variances under two optimization methods (MMD-MP is ours). Subfigures (a) and (b) depict the value of each ${\mathbb E}(k)$ in MMD during training by MMD-D and MMD-MP with $q{=}1$ and $q{=}3$, respectively. Subfigures (c) and (d) illustrate the variances of some terms associated with MMD, i.e., $\sigma^2_{\mathfrak{H}_1}$ when training by MMD-D and MMD-MP, respectively.
  • Figure 3: AUROC$/100$ on HC3 given 3,100 processed paragraphs.
  • Figure 4: Impact of variance in training data on test power.
  • Figure 5: Test power and AUROC on HC3 given 2,000 human-written text (HWT) and 400 MGT training paragraphs.
  • ...and 12 more figures

Theorems & Definitions (17)

  • Definition 1
  • Proposition 1
  • Corollary 1
  • Theorem 1
  • Proof
  • Proof
  • Proposition 2
  • Proof
  • Lemma 1
  • Proof
  • ...and 7 more