DEER: Disentangled Mixture of Experts with Instance-Adaptive Routing for Generalizable Machine-Generated Text Detection

Guoxin Ma, Xiaoming Liu, Zhanhan Zhang, Chengzhengxu Li, Shengchao Liu, Yu Lan

TL;DR

The paper addresses generalization in machine-generated text detection under domain shift. It proposes DEER, a two-stage framework that disentangles domain-specific and domain-general signals with a Disentangled Mixture-of-Experts and uses an instance-adaptive reinforcement-learning routing mechanism for inference when domain labels are unavailable. Key contributions include explicit separation of domain-local vs cross-domain patterns, an RL-based policy for per-input expert selection, and extensive evaluations showing state-of-the-art performance across ten MAGE domains. This approach yields robust cross-domain detection and enables efficient incremental adaptation, offering practical deployability for monitoring open-world text generation.

Abstract

Detecting machine-generated text (MGT) has emerged as a critical challenge, driven by the rapid advancement of large language models (LLMs) capable of producing highly realistic, human-like content. However, the performance of current approaches often degrades significantly under domain shift. To address this challenge, we propose a novel framework designed to capture both domain-specific and domain-general MGT patterns through a two-stage Disentangled mixturE-of-ExpeRts (DEER) architecture. First, we introduce a disentangled mixture-of-experts module, in which domain-specific experts learn fine-grained, domain-local distinctions between human and machine-generated text, while shared experts extract transferable, cross-domain features. Second, because domain labels are typically unavailable during inference, we design a reinforcement learning-based routing mechanism that dynamically selects the appropriate experts for each input instance, bridging the train-inference gap caused by domain uncertainty. Extensive experiments on five in-domain and five out-of-domain benchmark datasets demonstrate that DEER consistently outperforms state-of-the-art methods, achieving average F1-score improvements of 1.39% (in-domain) and 5.32% (out-of-domain), along with accuracy gains of 1.35% (in-domain) and 3.61% (out-of-domain). Ablation studies confirm that both disentangled expert specialization and adaptive routing contribute to model performance.

Paper Structure

This paper contains 21 sections, 7 equations, 3 figures, 7 tables, 1 algorithm.

Figures (3)

  • Figure 1: Overview of DEER. The left panel shows the two-stage training process. In the upper part, a disentangled Mixture-of-Experts (DMoE) model is trained with domain supervision, where domain-specific experts learn fine-grained domain-local patterns and domain-shared experts learn transferable, cross-domain signatures. In the lower part, the trained DMoE is frozen and an RL-guided routing mechanism trains a policy network to predict a soft domain distribution for each input from task-driven rewards, enabling annotation-free expert selection. The right panel depicts the inference stage, where the top-$m$ domains with the highest probabilities are selected, and their corresponding experts, along with the shared experts, are adaptively fused to produce the final prediction, ensuring robust and instance-aware generalization to unseen domains.
  • Figure 2: Radar plots of pre- and post-incremental performance. (a) Results across source domains before incremental adaptation. (b) Results after adapting to a new unseen domain, evaluated on both source and target domains.
  • Figure 3: Hyperparameter analysis on DG-MGT. Left: performance variation with different values of top-$m$ expert group selection. Right: F1-score heatmap under different configurations of shared and domain-specific experts.
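
The inference stage described in the Figure 1 caption (select the top-$m$ domains from the policy's soft domain distribution, then fuse their experts with the shared experts) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the renormalization of the top-$m$ probabilities, and the additive fusion with the shared-expert output are all hypothetical, since the caption does not specify the fusion rule.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_m_indices(probs: Sequence[float], m: int) -> List[int]:
    """Indices of the m largest probabilities, highest first."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return order[:m]


def fuse_prediction(
    policy_scores: Sequence[float],
    domain_expert_outputs: Sequence[Sequence[float]],
    shared_output: Sequence[float],
    m: int,
) -> Tuple[List[int], List[float]]:
    """Route one input to its top-m domain experts and fuse the outputs.

    policy_scores: unnormalized per-domain scores from a (hypothetical)
        routing policy; softmax turns them into a soft domain distribution.
    domain_expert_outputs: one output vector per domain expert group.
    shared_output: output vector of the shared experts.
    """
    probs = softmax(policy_scores)
    chosen = top_m_indices(probs, m)

    # Assumption: renormalize the selected probabilities so they sum to 1.
    mass = sum(probs[i] for i in chosen)
    weights = {i: probs[i] / mass for i in chosen}

    # Assumption: shared experts always contribute; the selected domain
    # experts are added as a probability-weighted sum.
    fused = list(shared_output)
    for i, w in weights.items():
        for k in range(len(fused)):
            fused[k] += w * domain_expert_outputs[i][k]
    return chosen, fused


if __name__ == "__main__":
    chosen, fused = fuse_prediction(
        policy_scores=[2.0, 0.0, 1.0],
        domain_expert_outputs=[[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]],
        shared_output=[0.5, 0.5],
        m=2,
    )
    print(chosen, fused)
```

Setting `m` to the number of domains recovers a fully soft mixture, while `m=1` is a hard assignment to the single most likely domain; the left panel of Figure 3 studies this trade-off.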