DAMASHA: Detecting AI in Mixed Adversarial Texts via Segmentation with Human-interpretable Attribution

L. D. M. S. Sai Teja, N. Siva Gopala Krishna, Ufaq Khan, Muhammad Haris Khan, Atul Mishra

TL;DR

The paper tackles the problem of segmenting mixed human–AI text under adversarial perturbations by introducing Info-Mask, a soft attribution masking framework that leverages stylometric and linguistic cues to guide boundary detection via a CRF-based sequence labeling system. It couples this with Human-Interpretable Attribution overlays to aid human oversight and introduces MAS, a large adversarially perturbed mixed-authorship benchmark with SBDA and SegPre metrics for fine-grained span evaluation. Empirical results show that Info-Mask-based models achieve state-of-the-art robustness and interpretability, outperforming strong baselines under syntactic attacks, while ablations and human studies validate the approach and highlight remaining challenges. The work advances trustworthy mixed-authorship detection with practical implications for authenticity and oversight in AI-assisted writing environments.

Abstract

In the age of advanced large language models (LLMs), the boundaries between human and AI-generated text are becoming increasingly blurred. We address the challenge of segmenting mixed-authorship text, that is, identifying transition points in text where authorship shifts from human to AI or vice versa, a problem with critical implications for authenticity, trust, and human oversight. We introduce a novel framework, called Info-Mask, for mixed-authorship detection that integrates stylometric cues, perplexity-driven signals, and structured boundary modeling to accurately segment collaborative human-AI content. To evaluate the robustness of our system against adversarial perturbations, we construct and release an adversarial benchmark dataset, Mixed-text Adversarial setting for Segmentation (MAS), designed to probe the limits of existing detectors. Beyond segmentation accuracy, we introduce Human-Interpretable Attribution (HIA) overlays that highlight how stylometric features inform boundary predictions, and we conduct a small-scale human study assessing their usefulness. Across multiple architectures, Info-Mask significantly improves span-level robustness under adversarial conditions, establishing new baselines while revealing remaining challenges. Our findings highlight both the promise and limitations of adversarially robust, interpretable mixed-authorship detection, with implications for trust and oversight in human-AI co-authorship.

Paper Structure

This paper contains 31 sections, 5 equations, 10 figures, 13 tables.

Figures (10)

  • Figure 1: Model workflow showing the construction and integration of the Info-Mask to guide span segmentation using stylometric and contextual signals.
  • Figure 2: Cumulative Distribution Function (CDF) of IoU scores for all models.
  • Figure 3: Violin plots of paired IoU score differences, showing RMC’s consistent performance superiority over other models.
  • Figure 4: Heatmap with Confidence Interval (CI) of SBDA@0.3 scores across various adversarial attacks. Brighter yellow indicates higher scores, showcasing the superior and consistent robustness of our RMC model.
  • Figure 5: IoU distribution comparisons of RMC* with all other models.
  • ...and 5 more figures
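
Several of the figures above report span-level IoU scores and their cumulative distribution. This page does not give the paper's exact IoU definition, so the following is only a minimal sketch of one common convention: per-token labels (assumed here to be 1 for AI-written and 0 for human-written) are grouped into half-open spans, and IoU is the overlap of a predicted and a gold span divided by their union. The function names `labels_to_spans`, `span_iou`, and `empirical_cdf` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # half-open token range [start, end)


def labels_to_spans(labels: Sequence[int], target: int = 1) -> List[Span]:
    """Collect maximal runs of `target` in a per-token label sequence."""
    spans: List[Span] = []
    start = None
    for i, lab in enumerate(labels):
        if lab == target and start is None:
            start = i                      # a new run begins
        elif lab != target and start is not None:
            spans.append((start, i))       # the run ended at token i
            start = None
    if start is not None:                  # run reaches the end of the text
        spans.append((start, len(labels)))
    return spans


def span_iou(a: Span, b: Span) -> float:
    """Intersection over union of two half-open token spans."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def empirical_cdf(scores: Sequence[float], threshold: float) -> float:
    """Fraction of scores less than or equal to `threshold`."""
    if not scores:
        return 0.0
    return sum(s <= threshold for s in scores) / len(scores)


# Assumed toy example: the prediction is shifted one token to the right.
gold = [0, 0, 1, 1, 1, 1, 0, 0]
pred = [0, 0, 0, 1, 1, 1, 1, 0]
gold_span = labels_to_spans(gold)[0]   # (2, 6)
pred_span = labels_to_spans(pred)[0]   # (3, 7)
iou = span_iou(gold_span, pred_span)   # 3 shared tokens / 5 covered tokens
```

Evaluating `empirical_cdf` over a grid of thresholds for each model's per-example IoU scores would produce a curve of the kind Figure 2 describes, under the assumptions stated above.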