Stylometry Analysis of Multi-authored Documents for Authorship and Author Style Change Detection

Muhammad Tayyab Zamir; Muhammad Asif Ayub; Asma Gul; Nasir Ahmad; Kashif Ahmad

TL;DR

This work tackles authorship and author-change detection in multi-authored texts by formulating three stylometry tasks as classification problems and introducing a merit-based late fusion framework that combines multiple transformer models with weight optimization. The fusion uses $F(0,1)=\sum_n w_n P_n(0,1)$ and minimizes $\text{error} = 1 - A_{acc}$ on a validation set, enabling robust cross-model decision fusion across tasks. A key finding is that retaining special characters (punctuation, contractions, stop/short words) improves discrimination, particularly for style-change detection, and PAN-21 benchmarks show consistent gains over existing approaches. Overall, the approach achieves up to about $0.85$ F1 on single-vs-multi-authored classification and around $0.55$ on multi-author-change detection, underscoring its practical potential for document provenance and authentication while informing preprocessing and fusion strategy choices.
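The fusion rule above can be illustrated with a minimal sketch. It assumes each model outputs per-class probabilities $P_n(0,1)$ for every validation sample, and it uses a coarse grid search over the weights $w_n$ as a stand-in optimizer; the summary does not name the optimization technique, so the search strategy, the sum-to-one constraint, and all function names here are assumptions for illustration only.

```python
import itertools
import numpy as np

def fuse(probs, weights):
    """Weighted sum of per-model class probabilities: F = sum_n w_n * P_n.

    probs has shape (n_models, n_samples, n_classes); weights has shape (n_models,).
    """
    return np.tensordot(weights, probs, axes=1)

def validation_error(weights, probs, labels):
    """error = 1 - accuracy of the fused prediction on the validation set."""
    preds = fuse(probs, weights).argmax(axis=1)
    return 1.0 - float((preds == labels).mean())

def grid_search_weights(probs, labels, step=0.1):
    """Assumed optimizer: exhaustive search over weights that sum to 1."""
    n_models = probs.shape[0]
    ticks = np.arange(0.0, 1.0 + 1e-9, step)
    best_w, best_err = None, float("inf")
    for combo in itertools.product(ticks, repeat=n_models):
        if abs(sum(combo) - 1.0) > 1e-9:
            continue  # keep only weight vectors on the simplex
        w = np.array(combo)
        err = validation_error(w, probs, labels)
        if err < best_err:
            best_w, best_err = w, err
    return best_w, best_err
```

Grid search scales poorly with the number of models; it is used here only because it makes the objective easy to read. Any derivative-free optimizer could minimize the same `validation_error` function.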

Abstract

In recent years, the increasing use of Artificial Intelligence-based text generation tools has posed new challenges in document provenance, authentication, and authorship detection. However, advancements in stylometry have provided opportunities for automatic authorship and author change detection in multi-authored documents using style analysis techniques. Style analysis can serve as a primary step toward document provenance and authentication through authorship detection. This paper investigates three key tasks of style analysis: (i) classification of single and multi-authored documents, (ii) single change detection, which involves identifying the point where the author switches, and (iii) multiple author-switching detection in multi-authored documents. We formulate all three tasks as classification problems and propose a merit-based fusion framework that integrates several state-of-the-art natural language processing (NLP) algorithms and weight optimization techniques. We also examine how special characters, which are typically removed during pre-processing in NLP applications, affect the performance of the proposed methods on these tasks by conducting extensive experiments on both cleaned and raw datasets. Experimental results demonstrate significant improvements over existing solutions for all three tasks on a benchmark dataset.
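To make the cleaned-versus-raw distinction concrete, the sketch below shows the kind of conventional cleaning step that discards the signals in question. The stop-word list, the minimum token length, and the function name `clean_text` are hypothetical choices for illustration and are not taken from the paper's pipeline.

```python
import re

# Tiny illustrative stop-word list (hypothetical; real pipelines use larger lists).
STOP_WORDS = {"the", "a", "an", "is", "it", "of", "and", "to", "in"}

def clean_text(text, min_len=3):
    """Conventional cleaning: lowercase, strip non-letters, drop stop and short words."""
    text = text.lower()
    # Replace punctuation, digits, and apostrophes with spaces.
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = [t for t in text.split() if t not in STOP_WORDS and len(t) >= min_len]
    return " ".join(tokens)

raw = "Well... it isn't the same, is it?!"
cleaned = clean_text(raw)
```

In this example the ellipsis, the contraction, the question tag, and the closing punctuation all disappear from `cleaned`, while the raw string keeps them as potential stylistic cues.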

Paper Structure (20 sections, 2 equations, 1 figure, 7 tables, 1 algorithm)

Introduction
Related Work
Single Vs. Multi-authored Document Classification
Author Change Detection
Tasks Description
Methodology
Pre-processing
Feature Extraction and Classification
Fusion
Dataset and Experimental Setup
Dataset
Experimental Setup
Experimental Results
Single vs. Multiple Authors Classification
Style Change Basic
...and 5 more sections

Figures (1)

Figure 4: The proposed methodology.
