TOPFORMER: Topology-Aware Authorship Attribution of Deepfake Texts with Diverse Writing Styles

Adaku Uchendu; Thai Le; Dongwon Lee

TL;DR

This work studies Authorship Attribution (AA) of deepfake texts in a multi-class setting and proposes TopFormer, which adds a Topological Data Analysis (TDA) layer to a Transformer-based model so that it captures more linguistic patterns than existing AA solutions.

Abstract

Recent advances in Large Language Models (LLMs) have enabled the generation of open-ended, high-quality texts that are non-trivial to distinguish from human-written texts. We refer to such LLM-generated texts as deepfake texts. There are currently over 72K text generation models in the Hugging Face model repository. As such, users with malicious intent can easily use these open-source LLMs to generate harmful texts and dis/misinformation at scale. To mitigate this problem, a computational method is needed to determine whether a given text is a deepfake text, a task known as the Turing Test (TT). In this work, we investigate the more general version of the problem, known as Authorship Attribution (AA), in a multi-class setting: not only determining whether a given text is a deepfake text, but also pinpointing which LLM is the author. We propose TopFormer, which improves on existing AA solutions by adding a Topological Data Analysis (TDA) layer to a Transformer-based model in order to capture more linguistic patterns in deepfake texts. We show the benefits of a TDA layer on imbalanced and multi-style datasets by extracting TDA features from the reshaped $pooled\_output$ of our backbone. The Transformer-based backbone captures contextual representations (i.e., semantic and syntactic linguistic features), while TDA captures the shape and structure of the data (i.e., linguistic structures). Finally, TopFormer outperforms all baselines on all 3 datasets, achieving up to a 7\% increase in Macro F1 score. Our code and datasets are available at: https://github.com/AdaUchendu/topformer

Paper Structure

This paper contains 25 sections, 4 figures, 8 tables, and 1 algorithm.

Related Work
Authorship Attribution of Deepfake Texts
TDA Applications in NLP
Topological Data Analysis (TDA) Features
TopFormer: Topology-Aware Attributor
Experiments
Datasets
OpenLLMText
SynSciPass
Mixset
Authorship Attribution Models
Results
Further Analysis
Deepfake Text Style Detection
Alternative Transformer-based Backbone
...and 10 more sections

Figures (4)

Figure 1: Illustration of the Authorship Attribution (AA) problem with multiple authors - human and many deepfake (LLM) authors.
Figure 2: Flowchart of the Topological classification algorithm. The red frame indicates our methodology and technique for transforming a vanilla Transformer-based model into a Topological Transformer-based model.
Figure 3: Illustration of how we extract the TDA features using the reshaped Transformer-based model's regularized weights as input. First, we reshape the regularized $pooled\_output$ from $1 \times 768$ dimensions to $24 \times 32$ and use this 2D matrix as input to the Topological layer. The Topological layer treats this 2D matrix as a point cloud and extracts TDA features ($birth$ and $death$). Next, these TDA features are plotted in a figure known as a persistence diagram, where the $birth$ features are on the $x$-axis and the $death$ features are on the $y$-axis. Although we plot the 0-dimensional features (connected components) and the 1-dimensional features (loops), only the 0-dimensional features are used for our task.
Figure 4: PCA plots from RoBERTa and TopFormer training embeddings for the SynSciPass and Mixset datasets on the classification tasks. The black clusters are the human labels and the other clusters are the deepfake text labels.
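
The feature extraction described in the Figure 3 caption can be illustrated with a minimal, pure-Python sketch. It is not the authors' implementation (see the linked repository for that), and it rests on several assumptions: each of the 24 rows of the reshaped matrix is treated as one point in 32-dimensional space, the filtration is Vietoris-Rips under Euclidean distance, and the helper names `reshape_pooled` and `h0_persistence` are hypothetical. Under those assumptions, the 0-dimensional ($birth$, $death$) pairs can be read off the merge order of Kruskal's minimum-spanning-tree algorithm: every point is born at scale 0, and a connected component dies at the length of the edge that merges it into another.

```python
import math
import random
from itertools import combinations


def reshape_pooled(vec, rows=24, cols=32):
    """Reshape a flat vector of rows*cols values into a list of `rows` points."""
    if len(vec) != rows * cols:
        raise ValueError("expected %d values, got %d" % (rows * cols, len(vec)))
    return [vec[r * cols:(r + 1) * cols] for r in range(rows)]


def h0_persistence(points):
    """Return 0-dimensional (birth, death) pairs of a point cloud.

    Assumes a Vietoris-Rips filtration with Euclidean distance. Components
    are merged in order of increasing edge length (Kruskal + union-find);
    the single component that never dies gets death = inf.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    pairs = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge joins two components, so one of them dies
            parent[ri] = rj
            pairs.append((0.0, length))
    pairs.append((0.0, math.inf))
    return pairs


# Stand-in for a real pooled_output vector (random values, for illustration).
random.seed(0)
pooled_output = [random.gauss(0.0, 1.0) for _ in range(768)]

cloud = reshape_pooled(pooled_output)   # 24 points in 32 dimensions
diagram = h0_persistence(cloud)         # one (birth, death) pair per point
print(len(diagram), diagram[:3])
```

In practice a dedicated persistent-homology library would likely replace `h0_persistence`, and it would also produce the 1-dimensional (loop) features that the caption says are plotted but not used.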
