AI-Generated Faces in the Real World: A Large-Scale Case Study of Twitter Profile Images

Jonas Ricker; Dennis Assenmacher; Thorsten Holz; Asja Fischer; Erwin Quiring

TL;DR

This paper investigates the real-world prevalence and usage of AI-generated profile images on Twitter using a carefully designed, multi-stage detection pipeline trained on Twitter-processed data. It reports that about 0.052% of nearly 15 million profile pictures are AI-generated, identifies 7,723 accounts using such images, and reveals coordinated inauthentic behavior including spamming, cryptocurrency giveaways, and political discourse. The methodology combines a fast pre-filter, a CNN classifier tuned to processed profile images, manual labeling aided by alignment and GAN inversion, and multiple labeled data sources to estimate error rates. The analysis of accounts and tweets shows that fake-image users tend to have fewer followers, shorter account lifespans, and higher suspension rates, and that they form large clusters with homogeneous behavior, indicating orchestrated networks. The work provides a scalable framework for real-world detection, releases its data and code, and offers insights into the threats and topics tied to AI-generated social media content that can inform the design of mitigation strategies.
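The multi-stage pipeline described above can be pictured as a cascade in which a cheap test discards most images before the more expensive classifier runs, and only high-scoring images reach human reviewers. The Python sketch below is illustrative only: `run_cascade`, `CascadeResult`, the toy `has_face`/`score` fields, and the 0.5 threshold are hypothetical stand-ins, not the authors' pre-filter φ or classifier C, whose details are not given here.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable, List, Tuple


@dataclass
class CascadeResult:
    # Hypothetical container for the outcome of one pass over a corpus.
    flagged: List[str] = field(default_factory=list)  # IDs queued for manual labeling
    prefiltered_out: int = 0  # images rejected by the cheap first stage
    classified_real: int = 0  # images that passed stage 1 but scored below threshold


def run_cascade(
    images: Iterable[Tuple[str, Any]],
    prefilter: Callable[[Any], bool],
    classifier: Callable[[Any], float],
    threshold: float,
) -> CascadeResult:
    """Sketch of a pre-filter -> classifier -> manual-review cascade (assumed structure)."""
    result = CascadeResult()
    for image_id, image in images:
        # Stage 1: a fast check meant to discard the vast majority of images.
        if not prefilter(image):
            result.prefiltered_out += 1
            continue
        # Stage 2: the more expensive classifier only scores the survivors.
        if classifier(image) >= threshold:
            # Stage 3: high-scoring images are handed to human reviewers.
            result.flagged.append(image_id)
        else:
            result.classified_real += 1
    return result


# Toy usage with made-up records; "has_face" and "score" are hypothetical fields.
toy = [
    ("a", {"has_face": False, "score": 0.99}),
    ("b", {"has_face": True, "score": 0.10}),
    ("c", {"has_face": True, "score": 0.95}),
]
result = run_cascade(
    toy,
    prefilter=lambda im: im["has_face"],
    classifier=lambda im: im["score"],
    threshold=0.5,
)
print(result.flagged)  # ['c']
```

Image `a` illustrates the main trade-off of such a design: anything the pre-filter rejects never reaches the classifier, so pre-filter false negatives cannot be recovered later, while the pre-filter's speed is what makes scanning millions of images feasible.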

Abstract

Recent advances in the field of generative artificial intelligence (AI) have blurred the lines between authentic and machine-generated content, making it almost impossible for humans to distinguish between such media. One notable consequence is the use of AI-generated images for fake profiles on social media. While several types of disinformation campaigns and similar incidents have been reported in the past, a systematic analysis has been lacking. In this work, we conduct the first large-scale investigation of the prevalence of AI-generated profile pictures on Twitter. We tackle the challenges of a real-world measurement study by carefully integrating various data sources and designing a multi-stage detection pipeline. Our analysis of nearly 15 million Twitter profile pictures shows that 0.052% were artificially generated, confirming their notable presence on the platform. We comprehensively examine the characteristics of these accounts and their tweet content, and uncover patterns of coordinated inauthentic behavior. The results also reveal several motives, including spamming and political amplification campaigns. Our research reaffirms the need for effective detection and mitigation strategies to cope with the potential negative effects of generative AI in the future.
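As a rough scale check, assuming a denominator of exactly 15 million (the abstract says "nearly" 15 million, so the true count is slightly lower), the reported share translates into an absolute number of images as follows:

```latex
0.052\,\% \times 1.5 \times 10^{7}
  = 5.2 \times 10^{-4} \times 1.5 \times 10^{7}
  = 7.8 \times 10^{3} = 7{,}800
```

This is of the same order as the 7,723 fake-image accounts reported in the TL;DR.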

Paper Structure

This paper contains 71 sections, 1 equation, 14 figures, and 5 tables.

Introduction
Contributions
Background
AI-Generated Content (AIGC)
Image Synthesis
Generated Image Detection
Methodology
Data Collection
In-The-Wild Dataset $\mathcal{D}_W^{\mathbb{X}}$
Labeled Datasets $\mathcal{D}_R$/$\mathcal{D}_F$ and Variations
Proxy-Labeled Real Dataset $\mathcal{D}_P^\mathbb{X}$
Documented Fakes Dataset $\mathcal{D}_D^\mathbb{X}$
Detection
Pre-Filter $\phi$
Classifier $\mathcal{C}$
...and 56 more sections

Figures (14)

Figure 1: Evaluation of our classifier $\mathcal{C}_{R^\mathbb{X}, P^\mathbb{X} / F^\mathbb{X}}$. We show the ROC curve under different conditions.
Figure 2: Evaluation of GAN inversion. The lower the LPIPS distance between the original image and its reconstruction, the more similar they are.
Figure 3: Precision-recall curve of $\mathcal{C}_{R^\mathbb{X}, P^\mathbb{X} / F^\mathbb{X}}$ on the validation set of the manually labeled images. The circle marks the selected threshold, which maximizes the F1-score.
Figure 4: Score distribution of manually labeled images.
Figure 5: Examples of fake images falsely classified as real, together with their classification score.
...and 9 more figures
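Figure 3 states that the decision threshold was chosen to maximize the F1-score on the validation set of manually labeled images. A minimal pure-Python sketch of that selection rule follows; the function names and the six toy scores and labels are hypothetical, and the code is not the authors' implementation.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    threshold: float, scores: Sequence[float], labels: Sequence[int]
) -> Tuple[float, float, float]:
    """Treat score >= threshold as 'fake' (label 1) and compute P, R, F1."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def best_f1_threshold(
    scores: Sequence[float], labels: Sequence[int]
) -> Tuple[float, float]:
    """Return (threshold, F1) for the point on the PR curve with the highest F1."""
    best_t, best_f1 = 0.0, -1.0
    # Every distinct score is a candidate threshold, i.e. one point on the curve.
    for t in sorted(set(scores)):
        _, _, f1 = precision_recall_f1(t, scores, labels)
        if f1 > best_f1:  # strict '>' keeps the lowest threshold on ties
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Toy validation split: 1 = AI-generated, 0 = real (made-up numbers).
scores = [0.10, 0.40, 0.35, 0.80, 0.90, 0.65]
labels = [0, 0, 1, 1, 1, 0]
t, f1 = best_f1_threshold(scores, labels)
print(t, round(f1, 3))  # 0.8 0.8
```

On a real validation set, scikit-learn's `sklearn.metrics.precision_recall_curve` returns the same precision/recall pairs together with their thresholds and would be the more practical choice; the loop above only makes the selection rule explicit.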
