Research about the Ability of LLM in the Tamper-Detection Area

Xinyu Yang; Jizhe Zhou

TL;DR

This work assesses whether large language models (LLMs) can assist in tamper detection amid rising volumes of AI-generated content and increasingly sophisticated image forgery. It evaluates five LLMs (GPT-4, LLaVA, Bard, ERNIE Bot4, Tongyi Qianwen) on two tasks, AI-generated image detection and tamper detection, using 100 AI-generated and 100 manipulated images. Each image is submitted in its own chat, and accuracy is determined from the model's classification and explanation. The findings show limited effectiveness overall: only GPT-4 reaches up to about 70% accuracy on a random subset, while all models struggle with visually realistic tampering and deepfakes. These results highlight the current limitations of LLMs in tamper detection and reinforce the need to rely on traditional detection methods and continued deep-learning-based research for robust security solutions.

Abstract

In recent years, particularly since the early 2020s, Large Language Models (LLMs) have emerged as some of the most powerful AI tools for addressing a diverse range of challenges, from natural language processing to complex problem-solving in various domains. In the field of tamper detection, LLMs are capable of identifying basic tampering activities. To assess the capabilities of LLMs in more specialized domains, we have collected five different LLMs developed by various companies: GPT-4, LLaMA, Bard, ERNIE Bot 4.0, and Tongyi Qianwen. This diverse range of models allows for a comprehensive evaluation of their performance in detecting sophisticated tampering instances. We devised two domains of detection: AI-Generated Content (AIGC) detection and manipulation detection. AIGC detection tests the ability to distinguish whether an image is real or AI-generated. Manipulation detection, on the other hand, focuses on identifying tampered images. According to our experiments, most LLMs can identify composite pictures that are logically inconsistent, and only the more powerful LLMs can identify images that are logically consistent but show signs of tampering visible to the human eye. None of the LLMs can identify carefully forged images or very realistic AI-generated images. In the area of tamper detection, LLMs still have a long way to go, particularly in reliably identifying highly sophisticated forgeries and AI-generated images that closely mimic reality.

Paper Structure (13 sections, 4 figures, 1 table)

Introduction
Five Main LLMs
GPT-4
LLaVA
Bard
ERNIE Bot4
Tongyi Qianwen
Design of the Test
Process of the Experiment
Dataset of Modified Images
Dataset of AI Generated Image
Results of the Experiments
Conclusion

Figures (4)

Figure 1: The logos of the five LLMs, giving an overview of the models evaluated.
Figure 2: An experimental process: Open a chat for each image and ask the language model to classify and explain its reasoning. By evaluating the explanations and counting the correct ones, we can determine the model's accuracy.
Figure 3: Examples from the NIST16 dataset: On the left are simple fake images, which can be easily identified as tampered. On the right are deepfake images, making it challenging to distinguish whether they are real or altered.
Figure 4: Examples from the FFHQ dataset. These images are so lifelike that it's challenging to discern whether they are real photographs or AI-generated.
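
The per-image procedure described in the Figure 2 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: `ask_model` is a hypothetical callable standing in for opening a fresh chat with an LLM, and the label strings are invented for the example. The caption says the explanations are evaluated before counting correct answers; this sketch simplifies that to comparing a predicted label against the ground truth.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical signature: takes an image path and returns
# (predicted_label, explanation). It stands in for one chat per image.
AskModel = Callable[[str], Tuple[str, str]]


def evaluate(samples: Iterable[Tuple[str, str]], ask_model: AskModel) -> float:
    """Return the fraction of images whose predicted label matches the truth.

    `samples` yields (image_path, true_label) pairs. The explanation is
    ignored here; in the described process it would be reviewed as well.
    """
    correct = 0
    total = 0
    for image_path, true_label in samples:
        predicted_label, _explanation = ask_model(image_path)
        total += 1
        if predicted_label == true_label:
            correct += 1
    # Avoid division by zero on an empty sample set.
    return correct / total if total else 0.0


def always_real(image_path: str) -> Tuple[str, str]:
    """Stub model for illustration: it labels every image as real."""
    return "real", "No obvious artifacts."


if __name__ == "__main__":
    # Placeholder file names and labels, invented for this example.
    demo = [("a.png", "real"), ("b.png", "tampered")]
    print(evaluate(demo, always_real))
```

Passing the model as a callable keeps the tally independent of any particular chat interface, so the same loop could be reused for each of the five models.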
