SynthGuard: An Open Platform for Detecting AI-Generated Multimedia with Multimodal LLMs

Shail Desai, Aditya Pawar, Li Lin, Xin Wang, Shu Hu

TL;DR

SynthGuard is an open, multimodal platform for detecting AI-generated media that unifies traditional detectors with multimodal LLM-based reasoning. It delivers explainable inferences, supports both image and audio modalities, and provides a researcher-friendly interface with credit-based access and transparent analytics. The system combines a modular backend (FastAPI, PyTorch detectors, and LLM modules) with a React frontend and a MySQL data store, enabling reproducible, ethical multimodal forensics. The platform is extensible and provides an end-to-end analysis pipeline, but the authors acknowledge limitations in video detection and in the domain generalization of MLLMs, and they outline future work on video and text detectors and on cloud-native deployment for scalability.

Abstract

Artificial Intelligence (AI) has made it possible for anyone to create images, audio, and video with unprecedented ease, enriching education, communication, and creative expression. At the same time, the rapid rise of AI-generated media has introduced serious risks, including misinformation, identity misuse, and the erosion of public trust as synthetic content becomes increasingly indistinguishable from real media. Although deepfake detection has advanced, many existing tools remain closed-source, limited in modality, or lacking transparency and educational value, making it difficult for users to understand how detection decisions are made. To address these gaps, we introduce SynthGuard, an open, user-friendly platform for detecting and analyzing AI-generated multimedia using both traditional detectors and multimodal large language models (MLLMs). SynthGuard provides explainable inference, unified image and audio support, and an interactive interface designed to make forensic analysis accessible to researchers, educators, and the public. The SynthGuard platform is available at: https://in-engr-nova.it.purdue.edu/

Paper Structure

This paper contains 23 sections, 3 figures, 1 table.

Figures (3)

  • Figure 1: SynthGuard UI overview.
  • Figure 2: An overview of important pages. (a) MLLM-agnostic detector page and credits indicator. (b) MLLM-aware detector page. (c) Result page for image test. (d) MLLM-aware detector result. (e) Statistics page. (f) Feedback page.
  • Figure 3: SynthGuard platform architecture. The system operates end-to-end across a full-stack pipeline: (1) Users interact with the React v18 frontend over HTTPS/JSON to upload images or audio and manage accounts; (2) Requests are routed through an NGINX reverse proxy to the FastAPI backend hosted on a Linux server (Ubuntu 20.04 with NVIDIA RTX A6000); (3) The backend invokes integrated multimodal LLM-aware detectors and model APIs to analyze submitted media; (4) Results, predictions, and metadata are returned to the frontend; (5) User information, credit balances, and inference logs are persisted in a MySQL database.
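Steps (3) to (5) of the Figure 3 caption can be illustrated with a minimal sketch of a credit-gated detection request. This is not SynthGuard's actual code: the names `Store`, `stub_detector`, and `handle_request`, the one-credit cost, and the response fields are all hypothetical, an in-memory object stands in for the MySQL database, and a toy scoring function stands in for the PyTorch and MLLM detectors. The sketch uses only the Python standard library rather than FastAPI so that it runs without extra dependencies.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Store:
    """In-memory stand-in for the MySQL database (assumption for this sketch)."""
    credits: Dict[str, int] = field(default_factory=dict)
    logs: List[dict] = field(default_factory=list)


def stub_detector(media: bytes) -> dict:
    """Hypothetical placeholder; a real detector would run a model or call an MLLM API."""
    score = (sum(media) % 100) / 100 if media else 0.0
    return {"label": "ai-generated" if score >= 0.5 else "real", "score": score}


def handle_request(
    store: Store,
    user: str,
    media: bytes,
    detector: Callable[[bytes], dict] = stub_detector,
    cost: int = 1,  # assumed price of one inference, in credits
) -> dict:
    """Check credits, run the detector, then persist the new balance and a log entry."""
    balance = store.credits.get(user, 0)
    if balance < cost:
        return {"ok": False, "error": "insufficient credits", "credits": balance}

    result = detector(media)  # step (3): backend invokes a detector

    # Step (5): persist the credit balance and an inference log.
    store.credits[user] = balance - cost
    store.logs.append({"user": user, "bytes": len(media), **result})

    # Step (4): JSON-serializable payload returned to the frontend.
    return {"ok": True, "result": result, "credits": store.credits[user]}


if __name__ == "__main__":
    store = Store(credits={"alice": 1})
    print(handle_request(store, "alice", b"\x10\x20"))  # succeeds, spends the credit
    print(handle_request(store, "alice", b"\x10\x20"))  # rejected, balance is zero
```

In this sketch the credit is deducted only after the detector returns, so a detector that raises an exception does not charge the user. Whether the real platform orders these steps the same way is not stated in the source.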