On the Trustworthiness Landscape of State-of-the-art Generative Models: A Survey and Outlook
Mingyuan Fan, Chengyu Wang, Cen Chen, Yang Liu, Jun Huang
TL;DR
This survey analyzes the trustworthiness of diffusion models (DMs) and large language models (LLMs) across privacy, security, fairness, and responsibility. It treats DMs and LLMs within a unified framework, highlighting how data leakage, adversarial and backdoor threats, bias, and accountability concerns manifest during training and inference and across modalities. It provides a taxonomy and practical recommendations (e.g., localized training, differential privacy, watermarking, alignment) along with research directions for robust, trustworthy AI systems. By mapping threats to concrete benchmarks and defenses, the work aims to guide industry deployment and future research toward safer, more accountable generative AI. The findings underscore the need for formal verification, cross-domain benchmarks, and governance to balance utility with safety and societal impact.
Abstract
Diffusion models and large language models have emerged as leading-edge generative models, revolutionizing various aspects of human life. However, the practical deployment of these models has also exposed inherent risks, bringing their harmful sides to the forefront and raising concerns about their trustworthiness. Despite the wealth of literature on this subject, a comprehensive survey that specifically examines the intersection of large-scale generative models and their trustworthiness remains largely absent. To bridge this gap, this paper investigates both the long-standing and emerging threats associated with these models across four fundamental dimensions: 1) privacy, 2) security, 3) fairness, and 4) responsibility. Based on the results of this investigation, we develop an extensive map outlining the trustworthiness of large generative models. We then provide practical recommendations and potential research directions for future secure applications built on large generative models, ultimately promoting the trustworthiness of these models and benefiting society as a whole.
