Trustworthy Large Models in Vision: A Survey

Ziyan Guo; Li Xu; Jun Liu

TL;DR

This survey tackles trustworthy large vision models by examining four primary concerns—human misuse, vulnerabilities, inherent issues, and interpretability—and detailing concrete challenges, defenses, and open questions across deepfake/NSFW risks, poisoning/backdoor/adversarial attacks, copyright/privacy/bias/hallucination, and interpretability. It surveys state-of-the-art methods for detection, data curation, training-time defenses, and evaluation benchmarks (e.g., APBench, CHAIR, POPE, MMHAL-BENCH, HaELM), while highlighting the arms race between attackers and defenders and the need for lifecycle-aware, multi-criteria safety solutions. The work clarifies the current gaps in robustness and the limitations of existing defenses, and it positions trustworthiness as essential for the responsible deployment of vision foundation models in real-world settings. Overall, the paper aims to guide researchers and practitioners toward aligned, safe, and reliable vision-language systems with practical implications for privacy, copyright, bias mitigation, and factual reliability in downstream tasks.

Abstract

The rapid progress of Large Models (LMs) has recently revolutionized various fields of deep learning with remarkable results, ranging from Natural Language Processing (NLP) to Computer Vision (CV). However, LMs are increasingly challenged and criticized by academia and industry because, despite their powerful performance, they exhibit untrustworthy behavior that urgently needs to be addressed by reliable methods. Despite the abundance of literature on trustworthy LMs in NLP, a systematic survey specifically delving into the trustworthiness of LMs in CV remains absent. To fill this gap, we summarize in this survey four concerns that obstruct the trustworthy use of LMs in vision: 1) human misuse, 2) vulnerability, 3) inherent issues, and 4) interpretability. By highlighting the corresponding challenges, countermeasures, and discussion for each topic, we hope this survey will facilitate readers' understanding of the field, promote the alignment of LMs with human expectations, and enable trustworthy LMs to benefit human society rather than harm it.

Paper Structure (16 sections, 11 figures, 6 tables)

Introduction
Comparison With Existing Related Surveys
Human Misuse
Deepfake
NSFW Content
Vulnerability
Poisoning Attacks
Backdoor Attacks
Adversarial Attacks
Inherent Issue
Copyright
Privacy
Bias
Hallucination
Interpretability
...and 1 more section

Figures (11)

Figure 1: The landscape of trustworthy large models in vision.
Figure 2: Examples of deepfake and NSFW content, showing a fake celebrity and a violent crime, generated by Stable Diffusion XL v1.0 in November 2023. The prompt for generating the NSFW content is adapted from the previous work Schramowski_2023_CVPR.
Figure 3: The general process of adding watermarks to a diffusion model; a detector can then extract the watermarks from generated images with an extremely low error rate. The figure is adapted from the work zhao2023recipe.
Figure 4: Difference in output between the original and fine-tuned models, indicating that the fine-tuned model proactively erases the concept of nudity. The figure is adapted from the work gandikota2023erasing.
Figure 5: Summary of using DMs to deal with backdoor attacks by restoring the poisoned image. The figure is adapted from the work shi2023blackbox.
...and 6 more figures
