Trustworthy Large Models in Vision: A Survey

Ziyan Guo, Li Xu, Jun Liu

TL;DR

This survey tackles trustworthy large vision models by examining four primary concerns—human misuse, vulnerabilities, inherent issues, and interpretability—and detailing concrete challenges, defenses, and open questions across deepfake/NSFW risks, poisoning/backdoor/adversarial attacks, copyright/privacy/bias/hallucination, and interpretability. It surveys state-of-the-art methods for detection, data curation, training-time defenses, and evaluation benchmarks (e.g., APBench, CHAIR, POPE, MMHAL-BENCH, HaELM), while highlighting the arms race between attackers and defenders and the need for lifecycle-aware, multi-criteria safety solutions. The work clarifies the current gaps in robustness and the limitations of existing defenses, and it positions trustworthiness as essential for the responsible deployment of vision foundation models in real-world settings. Overall, the paper aims to guide researchers and practitioners toward aligned, safe, and reliable vision-language systems with practical implications for privacy, copyright, bias mitigation, and factual reliability in downstream tasks.

Abstract

The rapid progress of Large Models (LMs) has recently revolutionized various fields of deep learning with remarkable results, ranging from Natural Language Processing (NLP) to Computer Vision (CV). However, LMs are increasingly challenged and criticized by academia and industry because, despite their powerful performance, they behave in untrustworthy ways, a problem that urgently needs to be alleviated by reliable methods. Despite the abundance of literature on trustworthy LMs in NLP, a systematic survey specifically delving into the trustworthiness of LMs in CV remains absent. To fill this gap, this survey summarizes four concerns that obstruct the trustworthy use of LMs in vision: 1) human misuse, 2) vulnerability, 3) inherent issues, and 4) interpretability. By highlighting the corresponding challenges, countermeasures, and discussion for each topic, we hope this survey will facilitate readers' understanding of this field, promote alignment of LMs with human expectations, and enable trustworthy LMs to serve as a benefit rather than a disaster for human society.

Paper Structure

This paper contains 16 sections, 11 figures, 6 tables.

Figures (11)

  • Figure 1: The landscape of trustworthy large models in vision.
  • Figure 2: Examples of deepfake and NSFW content, showing a fake celebrity and a violent crime, both generated by Stable Diffusion XL v1.0 in November 2023. The prompt for generating the NSFW content is adapted from the previous work Schramowski_2023_CVPR.
  • Figure 3: The general process of adding watermarks to a diffusion model, after which a detector can extract the watermarks from generated images with an extremely low error rate. The figure is adapted from the work zhao2023recipe.
  • Figure 4: Difference in output between the original and fine-tuned models, indicating that the fine-tuned model proactively erases the concept of nudity. The figure is adapted from the work gandikota2023erasing.
  • Figure 5: Summary of using diffusion models (DMs) to counter backdoor attacks by restoring the poisoned image. The figure is adapted from the work shi2023blackbox.
  • ...and 6 more figures