Reliable and Responsible Foundation Models: A Comprehensive Survey

Xinyu Yang; Junlin Han; Rishi Bommasani; Jinqi Luo; Wenjie Qu; Wangchunshu Zhou; Adel Bibi; Xiyao Wang; Jaehong Yoon; Elias Stengel-Eskin; Shengbang Tong; Lingfeng Shen; Rafael Rafailov; Runjia Li; Zhaoyang Wang; Yiyang Zhou; Chenhang Cui; Yu Wang; Wenhao Zheng; Huichi Zhou; Jindong Gu; Zhaorun Chen; Peng Xia; Tony Lee; Thomas Zollo; Vikash Sehwag; Jixuan Leng; Jiuhai Chen; Yuxin Wen; Huan Zhang; Zhun Deng; Linjun Zhang; Pavel Izmailov; Pang Wei Koh; Yulia Tsvetkov; Andrew Wilson; Jiaheng Zhang; James Zou; Cihang Xie; Hao Wang; Philip Torr; Julian McAuley; David Alvarez-Melis; Florian Tramèr; Kaidi Xu; Suman Jana; Chris Callison-Burch; Rene Vidal; Filippos Kokkinos; Mohit Bansal; Beidi Chen; Huaxiu Yao

TL;DR

This survey analyzes the reliability and responsibility of foundation models along nine dimensions (bias and fairness, alignment, security, privacy, hallucination, uncertainty, distribution shift, explainability, and AIGC detection) across four model families (LLMs, MLLMs, image generative models, and video generative models). It clarifies definitions, surveys state-of-the-art methods, and outlines concrete future directions, emphasizing interactions and trade-offs among dimensions and modalities. The work highlights data- and model-centric approaches (SFT, RLHF, instruction tuning, visual instruction tuning), multimodal alignment challenges, and cross-cutting concerns such as risk, ethics, and governance. By connecting mechanisms such as retrieval-augmented generation, diffusion-based image synthesis, and watermark-based AIGC detection, the survey offers a holistic blueprint for building reliable, robust, and socially responsible foundation models, with practical implications for industry, academia, and policy. The findings underscore the importance of cross-disciplinary research, standardized evaluation, and continuous auditing so that foundation models remain both powerful and trustworthy as they scale and spread into real-world domains.

Abstract

Foundation models, including Large Language Models (LLMs), Multimodal Large Language Models (MLLMs), Image Generative Models (i.e., Text-to-Image Models and Image-Editing Models), and Video Generative Models, have become essential tools with broad applications in domains such as law, medicine, education, finance, and science. As these models see increasing real-world deployment, ensuring their reliability and responsibility has become critical for academia, industry, and government. This survey addresses the reliable and responsible development of foundation models. We explore critical issues, including bias and fairness, security and privacy, uncertainty, explainability, and distribution shift. We also cover model limitations, such as hallucinations, as well as methods like alignment and Artificial Intelligence-Generated Content (AIGC) detection. For each area, we review the current state of the field and outline concrete future research directions. Additionally, we discuss the intersections between these areas, highlighting their connections and shared challenges. We hope our survey fosters the development of foundation models that are not only powerful but also ethical, trustworthy, reliable, and socially responsible.

Paper Structure (127 sections, 14 equations, 31 figures, 3 tables)

Introduction
Types of Foundation Models
Bias and Fairness
Definitions
Methods for Bias Evaluation
Methods for Bias Mitigation
Bias and Fairness in MLLMs
Bias and Fairness in Image Generative Models
Current Limitations and Future Directions
Limitations and Open Challenges of Bias and Fairness
Future Directions
Alignment
Supervised Fine-Tuning
Reinforcement Learning from Human Feedback
Prompt Engineering
...and 112 more sections

Figures (31)

Figure 1: Overview of reliable and responsible foundation models. This survey comprehensively summarizes existing research from nine critical dimensions: bias and fairness, alignment, security, privacy, hallucination, uncertainty, distribution shift, explainability, and Artificial Intelligence-Generated Content (AIGC) detection. We organize foundation models into four categories, including Large Language Models (LLMs), Multimodal Large Language Models (MLLMs), Image Generative Models, and Video Generative Models, to illustrate how each category uniquely interacts with these dimensions. Additionally, we explore how these dimensions interact and reinforce one another to highlight their synergies and shared challenges.
Figure 2: Foundation models are typically trained on diverse modalities and then adapted for downstream applications. Throughout this pipeline, various reliability and responsibility issues emerge at different stages.
Figure 3: Taxonomy of Bias and Fairness in Foundation Models.
Figure 4: An example of gender bias in LLM responses.
Figure 5: An overview of strategies for evaluating and mitigating bias in LLMs, covering evaluation via feature embeddings, generated text, and token selection probabilities, as well as mitigation during training or inference.
...and 26 more figures
