
Robustness in Large Language Models: A Survey of Mitigation Strategies and Evaluation Metrics

Pankaj Kumar, Subhankar Mishra

TL;DR

The paper surveys robustness in large language models (LLMs), defining robustness, as distinct from standard accuracy, as reliable performance under input perturbations, distribution shifts, and adversarial conditions. It systematically characterizes seven interdependent dimensions of robustness; analyzes data-, training-, architecture-, and inference-related sources of non-robustness; and reviews mitigation strategies spanning pre-processing, in-processing, intra-processing, and post-processing. The authors also catalog metrics and benchmarks for evaluating adversarial, out-of-distribution (OOD), consistency, fairness, and task-specific robustness, and discuss open challenges and future directions, including scalable defenses, causal understanding, and adaptive evaluation. By integrating cross-domain insights and outlining practical mitigation and evaluation frameworks, the survey aims to accelerate the development of trustworthy, robust LLMs for real-world deployment.
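To make the perturbation and consistency metrics mentioned above concrete, here is a minimal sketch of one way to compare a model's behavior on clean and perturbed inputs. Everything in it is an illustrative assumption rather than the survey's own formulation: the adjacent-letter-swap `perturb` function, the `keyword_model` stand-in for an LLM classifier, and the metric definitions (`relative_drop` as the share of clean accuracy lost, `consistency` as the share of unchanged predictions) are hypothetical.

```python
import random
from typing import Callable, Dict, List, Tuple


def perturb(text: str, rate: float, rng: random.Random) -> str:
    """Hypothetical character-level noise: swap adjacent letters with probability `rate`."""
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the swapped pair so it is not swapped back
        else:
            i += 1
    return "".join(chars)


def robustness_report(
    model: Callable[[str], str],
    data: List[Tuple[str, str]],
    rate: float = 0.1,
    seed: int = 0,
) -> Dict[str, float]:
    """Compare predictions on clean vs. perturbed inputs (illustrative metric definitions)."""
    rng = random.Random(seed)
    labels = [y for _, y in data]
    clean = [model(x) for x, _ in data]
    noisy = [model(perturb(x, rate, rng)) for x, _ in data]
    n = len(data)
    clean_acc = sum(p == y for p, y in zip(clean, labels)) / n
    noisy_acc = sum(p == y for p, y in zip(noisy, labels)) / n
    # Relative drop: share of clean accuracy lost under perturbation.
    drop = (clean_acc - noisy_acc) / clean_acc if clean_acc > 0 else 0.0
    # Consistency: share of inputs whose prediction is unchanged, right or wrong.
    consistency = sum(c == p for c, p in zip(clean, noisy)) / n
    return {
        "clean_acc": clean_acc,
        "perturbed_acc": noisy_acc,
        "relative_drop": drop,
        "consistency": consistency,
    }


def keyword_model(text: str) -> str:
    """Toy stand-in for an LLM classifier; exact keyword matching makes it brittle to typos."""
    return "positive" if "good" in text.lower() else "negative"


data = [
    ("The movie was good", "positive"),
    ("A good, fun film", "positive"),
    ("It was boring", "negative"),
    ("Terrible plot", "negative"),
]
print(robustness_report(keyword_model, data, rate=1.0))
# → {'clean_acc': 1.0, 'perturbed_acc': 0.5, 'relative_drop': 0.5, 'consistency': 0.5}
```

With `rate=1.0` every adjacent letter pair is swapped, so "good" becomes "ogdo" and both positive examples flip, halving accuracy. Consistency is reported separately from accuracy because a model can be consistently wrong, which is one reason robustness evaluations tend to track both.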

Abstract

Large Language Models (LLMs) have emerged as a promising cornerstone for the development of natural language processing (NLP) and artificial intelligence (AI). However, ensuring the robustness of LLMs remains a critical challenge. To address this challenge and advance the field, this survey provides a comprehensive overview of current studies in this area. First, we systematically examine the nature of robustness in LLMs, including its conceptual foundations, the importance of consistent performance across diverse inputs, and the implications of failure modes in real-world applications. Next, we analyze the sources of non-robustness, categorizing intrinsic model limitations, data-driven vulnerabilities, and external adversarial factors that compromise reliability. We then review state-of-the-art mitigation strategies and discuss widely adopted benchmarks, emerging metrics, and persistent gaps in assessing real-world reliability. Finally, we synthesize findings from existing surveys and interdisciplinary studies to highlight trends, unresolved issues, and pathways for future research.

Paper Structure

This paper contains 65 sections, 5 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: The outline of the survey on robustness of LLMs.
  • Figure 2: PRISMA flow diagram of the survey methodology.
  • Figure 3: A conceptual visualization of the interdependent dimensions critical to LLM robustness, showing the synergies and tensions among OOD generalization, noise resilience, output consistency, and fairness, and thus the challenge of achieving balanced, comprehensive robustness across all dimensions.
  • Figure 4: An illustration of some failure cases of non-robustness (adapted from schulhoff2025promptreportsystematicsurvey; shen-etal-2024-assessing; gallegos2024biasfairnesslargelanguage; bano2025doessoftwareengineerlook; zhou-etal-2024-explore). The examples are representative of behaviors primarily observed in GPT-3.5 and are intended to demonstrate common categories of robustness failures.
  • Figure 5: Comprehensive LLM Robustness Pipeline. A multi-stage framework for enhancing LLM robustness across the deployment lifecycle. (1) Pre-processing: Data curation and augmentation strategies; (2) In-processing: Training-time interventions including adversarial training and alignment; (3) Intra-processing: Real-time inference adaptations; (4) Post-processing: Output validation and filtering. Arrows indicate data flow and feedback mechanisms between stages.
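The four stages in Figure 5 can be read as a chain of functions wrapped around a model call. The sketch below is a hypothetical composition under deliberately simple assumptions, and none of its names or choices come from the survey: whitespace normalization stands in for pre-processing, a fixed `noisy_toy_model` stands in for a model already hardened during training (in-processing itself is out of scope here), self-consistency majority voting over sampled answers stands in for an intra-processing, inference-time adaptation, and a string blocklist stands in for post-processing validation. The feedback arrows in the figure are not modeled.

```python
import random
import re
from collections import Counter
from typing import Callable, List

Model = Callable[[str, random.Random], str]
BLOCKLIST = {"rm -rf"}  # hypothetical unsafe-output patterns


def preprocess(prompt: str) -> str:
    """(1) Pre-processing stand-in: collapse whitespace and lowercase the prompt."""
    return re.sub(r"\s+", " ", prompt).strip().lower()


def noisy_toy_model(prompt: str, rng: random.Random) -> str:
    """(2) Stand-in for a model already hardened at training time; errs about 20% of the time."""
    if "capital of france" in prompt:
        return "paris" if rng.random() < 0.8 else "lyon"
    return "unknown"


def majority_vote(answers: List[str]) -> str:
    """(3) Intra-processing stand-in: self-consistency vote over sampled answers."""
    return Counter(answers).most_common(1)[0][0]


def postprocess(answer: str, fallback: str = "[withheld]") -> str:
    """(4) Post-processing stand-in: withhold outputs that match the blocklist."""
    return fallback if any(pattern in answer for pattern in BLOCKLIST) else answer


def robust_generate(model: Model, prompt: str, k: int = 7, seed: int = 0) -> str:
    """Run the hypothetical pipeline: clean input, sample k answers, vote, then validate."""
    rng = random.Random(seed)
    cleaned = preprocess(prompt)
    samples = [model(cleaned, rng) for _ in range(k)]
    return postprocess(majority_vote(samples))


print(robust_generate(noisy_toy_model, "  What is the CAPITAL   of France? "))
```

Because each stage is a plain function, any of them could be swapped for a stronger technique (for example, a learned input sanitizer or a trained output classifier) without changing the composition; this is one possible reading of the staged layout in the figure, not a claim about how the authors implement it.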