Machine Learning Robustness: A Primer

Houssem Ben Braiek, Foutse Khomh

TL;DR

This primer defines ML robustness as stability of predictive performance under data changes and situates it as a core component of trustworthy AI, complementing generalization and uncertainty quantification. It surveys robustness assessment methods, including adversarial and non-adversarial shifts, DL testing, and empirical limitations, and outlines a spectrum of amelioration strategies spanning data-centric, model-centric, and post-training techniques. Key contributions include a taxonomy of attack types (white-box, black-box, physical), robustness metrics (e.g., $R_f^{\phi}$, $mCE$, $rCE$), and practical guidance on bridging theory with engineering practice through domain-aware testing and human-in-the-loop approaches. The discussion highlights open challenges such as data bias, underspecification, and the lack of universal guarantees, while pointing to future directions in generative test-case synthesis, domain-aware stress testing, and repair-oriented post-training methods for safer, more reliable AI systems.

Abstract

This chapter explores the foundational concept of robustness in Machine Learning (ML) and its integral role in establishing trustworthiness in Artificial Intelligence (AI) systems. The discussion begins with a detailed definition of robustness, portraying it as the ability of ML models to maintain stable performance across varied and unexpected environmental conditions. ML robustness is dissected through several lenses: its complementarity with generalizability; its status as a requirement for trustworthy AI; its adversarial vs non-adversarial aspects; its quantitative metrics; and its indicators such as reproducibility and explainability. The chapter delves into the factors that impede robustness, such as data bias, model complexity, and the pitfalls of underspecified ML pipelines. It surveys key techniques for robustness assessment from a broad perspective, including adversarial attacks, encompassing both digital and physical realms. It covers non-adversarial data shifts and nuances of Deep Learning (DL) software testing methodologies. The discussion progresses to explore amelioration strategies for bolstering robustness, starting with data-centric approaches like debiasing and augmentation. Further examination includes a variety of model-centric methods such as transfer learning, adversarial training, and randomized smoothing. Lastly, post-training methods are discussed, including ensemble techniques, pruning, and model repairs, emerging as cost-effective strategies to make models more resilient against the unpredictable. This chapter underscores the ongoing challenges and limitations in estimating and achieving ML robustness by existing approaches. It offers insights and directions for future research on this crucial concept, as a prerequisite for trustworthy AI systems.
