Building Trustworthy AI for Materials Discovery: From Autonomous Laboratories to Z-scores

Benhour Amirian, Ashley S. Dale, Sergei Kalinin, Jason Hattrick-Simpers

TL;DR

The paper proposes the GIFTERS framework to evaluate trustworthiness in AI-driven materials discovery, connecting generalizability, interpretability, fairness, transparency, explainability, robustness, and stability with uncertainty quantification. Through a literature review of 63 studies, it shows most work addresses only a subset of GIFTERS, with generalizability commonly reported but transparency and fairness often lacking. It also analyzes Bayesian and non-Bayesian approaches, highlighting gaps and proposing cross-domain methods from healthcare, climate science, and NLP to improve trust. Finally, it outlines future directions including physics-informed learning, human-in-the-loop governance, and robust evaluation practices to ensure AI accelerates discovery while meeting community norms.

Abstract

Accelerated material discovery increasingly relies on artificial intelligence and machine learning, collectively termed "AI/ML". A key challenge in using AI is ensuring that human scientists trust that the models are valid and reliable. Accordingly, we define a trustworthy AI framework, GIFTERS, for materials science and discovery to evaluate whether reported machine learning methods are generalizable, interpretable, fair, transparent, explainable, robust, and stable. Through a critical literature review, we highlight that these are the trustworthiness principles most valued by the materials discovery community. However, we also find that comprehensive approaches to trustworthiness are rarely reported; this is quantified by a median GIFTERS score of 5/7. We observe that Bayesian studies frequently omit fair data practices, while non-Bayesian studies most frequently omit interpretability. Finally, we identify approaches for improving trustworthiness methods in AI/ML for materials science by considering work accomplished in other scientific disciplines such as healthcare, climate science, and natural language processing, with an emphasis on methods that may transfer to materials discovery experiments. By combining these observations, we highlight the necessity of human-in-the-loop and integrated approaches to bridge the gap between trustworthiness and uncertainty quantification in future materials science research. Such approaches ensure that AI/ML methods not only accelerate discovery, but also meet ethical and scientific norms established by the materials discovery community. This work provides a road map for developing trustworthy artificial intelligence systems that will accurately and confidently enable material discovery.

Paper Structure

This paper contains 6 sections, 16 figures, 2 tables.

Figures (16)

  • Figure 1: Overview of various machine learning-based UQ methods. The figure illustrates six different strategies for predicting DFT-calculated adsorption energies ($\Delta E$) and their associated uncertainties, including a standard crystal graph convolutional neural network (CGCNN), Bayesian neural network, in-series NN (NN$\Delta$NN), Gaussian process, dropout NN, and the CFGP. Each method differs in how it estimates prediction mean ($\mu$) and standard deviation ($\sigma$), with the CFGP achieving the best overall performance in terms of calibration, sharpness, and negative log-likelihood. Reproduced with permission tran2020methods. Copyright 2020 IOP Publishing.
  • Figure 2: GIFTERS-based categorization of trustworthiness attributes for Bayesian and non-Bayesian AI/ML papers: generalizability, interpretability, fairness, transparency, robustness, explainability, and stability.
  • Figure 3: Three-step uncertainty-aware workflow. Step 1 applies one of three UQ modules, k-fold ensemble, Monte Carlo dropout, or evidential regression, to generate predictions with error bars. Step 2 scores those outputs with accuracy, sharpness, dispersion, calibration, and tightness metrics. Step 3 charts high versus low trust cases, giving users an assessment of model reliability. Reproduced with permission gruich2023clarifying. Copyright 2023 IOP Publishing.
  • Figure 4: Workflow for multi-objective Bayesian optimization of high-entropy alloys using TS-EMO, illustrating interpretable surrogate modeling, explainable decision acquisition, and generalizability across compositional space. Reproduced with permission startt2024bayesian. Copyright 2024 Nature Publishing Group UK London.
  • Figure 5: Summary of the EDBO framework: (a) reaction space and synthetic complexity for a multistep inhibitor target; (b) mechanistic input, experimental design, and surface modeling; and (c) visualization of optimization trajectory within a high-dimensional design space, balancing exploration and exploitation. Reproduced with permission shields2021bayesian. Copyright 2021 Nature Publishing Group UK London.
  • ...and 11 more figures
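
The captions for Figures 1 and 3 refer to scoring predicted means ($\mu$) and standard deviations ($\sigma$) with metrics such as calibration, sharpness, and negative log-likelihood. The sketch below is a minimal illustration of how such scores could be computed for Gaussian predictions, using only the Python standard library. It is not the implementation used in the cited works: the function names, the root-mean-variance definition of sharpness, the interval-coverage check used as a calibration proxy, and the toy numbers are all assumptions made for illustration.

```python
import math
from statistics import NormalDist, mean


def z_scores(y_true, mu, sigma):
    """Standardized residuals (y - mu) / sigma, one per prediction."""
    return [(y - m) / s for y, m, s in zip(y_true, mu, sigma)]


def gaussian_nll(y_true, mu, sigma):
    """Mean negative log-likelihood under independent Gaussian predictions."""
    return mean(
        0.5 * math.log(2.0 * math.pi * s**2) + 0.5 * ((y - m) / s) ** 2
        for y, m, s in zip(y_true, mu, sigma)
    )


def sharpness(sigma):
    """Root-mean predicted variance (assumed definition; lower is sharper)."""
    return math.sqrt(mean(s**2 for s in sigma))


def coverage(y_true, mu, sigma, level):
    """Fraction of targets inside the central `level` Gaussian interval.

    For a well-calibrated model this fraction should be close to `level`.
    """
    half_width = NormalDist().inv_cdf(0.5 + level / 2.0)
    return mean(
        1.0 if abs(z) <= half_width else 0.0
        for z in z_scores(y_true, mu, sigma)
    )


# Hypothetical toy data, not taken from the paper.
y_true = [0.0, 1.0, 2.0, 3.0]
mu = [0.1, 0.8, 2.5, 3.0]
sigma = [0.2, 0.2, 0.2, 0.2]

scores = {
    "nll": gaussian_nll(y_true, mu, sigma),
    "sharpness": sharpness(sigma),
    "coverage_95": coverage(y_true, mu, sigma, 0.95),
}
print(scores)
```

Comparing `coverage` against the nominal `level` at several levels gives a coarse calibration curve, and large absolute z-scores flag individual predictions whose error bars are too narrow.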