Confidence Adjusted Surprise Measure for Active Resourceful Trials (CA-SMART): A Data-driven Active Learning Framework for Accelerating Material Discovery under Resource Constraints

Ahmed Shoyeb Raihan, Zhichao Liu, Tanveer Hossain Bhuiyan, Imtiaz Ahmed

TL;DR

This work tackles efficient material discovery under budget and data constraints by introducing CA-SMART, a Bayesian active-learning framework that uses Confidence-Adjusted Surprise (CAS) to adaptively balance exploration and exploitation. CAS integrates Shannon surprise, a flat-prior belief shift, and a model-confidence term to prioritize observations that are both informative and reliable, mitigating over-exploration of uncertain regions. Evaluations on synthetic benchmark functions and a steel fatigue-strength dataset show that CA-SMART converges faster and attains lower prediction error and better probabilistic predictions (lower RMSE and CRPS) than traditional acquisition functions (EI, UCB, MV, PI) and other surprise-based methods. The approach is data-efficient and robust in these evaluations, indicating practical potential for accelerating material discovery under stringent resource constraints.

Abstract

Accelerating the discovery and manufacturing of advanced materials with specific properties is a critical yet formidable challenge due to the vast search space, the high cost of experiments, and the time-intensive nature of material characterization. In recent years, active learning, in which a surrogate machine learning (ML) model mimics the discovery process of a human scientist, has emerged as a promising way to address these challenges by guiding experimentation toward high-value outcomes within a limited budget. Among the diverse active learning philosophies, the concept of surprise (the divergence between expected and observed outcomes) has shown significant potential to drive experimental trials and refine predictive models. Scientific discovery often stems from surprise, which makes it a natural driver of the search process. Despite this promise, prior studies that rely on surprise metrics such as Shannon and Bayesian surprise lack a mechanism to account for prior confidence, which leads to excessive exploration of uncertain regions that may not yield useful information. To address this, we propose the Confidence-Adjusted Surprise Measure for Active Resourceful Trials (CA-SMART), a novel Bayesian active learning framework tailored for optimizing data-driven experimentation. At a high level, CA-SMART uses Confidence-Adjusted Surprise (CAS) to dynamically balance exploration and exploitation by amplifying surprises in regions where the model is more certain and discounting them in highly uncertain areas. We evaluated CA-SMART on two benchmark functions (Six-Hump Camelback and Griewank) and on predicting the fatigue strength of steel. The results demonstrate superior accuracy and efficiency compared to traditional surprise metrics, standard Bayesian Optimization (BO) acquisition functions, and conventional ML methods.
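
To make the idea of confidence adjustment concrete, the following minimal sketch contrasts plain Shannon surprise with a confidence-weighted variant. It assumes a Gaussian predictive distribution, and the weighting function `confidence_weight` with its `scale` parameter is a hypothetical choice made for illustration only; it is not the CAS definition from the paper, which this summary does not reproduce.

```python
import math


def shannon_surprise(y, mu, sigma):
    """Negative log-density of y under a Gaussian predictive N(mu, sigma^2)."""
    z = (y - mu) / sigma
    return 0.5 * math.log(2.0 * math.pi * sigma ** 2) + 0.5 * z * z


def confidence_weight(sigma, scale=1.0):
    """Hypothetical confidence term in (0, 1]; close to 1 when sigma << scale."""
    return 1.0 / (1.0 + (sigma / scale) ** 2)


def adjusted_surprise(y, mu, sigma, scale=1.0):
    """Illustrative confidence-weighted surprise (an assumption, not the paper's CAS)."""
    return confidence_weight(sigma, scale) * shannon_surprise(y, mu, sigma)


# Two observations that are each three predictive standard deviations off.
# Confident region: small sigma. Uncertain region: large sigma.
raw_confident = shannon_surprise(y=0.6, mu=0.0, sigma=0.2)
raw_uncertain = shannon_surprise(y=6.0, mu=0.0, sigma=2.0)
adj_confident = adjusted_surprise(y=0.6, mu=0.0, sigma=0.2)
adj_uncertain = adjusted_surprise(y=6.0, mu=0.0, sigma=2.0)

print(f"raw:      confident={raw_confident:.3f}  uncertain={raw_uncertain:.3f}")
print(f"adjusted: confident={adj_confident:.3f}  uncertain={adj_uncertain:.3f}")
```

In this toy setting the raw Shannon surprise ranks the observation from the uncertain region higher, while the weighted version ranks the observation from the confident region higher, which is the qualitative behavior the abstract attributes to CAS.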

Paper Structure

This paper contains 28 sections, 18 equations, 11 figures, 6 tables.

Figures (11)

  • Figure 1: Active learning process in material discovery
  • Figure 2: Bayesian Optimization process over multiple iterations
  • Figure 3: Sequential approximation of the 1D function using the CA-SMART framework. The figure illustrates the progression from initial samples (Iteration 0) to the final approximation (Iteration 11), highlighting the model's dynamic switching between exploration and exploitation driven by the CAS metric. Exploration points (yellow and purple triangles) and exploitation samples (blue and green triangles) refine the model incrementally.
  • Figure 4: Performance on the Six-Hump Camelback function: (a) RMSE across iterations, (b) CRPS across iterations. Shaded areas represent 95% confidence intervals.
  • Figure 5: Performance on the Griewank function: (a) RMSE across iterations, (b) CRPS across iterations. Shaded areas represent 95% confidence intervals.
  • ...and 6 more figures
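
Figures 4 and 5 report RMSE and CRPS across iterations. As a rough sketch of how these two metrics can be computed, the code below uses the standard closed-form CRPS for a Gaussian predictive distribution. The Gaussian assumption and the function names `rmse` and `crps_gaussian` are illustrative choices, not details taken from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed values and point predictions."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def crps_gaussian(y, mu, sigma):
    """Closed-form CRPS of the predictive N(mu, sigma^2) at observation y."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))


# Hypothetical predictions at three test points (illustrative values only).
y_obs = [1.0, 2.0, 3.0]
mu_pred = [1.1, 1.8, 3.4]
sd_pred = [0.2, 0.3, 0.5]

mean_crps = sum(crps_gaussian(y, m, s) for y, m, s in zip(y_obs, mu_pred, sd_pred)) / len(y_obs)
print(f"RMSE = {rmse(y_obs, mu_pred):.4f}, mean CRPS = {mean_crps:.4f}")
```

RMSE scores only the point predictions, whereas CRPS also penalizes predictive distributions that are too wide or too narrow, which is why the two metrics are reported side by side.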