Steering an Active Learning Workflow Towards Novel Materials Discovery via Queue Prioritization

Marcus Schwarting, Logan Ward, Nathaniel Hudson, Xiaoli Yan, Ben Blaiszik, Santanu Chaudhuri, Eliu Huerta, Ian Foster

TL;DR

This work tackles the inefficiency and potential degradation of generative AI in inverse design by coupling a queue prioritization scheme with an active-learning surrogate to steer candidate selection. The approach retrains a surrogate predictor on new simulation data and uses an acquisition function to reorder the generation queue, promoting high-quality, novel MOF candidates while discarding implausible ones. Applied to MOF-based carbon capture, the method increases high-performing candidates from 281 to 604 out of 1000 generated, with only marginal compute overhead (≈0.6%). The results demonstrate improved exploration–exploitation balance and offer a practical, generalizable strategy to boost discovery in complex design spaces while mitigating model decay. This has direct implications for accelerating materials discovery in high-throughput screening pipelines where expensive evaluations constrain exploration.
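As a rough illustration of the reordering step summarized above, the Python sketch below retrains a surrogate on the evaluated candidates and then sorts the pending queue by an acquisition score. It is not the authors' implementation. The bootstrap ridge ensemble standing in for the surrogate, the function names (`fit_ensemble`, `predict`, `acquisition`, `reorder_queue`), the `kappa` weight, the three acquisition modes, and the convention that a higher predicted score is better are all assumptions made for the example.

```python
import numpy as np


def fit_ensemble(X, y, n_models=8, ridge=1e-3, seed=0):
    """Fit a bootstrap ensemble of ridge regressors (hypothetical surrogate)."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    weights = []
    for _ in range(n_models):
        # Resample the training set with replacement for each ensemble member.
        idx = rng.integers(0, len(Xb), size=len(Xb))
        A, b = Xb[idx], y[idx]
        w = np.linalg.solve(A.T @ A + ridge * np.eye(A.shape[1]), A.T @ b)
        weights.append(w)
    return np.stack(weights)  # shape: (n_models, n_features + 1)


def predict(weights, X):
    """Return the ensemble mean and spread for each candidate."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    preds = Xb @ weights.T  # shape: (n_candidates, n_models)
    return preds.mean(axis=1), preds.std(axis=1)


def acquisition(mean, std, mode="mixed", kappa=1.0):
    """Score candidates; higher means 'evaluate sooner' (assumed convention)."""
    if mode == "exploit":   # trust the surrogate's prediction
        return mean
    if mode == "explore":   # prefer candidates the ensemble disagrees on
        return std
    if mode == "mixed":     # upper-confidence-bound style trade-off
        return mean + kappa * std
    raise ValueError(f"unknown mode: {mode}")


def reorder_queue(queue, features, weights, mode="mixed", kappa=1.0):
    """Return the queue sorted by descending acquisition score."""
    mean, std = predict(weights, np.asarray(features, dtype=float))
    scores = acquisition(mean, std, mode=mode, kappa=kappa)
    order = np.argsort(-scores, kind="stable")
    return [queue[i] for i in order]


# Toy usage: synthetic "simulation results" with a known linear trend.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(50, 2))
y_train = X_train @ np.array([1.0, -2.0])
weights = fit_ensemble(X_train, y_train)

queue = ["cand_a", "cand_b", "cand_c"]
features = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(reorder_queue(queue, features, weights, mode="exploit"))
```

In a running workflow, `fit_ensemble` would be called again whenever new simulation results arrive, and `reorder_queue` would be applied to the candidates still waiting for evaluation. Filtering out implausible candidates is a separate step that this sketch does not cover.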

Abstract

Generative AI poses both opportunities and risks for solving inverse design problems in the sciences. Generative tools provide the ability to expand and refine a search space autonomously, but do so at the cost of exploring low-quality regions until sufficiently fine-tuned. Here, we propose a queue prioritization algorithm that combines generative modeling and active learning in the context of a distributed workflow for exploring complex design spaces. We find that incorporating an active learning model to prioritize top design candidates can prevent a generative AI workflow from expending resources on nonsensical candidates and halt potential generative model decay. For an existing generative AI workflow for discovering novel molecular structure candidates for carbon capture, our active learning approach significantly increases the number of high-quality candidates identified by the generative model. We find that, out of 1000 novel candidates, our workflow without active learning can generate an average of 281 high-performing candidates, while our proposed prioritization with active learning can generate an average of 604 high-performing candidates.
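For a sense of scale, the two reported averages can be restated as rates. The short calculation below uses only the counts given in the abstract and introduces no new data:

$$
\frac{281}{1000} = 28.1\%, \qquad \frac{604}{1000} = 60.4\%, \qquad \frac{604}{281} \approx 2.15
$$

Read this way, prioritization with active learning raises the share of high-performing candidates by about 32 percentage points, or roughly a factor of 2.15.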

Paper Structure

This paper contains 29 sections, 2 equations, 7 figures, 2 tables, 1 algorithm.

Figures (7)

  • Figure 1: Flow chart of a generative model workflow with AL for queue prioritization.
  • Figure 2: Example of MOF structure and diagram of MOF screening workflow with the AL model iteratively reordering the updated linker queue. Components outlined in red-dashed boxes incorporate MOF-specific implementation details.
  • Figure 3: Reordering tests with AL driven by exploration, exploitation, and mixed explore/exploit objectives (compared to a random baseline). (a) RMSE of a linker hold-out set with different queue ordering priorities. (b) Total number of identified stable MOFs with different queue ordering priorities.
  • Figure 4: Workflow runs with and without AL, shown with varying DiffLinker fine-tuning on a percentage (10%, 50%, and 90%) of the most stable linkers identified. (a) Total count of novel stable MOFs during various workflow runs. (b) Cumulative proportion of MOFs with corresponding stabilities across various workflow runs.
  • Figure 5: Queue reordering AL tests for alternative acquisition functions with emphasis on SAScore $S_{SA}$. (a) The averaged $S_{SA}$ over the simulations. (b) The total number of discovered stable MOFs over the simulations.
  • ...and 2 more figures