Reinforcement Learning for Chemical Ordering in Alloy Nanoparticles

Jonas Elsborg, Arghya Bhowmik

TL;DR

The paper tackles the NP atomic-ordering problem for bimetallic alloys by reframing it as an energy-driven sequential decision process solved with reinforcement learning. It introduces an actor–critic PPO agent with a factorised anchor/partner policy and an equivariant ORB-v3 graph encoder to perform composition-preserving swaps on Mackay-icosahedral Ag$_X$Au$_{309-X}$ nanoparticles, achieving ground-state orderings. The results show strong generalisation across composition and transfer to unseen NP sizes, though multi-element generalisation across chemistries remains challenging. The approach promises transferable priors that amortise search across compositions and sizes, potentially reducing energy-costly searches in catalyst design. The work also discusses limitations and avenues for improvement, including nanoparticle-specific pretraining and more flexible action horizons.

Abstract

We approach the search for optimal element ordering in bimetallic alloy nanoparticles (NPs) as a reinforcement learning (RL) problem, and have built an RL agent that learns to perform such global optimisation using the geometric graph representation of the NPs. To demonstrate its effectiveness, we train an RL agent to perform composition-conserving atomic swap actions on the icosahedral nanoparticle structure. Trained once on randomised Ag$_X$Au$_{309-X}$ compositions and orderings, the agent discovers previously established ground-state structures. We show that this optimisation is robust to differently ordered initialisations of the same NP compositions. We also demonstrate that a trained policy can extrapolate effectively to NPs of unseen size. However, the efficacy is limited when multiple alloying elements are involved. Our results demonstrate that RL with pre-trained equivariant graph encodings can navigate combinatorial ordering spaces at the nanoparticle scale, and offer a transferable optimisation strategy with the potential to generalise across composition and reduce the cost of repeated individual searches.
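To make the idea of a composition-conserving swap action concrete, the following is a minimal Python sketch of a factorised anchor/partner choice followed by a species swap. It is an illustration under stated assumptions, not the authors' implementation: the function names (`sample_swap`, `apply_swap`), the use of plain logit arrays in place of the graph-encoder outputs, and the masking of same-species partners are all hypothetical choices made here for clarity.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax; entries set to -inf receive probability 0."""
    x = np.asarray(x, dtype=float)
    z = np.exp(x - np.max(x))
    return z / z.sum()


def sample_swap(species, anchor_logits, partner_logits, rng):
    """Sample an anchor i, then a partner j conditioned on i.

    species:        (N,) atomic numbers, one per lattice site
    anchor_logits:  (N,) hypothetical per-site scores for the anchor choice
    partner_logits: (N, N) hypothetical scores; row i scores partners for anchor i
    """
    species = np.asarray(species)
    if np.unique(species).size < 2:
        raise ValueError("need at least two species to perform a swap")
    i = int(rng.choice(len(species), p=softmax(anchor_logits)))
    # Assumption: partners of the same species as the anchor are masked out,
    # because swapping identical atoms leaves the ordering unchanged.
    allowed = species != species[i]
    masked = np.where(allowed, np.asarray(partner_logits, dtype=float)[i], -np.inf)
    j = int(rng.choice(len(species), p=softmax(masked)))
    return i, j


def apply_swap(species, i, j):
    """Return a copy with the species at sites i and j exchanged.

    Only labels move between fixed sites, so the composition is preserved.
    """
    out = np.array(species, copy=True)
    out[i], out[j] = out[j], out[i]
    return out
```

In the setting described in the abstract, the swapped structure would then be evaluated energetically to produce a reward; that step is omitted here because this section does not specify its details.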

Paper Structure

This paper contains 31 sections, 24 equations, 7 figures, 1 table.

Figures (7)

  • Figure 1: Agent–environment step on the Ag/Au NP: highlight the chosen anchor $i$ and partner $j$, swap species, relax, compute reward, and continue. In Figure \ref{fig:full_traj_snaps} in Appendix \ref{app:additional results}, we show snapshots from a full episode of anchor-partner swaps for a trained model.
  • Figure 2: PPO implementation and model flow for the proposed composition-generalised model for global optimisation of atomic ordering in nanoparticles.
  • Figure 3: Lowest-energy structures found by the trained agent for the eight test compositions (a)-(h). For the first four, (a)-(d), both the internal core structure (left) and the outer shell structure (right) are shown.
  • Figure 4: Elemental radial distribution function (E-RDF) plots and energies for eight initialised and final structures of icosahedral Ag$_{205}$Au$_{104}$. The panels (a)-(h) are ordered from lowest to highest.
  • Figure 5: For all eight 309-atom test systems (ico-1 to ico-8), energies are summarised as $\Delta E$ relative to the average energy of the final optimised structures from eight runs with the policy from experiment 1 (dashed line at 0). For each system, we depict the mean energy of the randomly initialised structures and the mean, maximum, and minimum energies after optimisation across all runs. Optimisation runs from experiment 2 remain consistent with and close to the results obtained in experiment 1. Experiment 3 shows that the policy trained on NPs with multiple element combinations fails to resolve the lowest-energy structures in the size-extrapolation setting; for example, for ico-5 and ico-6 the outcomes from different optimisation runs are consistently more than 0.3 eV higher in energy than the structures found in experiment 1.
  • ...and 2 more figures
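
The $\Delta E$ summary described in the Figure 5 caption can be sketched as follows. This is a hedged illustration only: the function name `summarise_delta_e`, its argument names, and the returned dictionary keys are hypothetical, and the energies in the usage note are made-up placeholders rather than values from the paper.

```python
import numpy as np


def summarise_delta_e(initial, final, baseline_final):
    """Summarise energies relative to a baseline mean, as in the caption.

    initial:        energies of the randomly initialised structures (eV)
    final:          energies after optimisation, one per run (eV)
    baseline_final: final energies from the reference experiment (eV);
                    their mean defines Delta E = 0
    """
    baseline = float(np.mean(baseline_final))
    d_init = np.asarray(initial, dtype=float) - baseline
    d_final = np.asarray(final, dtype=float) - baseline
    return {
        "init_mean": float(d_init.mean()),
        "final_mean": float(d_final.mean()),
        "final_min": float(d_final.min()),
        "final_max": float(d_final.max()),
    }
```

For example, calling `summarise_delta_e(initial=[-9.0, -9.2], final=[-9.8, -9.6], baseline_final=[-10.0, -10.2])` with these placeholder numbers places the baseline at -10.1 eV and reports every other quantity as an offset from it.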