Reinforcement Learning for Chemical Ordering in Alloy Nanoparticles
Jonas Elsborg, Arghya Bhowmik
TL;DR
The paper reframes the search for the lowest-energy atomic ordering in bimetallic alloy nanoparticles (NPs) as an energy-driven sequential decision process solved with reinforcement learning. It introduces an actor–critic PPO agent with a factorised anchor/partner policy and an equivariant ORB-v3 graph encoder that performs composition-preserving swaps on Mackay-icosahedral $Ag_{X}Au_{309-X}$ nanoparticles and recovers previously established ground-state orderings. The results show strong generalisation across composition and transfer to unseen NP sizes, though generalisation to systems with multiple alloying elements remains challenging. The approach promises transferable priors that amortise search across compositions and sizes, potentially reducing the cost of repeated individual searches in catalyst design. The work also discusses limitations and avenues for improvement, including nanoparticle-specific pretraining and more flexible action horizons.
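To make the "factorised anchor/partner policy" concrete, the sketch below shows one plausible reading of it: the agent first picks an anchor atom, then picks a partner of a different element conditioned on that anchor, so that the swap probability factorises as $\pi(i, j \mid s) = \pi_{\text{anchor}}(i \mid s)\,\pi_{\text{partner}}(j \mid s, i)$. This is an illustration under stated assumptions, not the authors' implementation: the function names, the masking rule, and the logit arrays (which in the paper would come from the graph encoder and policy heads) are hypothetical.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax; entries equal to -inf get probability 0."""
    x = x - np.max(x)
    e = np.exp(x)
    return e / e.sum()

def sample_swap(species, anchor_logits, partner_logits, rng):
    """Sample an (anchor, partner) pair of atoms with different elements.

    species:        (N,) integer element labels per site.
    anchor_logits:  (N,) hypothetical scores for choosing the anchor.
    partner_logits: (N, N) hypothetical scores; row i scores partners of anchor i.
    Returns (anchor, partner, log_prob) with log_prob = log pi(anchor) + log pi(partner | anchor).
    """
    n = len(species)

    # Step 1: choose the anchor from a softmax over all sites.
    p_anchor = softmax(anchor_logits)
    anchor = int(rng.choice(n, p=p_anchor))

    # Step 2: choose the partner among sites holding a different element
    # (assumed masking rule, so every sampled swap changes the ordering).
    mask = species != species[anchor]
    if not mask.any():
        raise ValueError("no partner of a different element exists")
    p_partner = softmax(np.where(mask, partner_logits[anchor], -np.inf))
    partner = int(rng.choice(n, p=p_partner))

    log_prob = float(np.log(p_anchor[anchor]) + np.log(p_partner[partner]))
    return anchor, partner, log_prob
```

Factorising the action this way keeps each head's output linear in the number of atoms instead of scoring all atom pairs jointly, and the summed log-probability is the quantity a PPO-style update would need for the joint action.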
Abstract
We approach the search for optimal element ordering in bimetallic alloy nanoparticles (NPs) as a reinforcement learning (RL) problem, and build an RL agent that learns to perform this global optimisation using the geometric graph representation of the NPs. To demonstrate its effectiveness, we train the agent to perform composition-conserving atomic swap actions on the icosahedral nanoparticle structure. Trained once on randomised $Ag_{X}Au_{309-X}$ compositions and orderings, the agent discovers the previously established ground-state structures. We show that this optimisation is robust to differently ordered initialisations of the same NP composition. We also demonstrate that a trained policy can extrapolate effectively to NPs of unseen size. However, the efficacy is limited when multiple alloying elements are involved. Our results demonstrate that RL with pre-trained equivariant graph encodings can navigate combinatorial ordering spaces at the nanoparticle scale, and offer a transferable optimisation strategy with the potential to generalise across composition and reduce the cost of repeated individual searches.
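As a small illustration of the setting described above, the following sketch checks that 309 is the atom count of a Mackay icosahedron with four complete shells around a central atom, builds a random Ag/Au ordering on fixed sites, and applies a composition-conserving swap. It covers only the bookkeeping of the action space; the energy model, the graph encoder, and the reward are deliberately left out. The function names and the integer labels (0 for Ag, 1 for Au) are assumptions made for this example.

```python
import numpy as np

def mackay_atom_count(shells):
    """Atoms in a Mackay icosahedron with `shells` complete shells around a central atom."""
    return (10 * shells**3 + 15 * shells**2 + 11 * shells + 3) // 3

def random_ordering(n_atoms, n_ag, rng):
    """Random element assignment on fixed sites: 0 = Ag, 1 = Au (labels are arbitrary)."""
    species = np.ones(n_atoms, dtype=int)
    species[rng.choice(n_atoms, size=n_ag, replace=False)] = 0
    return species

def swap(species, i, j):
    """Return a copy of `species` with the elements at sites i and j exchanged."""
    if species[i] == species[j]:
        raise ValueError("swapping identical elements leaves the ordering unchanged")
    out = species.copy()
    out[i], out[j] = out[j], out[i]
    return out

# Usage: one swap on a randomly ordered Ag100Au209 particle.
rng = np.random.default_rng(0)
n_atoms = mackay_atom_count(4)            # 309 sites
state = random_ordering(n_atoms, n_ag=100, rng=rng)
i = int(np.flatnonzero(state == 0)[0])    # some Ag site
j = int(np.flatnonzero(state == 1)[0])    # some Au site
new_state = swap(state, i, j)
# The composition is unchanged by construction.
assert np.array_equal(np.bincount(new_state), np.bincount(state))
```

Because every action exchanges two atoms of different elements, the composition $X$ is fixed for the whole episode and the agent only ever moves within the set of orderings of that composition, which contains $\binom{309}{X}$ labelled configurations before symmetry is taken into account.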
