Machine learning models for Si nanoparticle growth in nonthermal plasma
Matt Raymond, Paolo Elvati, Jacob C. Saldinger, Jonathan Lin, Xuetao Shi, Angela Violi
TL;DR
The paper addresses the computational bottleneck of modeling nanoparticle growth in nonthermal plasmas by predicting the sticking probability $P_\text{st}$ for Si-containing collisions using machine learning trained on classical reactive MD data. It systematically evaluates seven permutation-invariant ML approaches with tailored losses (favoring binomial NLL and its logit formulations) across nested cross-validation schemes to gauge generalization to unseen temperatures, clusters, and impactors. Key findings show that high predictive accuracy can be achieved with only 15–25% of the data, and that DeepSets and LGBM offer strong extrapolation capabilities for unseen temperatures and structures, respectively, with permutation invariance significantly boosting robustness. The approach substantially reduces MD cost for deriving growth parameters in nonthermal plasmas and is readily adaptable to other NP growth contexts, enabling more efficient and realistic reactor-scale simulations.
Abstract
Nanoparticles (NPs) formed in nonthermal plasmas (NTPs) can have unique properties and applications. However, modeling NP growth in these environments is challenging: the non-equilibrium nature of NTPs makes them computationally expensive to describe. In this work, we address the challenge of accelerating the estimation of the parameters these models require. Specifically, we explore how different machine learning models can be tailored to improve prediction outcomes. We apply these methods to reactive classical molecular dynamics data, which capture the processes associated with colliding silane fragments in NTPs. These reactions exemplify processes whose qualitative trends are clear but whose quantification is challenging, hard to generalize, and dependent on time-consuming simulations. Our results demonstrate that good prediction performance can be achieved when appropriate loss functions are implemented and the correct invariances are imposed. While the diversity of molecules used in the training set is critical for accurate prediction, our findings indicate that only a fraction (15–25%) of the energy and temperature sampling is required to achieve high levels of accuracy. This suggests that a substantial reduction in computational effort is possible for similar systems.
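
To make the two ingredients named above concrete (a loss suited to count data and an imposed permutation invariance), the following is a minimal sketch, not the authors' implementation. It assumes each training example records a number of sticking events out of a number of simulated collisions, and that the model outputs a logit of $P_\text{st}$. The function names, feature shapes, and random weights are all hypothetical and chosen only for illustration.

```python
import numpy as np


def binomial_nll_from_logits(logits, successes, trials):
    """Mean binomial negative log-likelihood, written in terms of logits.

    With p = sigmoid(z): log p = -softplus(-z) and log(1 - p) = -softplus(z).
    The binomial coefficient is dropped because it does not depend on z.
    """
    z = np.asarray(logits, dtype=float)
    k = np.asarray(successes, dtype=float)
    n = np.asarray(trials, dtype=float)
    # np.logaddexp(0, x) is a numerically stable softplus(x).
    nll = k * np.logaddexp(0.0, -z) + (n - k) * np.logaddexp(0.0, z)
    return nll.mean()


def deepsets_logit(atom_features, w_phi, w_rho):
    """Toy DeepSets-style forward pass (hypothetical weights and features).

    Each atom is embedded independently, the embeddings are summed, and the
    pooled vector is mapped to a single logit. Summation makes the output
    independent of the order in which atoms are listed.
    """
    h = np.tanh(atom_features @ w_phi)  # per-atom embedding, (n_atoms, hidden)
    pooled = h.sum(axis=0)              # order-independent pooling
    return float(pooled @ w_rho)


# Hypothetical inputs: 5 atoms with 3 features each, random weights.
rng = np.random.default_rng(0)
atoms = rng.normal(size=(5, 3))
w_phi = rng.normal(size=(3, 8))
w_rho = rng.normal(size=8)

z = deepsets_logit(atoms, w_phi, w_rho)
z_reordered = deepsets_logit(atoms[::-1], w_phi, w_rho)  # same atoms, reversed

# Assumed example: 7 sticking events observed in 10 simulated collisions.
loss = binomial_nll_from_logits([z], [7], [10])
```

Working with logits rather than probabilities avoids evaluating `log(0)` when the predicted probability saturates, which matters when many collisions almost always or almost never stick.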
