Dynamic control of self-assembly of quasicrystalline structures through reinforcement learning
Uyen Tu Lieu, Natsuhiko Yoshinaga
TL;DR
The kinetic pathways governing self-assembly can prevent a system from reaching a defect-free dodecagonal quasicrystal (DDQC), even when static design suffices for the equilibrium state. The authors use a Q-learning-based reinforcement learning framework to control the temperature dynamically during Brownian Dynamics simulations of patchy particles. The sigma fraction $\sigma$ serves as the primary state variable, and the policy aims for a target value (e.g., $\sigma^*=0.91$) that corresponds to a high-quality DDQC. A key finding is the emergence of a characteristic temperature $T^*\approx0.7$, around which structural fluctuations enhance the likelihood of forming the global-minimum DDQC. The learned policy steers the system to $T^*$ and then stabilizes the structure, forming DDQC faster and with fewer defects than conventional annealing. The approach extends to unknown or metastable targets (e.g., $\sigma^*=0.65$ or $0.35$), and a simple triple-well model clarifies the mechanism. Together these results illustrate the potential of dynamic, ML-guided control for designing complex self-assembled materials across parameter regimes and system sizes.
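As a rough illustration of the control loop described above, the following is a minimal tabular Q-learning sketch in Python. It is not the authors' implementation: the bin count, the temperature grid, the three-move action set, the reward shape, and all function names are assumptions chosen for illustration. Only the use of Q-learning, the sigma fraction as a state variable, and temperature as the controlled quantity come from the summary.

```python
import numpy as np

# Hypothetical discretisation; none of these values come from the paper.
N_SIGMA_BINS = 10                          # bins for the sigma fraction in [0, 1]
TEMPERATURES = np.linspace(0.2, 1.2, 11)   # allowed temperature levels
ACTIONS = (-1, 0, +1)                      # lower, hold, or raise T by one level


def state_index(sigma, t_idx):
    """Map (sigma fraction, temperature level) to a discrete state tuple."""
    s_bin = min(int(sigma * N_SIGMA_BINS), N_SIGMA_BINS - 1)
    return s_bin, t_idx


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy choice over the three temperature moves."""
    if rng.random() < epsilon:
        return int(rng.integers(len(ACTIONS)))
    return int(np.argmax(q[state]))


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update, applied in place."""
    td_target = reward + gamma * np.max(q[next_state])
    q[state + (action,)] += alpha * (td_target - q[state + (action,)])


def reward_fn(sigma, sigma_target):
    """Assumed reward: negative squared distance to the target sigma."""
    return -(sigma - sigma_target) ** 2


# Q-table indexed by (sigma bin, temperature level, action).
q_table = np.zeros((N_SIGMA_BINS, len(TEMPERATURES), len(ACTIONS)))
```

In a full training loop, each step would run a short stretch of simulation at the current temperature, measure the new sigma fraction, and call `q_update` with `reward_fn(sigma, sigma_target)`. The simulation itself is outside the scope of this sketch.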
Abstract
We propose reinforcement learning (RL) to control the dynamical self-assembly of the dodecagonal quasicrystal (DDQC) from patchy particles. The patchy particles have anisotropic interactions with other particles and form DDQC. However, their structures at steady states are significantly influenced by the kinetic pathways of their structural formation. We estimate the best policy of temperature control trained by the Q-learning method and demonstrate that we can generate DDQC with few defects using the estimated policy. We find that reinforcement learning autonomously discovers a characteristic temperature at which structural fluctuations enhance the chance of forming a globally stable state. The estimated policy guides the system toward the characteristic temperature to assist the formation of DDQC. We also illustrate the performance of RL when the target is metastable or unstable. To clarify the success of the learning, we analyse a simple model describing the kinetics of structural changes through the motion in a triple-well potential.
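To make the triple-well picture concrete, here is a minimal sketch of overdamped Langevin motion in a one-dimensional triple-well potential under a prescribed temperature schedule. The polynomial form $U(x) = x^2(x^2-1)^2 - \text{tilt}\cdot x$, the tilt value, the time step, and the function names are all assumptions made for illustration; the abstract does not specify the authors' model. Units are chosen so that the mobility and $k_B$ are both 1.

```python
import numpy as np


def potential(x, tilt=0.05):
    """Assumed triple-well potential with wells near x = -1, 0, +1.

    A positive tilt lowers the right-hand well so it becomes the global minimum.
    """
    return x**2 * (x**2 - 1.0) ** 2 - tilt * x


def force(x, tilt=0.05):
    """Force -dU/dx for the potential above."""
    return -(2.0 * x * (x**2 - 1.0) * (3.0 * x**2 - 1.0) - tilt)


def simulate(x0, schedule, dt=1e-3, tilt=0.05, seed=0):
    """Euler-Maruyama integration of overdamped Langevin dynamics.

    `schedule` holds one temperature per time step, so it can encode
    annealing or any other temperature protocol.
    """
    rng = np.random.default_rng(seed)
    x = float(x0)
    traj = np.empty(len(schedule))
    for i, temp in enumerate(schedule):
        noise = rng.standard_normal()
        x += force(x, tilt) * dt + np.sqrt(2.0 * temp * dt) * noise
        traj[i] = x
    return traj
```

For example, `simulate(0.0, np.full(20000, 0.1))` holds the temperature fixed at 0.1 starting from the central well, while a schedule such as `np.linspace(0.2, 0.0, 20000)` mimics simple annealing. With the untilted potential the barriers between wells have height 4/27, so temperatures of that order are where hopping between wells becomes frequent.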
