Dynamic control of self-assembly of quasicrystalline structures through reinforcement learning

Uyen Tu Lieu; Natsuhiko Yoshinaga

TL;DR

Kinetic pathways can prevent a self-assembling system from reaching a defect-free dodecagonal quasicrystal (DDQC), even when a static design is enough to make the DDQC the equilibrium state. The authors use a Q-learning-based reinforcement learning (RL) framework to control the temperature dynamically during Brownian Dynamics simulations of patchy particles. The sigma fraction $\sigma$ is the primary state variable, and the target is a value such as $\sigma^*=0.91$ that corresponds to a high-quality DDQC. A key finding is the emergence of a characteristic temperature $T^*\approx0.7$, around which structural fluctuations raise the likelihood of forming the global-minimum DDQC. The learned policy steers the system to $T^*$ and then stabilizes the structure, forming DDQCs faster and with fewer defects than conventional annealing. The approach extends to unknown or metastable targets (e.g., $\sigma^*=0.65$ or $0.35$), and a simple triple-well model clarifies the mechanism. Together these results illustrate the potential of dynamic, ML-guided control for designing complex self-assembled materials across parameter regimes and system sizes.

Abstract

We propose reinforcement learning to control the dynamical self-assembly of the dodecagonal quasicrystal (DDQC) from patchy particles. The patchy particles have anisotropic interactions with other particles and form DDQC. However, their structures at steady states are significantly influenced by the kinetic pathways of their structural formation. We estimate the best policy of temperature control trained by the Q-learning method and demonstrate that we can generate DDQC with few defects using the estimated policy. It is found that reinforcement learning autonomously discovers a characteristic temperature at which structural fluctuations enhance the chance of forming a globally stable state. The estimated policy guides the system toward the characteristic temperature to assist the formation of DDQC. We also illustrate the performance of RL when the target is metastable or unstable. To clarify the success of the learning, we analyse a simple model describing the kinetics of structural changes through the motion in a triple-well potential.
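The triple-well picture mentioned at the end of the abstract can be illustrated with a minimal sketch of overdamped Langevin motion under a time-dependent temperature. The potential $U(x) = (x^3 - x)^2$ used below is a hypothetical choice with three equal-depth wells at $x = -1, 0, 1$; it is not the form used by the authors, and the names `potential`, `force`, and `simulate`, the time step `dt`, and the Euler-Maruyama integrator are all assumptions made for illustration.

```python
import numpy as np

def potential(x):
    """Hypothetical triple-well potential with minima at x = -1, 0, 1."""
    return (x**3 - x) ** 2

def force(x):
    """Negative gradient of the potential: -dU/dx."""
    return -2.0 * (x**3 - x) * (3.0 * x**2 - 1.0)

def simulate(x0, temperatures, dt=1e-3, seed=0):
    """Euler-Maruyama integration of dx = -U'(x) dt + sqrt(2 T dt) * xi.

    `temperatures` holds one temperature per time step, so any
    temperature schedule (constant, annealed, or learned) can be passed in.
    """
    rng = np.random.default_rng(seed)
    x = float(x0)
    traj = np.empty(len(temperatures))
    for i, T in enumerate(temperatures):
        noise = np.sqrt(2.0 * T * dt) * rng.standard_normal()
        x += force(x) * dt + noise
        traj[i] = x
    return traj

# Example schedule (assumed values): hold a moderate temperature, then quench to zero.
schedule = np.concatenate([np.full(5000, 0.15), np.zeros(5000)])
trajectory = simulate(x0=0.0, temperatures=schedule)
```

At zero temperature the particle simply relaxes into the well it starts in, while at a moderate temperature it can hop over the barriers between wells. This is the kind of fluctuation-assisted escape that the abstract attributes to the characteristic temperature.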

Paper Structure (14 sections, 4 equations, 12 figures, 1 table)

Introduction
Methods
Reinforcement learning for dynamic self-assembly
Self-assembly of patchy particles through Brownian Dynamics simulations
Characterisation of DDQC structures
Target structures for RL
Results
Optimal temperature change to generate DDQC from patchy particles
Training process
Testing evaluation
Comparison of RL with conventional approaches
Reinforcement learning for unknown targets of patchy particles
RL, equilibrium phases, and metastability
Discussion and conclusion

Figures (12)

Figure 1: Schematic of reinforcement learning for dynamic self-assembly. The agent observes the state $s$ of the environment and chooses an action $a$ according to the policy $\pi$. The agent learns the policy $\pi$ through a training process that optimises the rewards $r$. In this study, the environment is the particle configuration at a given temperature. The observed states $s$ are the fraction of sigma particles $\sigma$ and the temperature $T$. The action $a$ is to decrease, maintain, or increase the current temperature.
Figure 2: Schematic of Q-learning at each epoch with the $\epsilon$-greedy method. The action $a$ is chosen based on the current policy $\pi$ and $\epsilon$. Q is updated according to Eq. \ref{eq:Qupdate}. A Brownian dynamics (BD) simulation is run at each of the $N_\text{step}$ action steps of an epoch.
Figure 3: Characterisation of DDQCs. (a) Demonstration of local structures. (b-c) Examples of DDQCs with few and many defects, and the corresponding Fourier transforms. The undefined particles ($U$) are marked in purple. The fractions of sigma particles in (b) and (c) are 0.84 and 0.67, respectively. (d) Dodecagonal motif made from one $Z$ particle centred in 18 $\sigma$ particles.
Figure 4: Training data for random initial temperature $T_0$ and number of epochs $N_\text{e}=101$ (parameters in Table 1). (a,b) The progression of the states $T$ and $\sigma$ at selected epochs: the first, middle, and last epoch (equivalent to $\epsilon=0, 0.5, 1$, respectively); (c) the policy after training; (d) the ratio of the number of accessed states to the total number of states and (e) the ratio of flipped-policy states to accessed states after each epoch during training. The horizontal axis at the top of the graph shows the corresponding value of $\epsilon$.
Figure 5: Testing data for the policy obtained with random $T_0$ and number of epochs $N_\text{e}=101$ in Fig. 4. Samples starting at low (blue), intermediate (red), and high (yellow) initial temperatures are shown with (a) the temperature schedule, (b) the corresponding $\sigma$, and (c) snapshots at the last step of the corresponding trajectories. (d) The trajectories of (a,b) on the policy plane obtained from Fig. 4(c); the trajectories start from the left side. The temperature changes along the trajectories follow the policy shown in the background. (e) The dependence of $\sigma$ on the initial temperature $T_0$ obtained from 20 independent samples. The dashed line is a guide to the eye for the lower limit of global-minimum DDQCs.
...and 7 more figures
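
The training loop described in the captions of Figures 1 and 2 can be sketched as tabular Q-learning over discretised $(\sigma, T)$ states with three temperature actions. This is a minimal sketch under stated assumptions, not the authors' implementation: `toy_step` is a hypothetical stand-in for the Brownian dynamics run, the reward, learning rate `alpha`, discount `gamma`, and grid sizes are illustrative, and `eps` is taken to be the probability of the greedy action, which is one reading of the Figure 4 caption where $\epsilon$ rises from 0 to 1 over training.

```python
import numpy as np

ACTIONS = (-1, 0, 1)  # decrease, maintain, or increase the temperature index

def choose_action(Q, state, eps, rng):
    """Greedy action with probability eps, uniformly random otherwise (assumed convention)."""
    if rng.random() < eps:
        return int(np.argmax(Q[state]))
    return int(rng.integers(len(ACTIONS)))

def q_update(Q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Standard tabular Q-learning update toward reward + gamma * max_a' Q(s', a')."""
    target = reward + gamma * np.max(Q[next_state])
    Q[state + (action,)] += alpha * (target - Q[state + (action,)])

def toy_step(sigma_idx, T_idx, action, n_sigma, n_T, rng):
    """Hypothetical surrogate for one BD run; NOT the patchy-particle model."""
    T_idx = int(np.clip(T_idx + ACTIONS[action], 0, n_T - 1))
    # Assumed toy rule: sigma grows at the middle temperature, shrinks at the hottest.
    if T_idx == n_T // 2:
        drift = 1
    elif T_idx == n_T - 1:
        drift = -1
    else:
        drift = 0
    if rng.random() < 0.2:  # occasional random fluctuation
        drift += int(rng.choice([-1, 1]))
    sigma_idx = int(np.clip(sigma_idx + drift, 0, n_sigma - 1))
    return sigma_idx, T_idx

def train(n_epochs=101, n_steps=50, n_sigma=10, n_T=7, target_idx=9, seed=0):
    """Train a Q-table; each epoch starts from a random temperature index."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_sigma, n_T, len(ACTIONS)))
    for epoch in range(n_epochs):
        eps = epoch / (n_epochs - 1)  # rises linearly from 0 to 1
        state = (0, int(rng.integers(n_T)))
        for _ in range(n_steps):
            action = choose_action(Q, state, eps, rng)
            next_state = toy_step(state[0], state[1], action, n_sigma, n_T, rng)
            # Assumed reward: negative normalised distance to the target sigma bin.
            reward = -abs(next_state[0] - target_idx) / (n_sigma - 1)
            q_update(Q, state, action, reward, next_state)
            state = next_state
    return Q

Q = train()
policy = np.argmax(Q, axis=-1)  # index into ACTIONS for each (sigma, T) cell
```

The resulting `policy` array plays the role of the policy plane in Figure 4(c): each $(\sigma, T)$ cell stores which of the three temperature actions is currently preferred.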