Optimized control protocols for stable skyrmion creation using deep reinforcement learning

Ji Seok Song; Se Kwon Kim; Kyoung-Min Kim

Abstract

Generating stable magnetic skyrmions is essential for the practical application of skyrmion-based spintronic devices in thermally agitated environments. Recent advances have enabled skyrmion creation by controlling the stripe domain instability through dynamic magnetic-field control. However, creating skyrmions deterministically and managing their thermal stability effectively remain challenging. Here, we present a deep reinforcement learning (DRL) approach that identifies dynamic magnetic-field-temperature paths which create skyrmions by controlling the stripe domain instability while enhancing the skyrmions' thermal stability. The trained DRL agent discovers an optimized field-temperature path that achieves a higher success rate for skyrmion formation in Fe3GeTe2 monolayers than previous fixed-temperature field sweeps. Additionally, the generated skyrmions exhibit longer lifetimes owing to their isotropic shape, which tends to suppress the internal excitation modes associated with skyrmion annihilation. We demonstrate that these improvements stem from the targeted minimization of the dissipated work, which upper-bounds the Kullback-Leibler divergence and thereby keeps the driven skyrmion states close to their equilibrium distributions. Our findings suggest that a DRL-powered search streamlines the identification of optimized protocols for skyrmion creation and control.
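
The bound mentioned in the abstract can be illustrated with a standard result for isothermal driving. It is given here as an assumption about the general form of such a bound, not as the authors' exact expression, since the protocol in this work also varies the temperature. For a system driven at fixed temperature $T$, with $\langle W \rangle_t$ the average work done up to time $t$, $\Delta F_t$ the corresponding equilibrium free-energy change, $\rho_t$ the instantaneous distribution, and $\rho^{\mathrm{eq}}_t$ the equilibrium distribution at the current control parameters:

$$
\frac{\langle W \rangle_t - \Delta F_t}{k_B T} \;\geq\; D_{\mathrm{KL}}\!\left(\rho_t \,\|\, \rho^{\mathrm{eq}}_t\right) \;\geq\; 0
$$

Under an inequality of this type, a protocol that lowers the dissipated work also limits how far the driven state can stray from equilibrium.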

Paper Structure (3 sections, 5 equations, 4 figures)

Introduction
Results
Discussion

Figures (4)

Figure 1: Schematic of the DRL framework for identifying optimized skyrmion-creation protocols. The trained DRL agent generates a time-dependent protocol for the external magnetic fields and temperature (indicated by the yellow arrow). This control drives the system from an initial stripe domain state to an isolated skyrmion state. By minimizing dissipation, the protocol yields a fully relaxed, circular skyrmion positioned near a local energy minimum. This energetic stability effectively reduces the probability of the skyrmion collapsing into the ferromagnetic ground state during post-protocol relaxation (indicated by the gray arrow), thereby ensuring enhanced longevity. In contrast, the conventional protocol (indicated by the magenta arrow) induces high dissipation, resulting in elliptical skyrmions with significant internal excitations. Such states reside far from the local energy minimum, exhibiting reduced energetic stability and significantly shorter lifetimes.
Figure 2: Comparison of control protocols and corresponding magnetization dynamics in Fe3GeTe2 (FGT). The figure is organized into three columns separated by vertical lines, representing: a conventional field-sweep protocol (left), a protocol generated by a 50th-generation DRL agent (center), and a protocol from a 200th-generation DRL agent (right). (a)--(c) Time evolution of the control parameters, including temperature ($T$) and magnetic field components ($\mu_0 H_x, \mu_0 H_z$), for the respective protocols. Dashed lines represent sinusoidal fitting functions with the fitted parameters $A_x = -3.48~\mathrm{T}$, $A_z = 3.09~\mathrm{T}$, $A_T = 198.26~\mathrm{K}$, and $\tau_f = 100~\mathrm{ps}$. (d)--(f) Corresponding time dependence of the Hamiltonian parameters for FGT, specifically the uniaxial anisotropy ($K$) and saturation magnetization ($M_s$). (g)--(j) Representative snapshots of spin configurations evolving under each protocol. Local arrows denote the in-plane magnetization direction at each spin site. The timestamp is indicated in each snapshot. The green circle denotes a skyrmion ($Q = -1$), while the yellow circle denotes an antiskyrmion ($Q = +1$).
Figure 3: Enhanced nucleation efficiency and thermal stability of skyrmions via DRL. (a) Evolution of the two terms in the cost function $\phi$ over training epochs from generation 0 to 200. The first term represents $||Q_f|-1|$, while the second term corresponds to $k_s W_f/(k_B T_f)$; both terms are averaged over 10 independent trajectories. For visual clarity, the second term is scaled by a factor of $10^3$ to bring both contributions into a comparable numerical range. (b) Success rate of skyrmion formation for the conventional field-sweep protocol and for protocols generated at different DRL generations. (c) Time evolution of the cumulative entropy production $\Delta S_{\mathrm{res}}/k_B$ for each protocol. Solid lines denote the mean cumulative entropy production averaged over trajectories, and shaded regions represent the standard deviation ($\pm 1\sigma$) from the mean. (d)–(e) Probability density functions of the magnetic energy density $\mathcal{F}$ for instantaneous skyrmion states generated by the 50th- and 200th-generation DRL protocols, respectively (shaded areas). In each panel, the green line represents the equilibrium distribution of $\mathcal{F}$ for the skyrmion state. (f) The left axis displays the Kullback-Leibler (KL) divergence ($D_{\mathrm{KL}}$) for the 50th, 70th, 100th, 150th, and 200th DRL generations, while the right axis shows the corresponding ensemble-averaged magnetic energy density $\langle \mathcal{F} \rangle$ of these instantaneous skyrmion states, together with the equilibrium value $\langle \mathcal{F}_{\mathrm{eq}} \rangle$ (dashed line). Error bars denote the standard error of the mean.
Figure 4: Thermal relaxation dynamics and stability of skyrmions formed by different control protocols. (a)–(c) Snapshots of representative spin configurations during relaxation under the parameter set ($T = 100$ K, $H_x = 0$, $H_z = 0.15$ T). The initial states correspond to the final configurations shown in Fig. 2: (a) an antiskyrmion generated by the field-sweep protocol [Fig. 2(g)], (b) a skyrmion generated by the 50th-generation DRL protocol [Fig. 2(i)], and (c) a skyrmion generated by the 200th-generation DRL protocol [Fig. 2(j)]. The time stamp and corresponding ellipticity parameter $\delta$ are indicated in each snapshot. (d) Time evolution of $\delta$ for the trajectories shown in (a)–(c). (e) Time dependence of the skyrmion area $A$ for the trajectories shown in (a)–(c). (f) Survival rate and effective success rate of skyrmions evaluated over a $10$ ns relaxation period.
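
The quantities plotted in Figure 3(a) and 3(f) can be sketched numerically. The Python snippet below is a minimal illustration, not the authors' implementation: it evaluates the two trajectory-averaged cost terms $||Q_f|-1|$ and $k_s W_f/(k_B T_f)$ named in the caption, and estimates a KL divergence between sampled energy densities and a reference (equilibrium) sample using histograms. The function names, the default `k_s=1.0`, the bin count, and the histogram estimator itself are assumptions made for illustration; the captions do not specify them.

```python
import numpy as np

K_B = 1.380649e-23  # Boltzmann constant in J/K


def cost_terms(q_final, w_final, t_final, k_s=1.0):
    """Return the two trajectory-averaged cost terms from Fig. 3(a).

    q_final: final topological charges Q_f, one per trajectory
    w_final: final dissipated work W_f in joules, one per trajectory
    t_final: final temperature T_f in kelvin (scalar or per trajectory)
    k_s:     weight of the work term (hypothetical default; not given in the text)
    """
    q = np.asarray(q_final, dtype=float)
    w = np.asarray(w_final, dtype=float)
    t = np.asarray(t_final, dtype=float)
    topo_term = float(np.mean(np.abs(np.abs(q) - 1.0)))  # ||Q_f| - 1|
    work_term = float(np.mean(k_s * w / (K_B * t)))      # k_s W_f / (k_B T_f)
    return topo_term, work_term


def kl_divergence(samples, reference, bins=30):
    """Histogram estimate of D_KL(P_samples || P_reference).

    Both sample sets are binned on a shared range. Empty reference bins
    are regularized with a small constant (an illustrative choice).
    """
    samples = np.asarray(samples, dtype=float)
    reference = np.asarray(reference, dtype=float)
    lo = min(samples.min(), reference.min())
    hi = max(samples.max(), reference.max())
    p, _ = np.histogram(samples, bins=bins, range=(lo, hi))
    q, _ = np.histogram(reference, bins=bins, range=(lo, hi))
    eps = 1e-12
    p = p / p.sum()
    q = (q + eps) / (q + eps).sum()
    mask = p > 0  # bins with p = 0 contribute nothing to the sum
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))
```

For example, `cost_terms([-1, -1, 0, 1], [0.0] * 4, 100.0)` averages the topological term over four trajectories, one of which failed to form a skyrmion. If the cost $\phi$ is taken as the sum of the two terms (an assumption; the caption only lists them), it is simply `sum(cost_terms(...))`.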
