CuPyMag: GPU-Accelerated Finite-Element Micromagnetics with Magnetostriction

Hongyi Guan, Ananya Renuka Balakrishna

TL;DR

CuPyMag addresses the challenge of performing large-scale micromagnetic simulations with magnetoelastic coupling on realistic geometries by delivering a GPU-resident, tensorized FEM framework in Python. It combines solvers for the magnetization, demagnetization, and elastic fields with the ellipsoid theorem and the Gauss-Seidel projection method to enable stable time integration on unstructured meshes. The approach yields substantial performance gains: it handles meshes of up to 3 million nodes on a single NVIDIA H200 GPU in under three hours and is up to two orders of magnitude faster than CPU codes. The work provides an accessible, extensible, and physics-rich tool for studying defect-influenced micromagnetics and magnetoelastic effects in realistic materials.

Abstract

We introduce CuPyMag, an open-source, Python-based framework for large-scale micromagnetic simulations with magnetostriction. CuPyMag solves micromagnetics with finite elements in a GPU-resident workflow in which key operations, such as right-hand-side assembly, spatial derivatives, and volume averages, are tensorized using CuPy's BLAS-accelerated backend. Benchmark tests show that the GPU solvers in CuPyMag achieve a speedup of up to two orders of magnitude over CPU codes. Its runtime grows linearly or sublinearly with problem size, demonstrating high efficiency. Additionally, CuPyMag uses the Gauss-Seidel projection method for time integration, which not only allows stable time steps (up to 11 ps) but also solves each governing equation with only 1-3 conjugate-gradient iterations without preconditioning. CuPyMag accounts for magnetoelastic coupling and far-field effects arising from the boundary of the magnetic body, both of which play an important role in magnetization reversal in the presence of local defects. CuPyMag solves these computationally intensive multiphysics simulations on a high-resolution mesh (up to 3M nodes) in under three hours on an NVIDIA H200 GPU. This acceleration enables micromagnetic simulations with non-trivial defect geometries and resolves nanoscale magnetic structures. It expands the scope of micromagnetic simulations towards realistic, large-scale problems that can guide experiments. More broadly, CuPyMag is developed using widely adopted Python libraries, which provide cross-platform compatibility, ease of installation, and accessibility for adaptation to diverse applications.
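The abstract lists volume averages among the tensorized operations. The following is a minimal sketch of that idea, not CuPyMag's actual code. It assumes linear tetrahedral elements and uses NumPy as a CPU stand-in; the array calls used here also exist in CuPy, so replacing the import is assumed (not verified here) to move the same computation to the GPU. The names `kuhn_cube` and `volume_average` are hypothetical.

```python
from itertools import permutations

import numpy as np  # CPU stand-in; swapping in CuPy is an assumption of this sketch


def kuhn_cube():
    """Unit cube split into 6 tetrahedra (one per ordering of the axes)."""
    nodes = np.array(
        [[i, j, k] for i in (0, 1) for j in (0, 1) for k in (0, 1)], dtype=float
    )  # node index = 4*i + 2*j + k
    tets = []
    for perm in permutations(range(3)):
        v = np.zeros(3, dtype=int)
        path = [0]  # every tetrahedron starts at the origin
        for axis in perm:
            v[axis] = 1
            path.append(4 * v[0] + 2 * v[1] + v[2])
        tets.append(path)
    return nodes, np.array(tets)


def volume_average(nodes, tets, m):
    """Volume-weighted average of a nodal vector field on linear tetrahedra.

    nodes: (n_nodes, 3) coordinates
    tets:  (n_elems, 4) node indices per element
    m:     (n_nodes, 3) nodal field values
    """
    p = nodes[tets]                            # (n_elems, 4, 3) gathered vertices
    edges = p[:, 1:, :] - p[:, :1, :]          # (n_elems, 3, 3) edges from vertex 0
    vol = np.abs(np.linalg.det(edges)) / 6.0   # (n_elems,) element volumes
    m_elem = m[tets].mean(axis=1)              # (n_elems, 3) element mean, exact for linear fields
    # One contraction over all elements replaces a Python loop over the mesh.
    return np.einsum("e,ei->i", vol, m_elem) / vol.sum()


nodes, tets = kuhn_cube()
# Linear test field m(x, y, z) = (x, 2y, 1); its exact cube average is (0.5, 1, 1).
m = np.column_stack([nodes[:, 0], 2.0 * nodes[:, 1], np.ones(len(nodes))])
avg = volume_average(nodes, tets, m)
print(avg)
```

The point of the sketch is the data layout: gathering nodal data into an `(n_elems, 4, 3)` array lets element volumes and element means be computed for the whole mesh in a few batched calls.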

Paper Structure

This paper contains 20 sections, 23 equations, 8 figures, 2 tables, 2 algorithms.

Figures (8)

  • Figure 1: The complete workflow of CuPyMag. Here, the average magnetization, the external fields $\mathbf{H}_{\mathrm{ext}}$, and field maps, including the magnetization field $\mathbf{m}$, demagnetization field $\mathbf{H}_\mathrm{d}$, and strain field $E$, are written either at convergence or after every $N$ steps. The dashed borders indicate the stages that involve CPU–GPU data transfer. These stages are kept to a minimum in the workflow, reflecting our effort to minimize CPU–GPU data-transfer overhead.
  • Figure 2: Benchmark of system assembly and linear solver performance for a linear hexahedral element system. (a) Assembly time for the magnetostatic equilibrium system, the Gauss–Seidel projection method (GSPM) system, and the mechanical equilibrium system (represented in a distinct color scheme, see online version). (b–d) Conjugate gradient (CG) solve time for these three systems with different precisions and backend implementations. For each system, we test the solve time using CuPy in double precision (DP), CuPy in single precision (SP), and PETSc on CPUs (in DP, via petsc4py). We do not use a preconditioner in any of the CG solvers.
  • Figure 3: Benchmark for an example micromagnetic calculation with 550k nodes on a linear hexahedral mesh. The computational domain contains a non-magnetic defect at its geometric center and is initialized with uniform magnetization. The LLG equations are solved by the Gauss-Seidel projection method (GSPM) iteratively until convergence, with LLG time steps ranging from 2 ps to 11 ps. (a) The total number of LLG steps required to converge for each calculation. (b-d) The average number of CG iterations for solving the magnetostatic equilibrium, GSPM, and mechanical equilibrium systems. These are grouped with a consistent color family to highlight their conceptual relation. All calculations are performed in double precision.
  • Figure 4: Total runtime benchmark for the full FEM micromagnetics simulation of the hysteresis process of Ni$_{70}$Fe$_{30}$, from $\vb{H}_{\mathrm{ext}}=200~\mathrm{A/m}~\vb{\hat{e}}_1$ to $\vb{H}_{\mathrm{ext}}=-|\vb{H}_{\mathrm{c}}|~\vb{\hat{e}}_1$. (a) Log–log plot of total runtime versus number of FEM nodes, comparing Delta’s H200 node (gray markers) and A100 node (blue markers), with squares denoting system assembly time and triangles denoting the subsequent simulations at all the LLG timesteps. (b) Projection of the sample domain onto the $x-y$ plane. The blue region is the magnetic region, the red ellipse is the nonmagnetic defect, and the white box indicates the subregion used for mesh visualization. (c-h) Linear tetrahedral FEM mesh visualization within the white box denoted in (b) for six increasing mesh densities. All the calculations are performed in double precision.
  • Figure 5: The evolution of needle- and stripe-shaped magnetic domains (as 2D projections) around a cuboid-, sphere-, or ellipsoid-shaped defect, at the points a, b, and c on the hysteresis loop on the right. Note that because the $x$-axis is a normalized field scale, the loops are not identical absolute-field responses but are shown here to highlight the characteristic hysteresis behavior for each defect. The color map shows the magnetization along the $\hat{\vb{e}}_1$ direction: $m_1=\vb{m}\cdot\hat{\vb{e}}_1$.
  • ...and 3 more figures
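
The captions of Figures 2 and 3 refer to conjugate-gradient (CG) solves without a preconditioner and to the number of CG iterations per solve. The sketch below shows, under stated assumptions, what such a solve and its iteration count look like. It is a toy illustration, not one of the paper's systems: the operator is a hypothetical 1D stand-in $A = I + cL$ (with $L$ the second-difference matrix), chosen to be well-conditioned, and the functions `apply_operator` and `cg` are written here for illustration with NumPy on the CPU.

```python
import numpy as np


def apply_operator(x, c):
    """Matrix-free action of A = I + c*L, L = tridiag(-1, 2, -1) (toy stand-in)."""
    y = (1.0 + 2.0 * c) * x
    y[1:] -= c * x[:-1]
    y[:-1] -= c * x[1:]
    return y


def cg(apply_a, b, x0, rtol=1e-8, max_iter=500):
    """Plain conjugate gradient with no preconditioner.

    Returns the approximate solution and the number of iterations performed.
    """
    x = x0.copy()
    r = b - apply_a(x)          # initial residual
    p = r.copy()                # initial search direction
    rs = r @ r
    b_norm = np.linalg.norm(b)
    for k in range(max_iter):
        if np.sqrt(rs) <= rtol * b_norm:
            return x, k
        ap = apply_a(p)
        alpha = rs / (p @ ap)   # step length along p
        x += alpha * p
        r -= alpha * ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p  # next A-conjugate direction
        rs = rs_new
    return x, max_iter


n, c = 1000, 0.05
rng = np.random.default_rng(0)
b = rng.standard_normal(n)
x, iters = cg(lambda v: apply_operator(v, c), b, np.zeros(n))
rel_res = np.linalg.norm(b - apply_operator(x, c)) / np.linalg.norm(b)
print(f"CG iterations: {iters}, relative residual: {rel_res:.2e}")
```

Because the eigenvalues of this toy operator lie between 1 and $1 + 4c$, CG converges in a handful of iterations here. How this carries over to the magnetostatic, GSPM, and mechanical systems depends on their conditioning and on the initial guess, which this sketch does not model.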