Sparse identification of effective microparticle interaction potential in dusty plasma from simulation data

Zachary Brooks Howe, Lorin Swint Matthews, Truell Hyde, Luca Guazzotto, Evdokiya Kostadinova

Abstract

Identification of the particle interaction potential is a challenging and important task in dusty plasma, colloids, and smart materials, as it allows the characterization of structure formation and helps predict phase transitions. With the advent of machine learning methods, this interaction can be extracted from particle position data, leading to a generalizable expression that is applicable in different systems. Methods such as sparse regression aim to provide a physically interpretable model that generalizes well while avoiding the unnecessary complexity that comes with overfitting. In this work, we apply the Sparse Identification of Nonlinear Dynamics (SINDy) method in its weak formulation to learn equations of motion from noisy data generated by simple simulations of two dust particles interacting through a Yukawa (shielded Coulomb) potential. The application of these methods to experimental dusty plasma data is discussed, particularly in the case of simulation data and glass box experiments in RF discharge gravity environments and DC discharge microgravity environments, such as the Plasmakristall-4 (PK-4) experiment.

Paper Structure

This paper contains 16 sections, 18 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: Visual representation of the $k$-fold cross-validation process (with $k=10$) used in the present analysis. The data (200 trajectories) are divided up into two groups: 150 (75%) are used as training data and 50 (25%) are used for testing. The training data are further divided into 10 "folds," one of which is left out for each successive fit. Learned models can then be compared to the true model and evaluated on the test dataset.
  • Figure 2: Coefficient deviation of successive learned SINDy models (Eq. \ref{eqn:coef_dev}) plotted as a function of the STLSQ threshold parameter used during fitting. The data used here have a noise level of $0.1 \lambda_{Di}$, and each point is labeled with the number of terms in the model; note that the true model (Eq. \ref{eqn:1st_order_eom}) has three terms.
  • Figure 3: Coefficient deviations $\Delta c$ for models determined from the synthetic data using the weak SINDy formulation described in Sec. \ref{sec:met}. Ten models were generated at each of the four noise levels tested here, each with a different random sampling of temporal data.
  • Figure 4: The prediction error $\epsilon$ plotted against the coefficient deviation $\Delta c$ of one hundred models generated during cross-validation. All the models shown here use an STLSQ threshold of 0.9, which corresponds to 0.19 $A$. There is no visible correlation between the two metrics $\Delta c$ and $\epsilon$: a linear regression analysis returns an $r^2$ coefficient of $10^{-10}$.
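
The data split described in the Figure 1 caption can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function name `split_trajectories`, the random shuffling, and the seed are assumptions made here. The trajectory counts (200 total, 150 training, 50 test) and $k=10$ come from the caption.

```python
import numpy as np

def split_trajectories(n_traj=200, n_test=50, k=10, seed=0):
    """Hypothetical helper: hold out a test set, then build k leave-one-fold-out fits."""
    rng = np.random.default_rng(seed)      # shuffling and seed are assumptions
    order = rng.permutation(n_traj)        # random ordering of trajectory indices
    test_idx = order[:n_test]              # 50 trajectories (25%) reserved for testing
    train_idx = order[n_test:]             # 150 trajectories (75%) used for training
    folds = np.array_split(train_idx, k)   # 10 folds of 15 trajectories each
    fits = []
    for i in range(k):
        # Each successive fit leaves out fold i and trains on the other k-1 folds.
        kept = np.concatenate([f for j, f in enumerate(folds) if j != i])
        fits.append((kept, folds[i]))
    return train_idx, test_idx, fits

train_idx, test_idx, fits = split_trajectories()
for kept, left_out in fits:
    print(len(kept), "trajectories used for the fit,", len(left_out), "left out")
```

With these numbers each fit uses 135 of the 150 training trajectories, and the 50 test trajectories never enter any fit, so every learned model can be evaluated on the same held-out data.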