
Knowledge Distillation of Noisy Force Labels for Improved Coarse-Grained Force Fields

Feranmi V. Olowookere, Sakib Matin, Aleksandra Pachalieva, Nicholas Lubbers, Emily Shinkle

TL;DR

This work addresses the noise and instability that arise when training coarse-grained (CG) force fields, which stem from mapping all-atom (AA) forces onto CG representations and from the entropic contributions inherent to CG models. It introduces a knowledge distillation (KD) framework: an ensemble of eight CG teacher models is trained on CG-mapped forces to denoise the targets, and their outputs (forces and energies) are distilled into a single, fast-to-infer CG student that preserves ensemble-level accuracy. The approach is validated on a deep eutectic solvent, where distilling from an ensemble and supervising with per-bead energies markedly improves two-, three-, and many-body structural metrics (RDF, ADF, CDF) while giving roughly fivefold faster inference than the teacher ensemble. The results suggest that this KD workflow can yield accurate, transferable CG force fields suitable for large-scale simulations, and that it could be extended to more complex materials such as polymers.

Abstract

Molecular dynamics simulations are an integral tool for studying the atomistic behavior of materials under diverse conditions. However, they can be computationally demanding in wall-clock time, especially for large systems, which limits the accessible time and length scales. Coarse-grained (CG) models reduce computational expense by grouping atoms into simplified representations commonly termed beads, but they sacrifice atomic detail and introduce mapping noise, which complicates the training of machine-learned surrogates. Moreover, because CG models inherently include entropic contributions, they cannot be fit directly to all-atom energies, leaving instantaneous, noisy forces as the only state-specific quantities available for training. Here, we apply a knowledge distillation framework by first training an initial CG neural network potential (the teacher) solely on CG-mapped forces to denoise those labels, then distilling its force and energy predictions to train refined CG models (the student) in both single- and ensemble-training setups while exploring different force and energy target combinations. We validate this framework on a complex molecular fluid, a deep eutectic solvent, by evaluating two-, three-, and many-body properties and comparing the CG and all-atom results. Our findings demonstrate that training a student model on ensemble teacher-predicted forces and per-bead energies improves the quality and stability of CG force fields.
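
The distillation step described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the mean-absolute-error loss form, and the weights `w_F`, `w_f`, and `w_eps` are assumptions introduced here, and the arrays are assumed to hold predictions for a single CG configuration.

```python
import numpy as np

def ensemble_targets(teacher_forces, teacher_bead_energies):
    """Average teacher predictions over the teacher axis (axis 0).

    teacher_forces:        shape (n_teachers, n_beads, 3)
    teacher_bead_energies: shape (n_teachers, n_beads)
    """
    f_mean = np.mean(teacher_forces, axis=0)
    eps_mean = np.mean(teacher_bead_energies, axis=0)
    return f_mean, eps_mean

def student_loss(pred_forces, pred_bead_energies,
                 cg_forces, f_teacher, eps_teacher,
                 w_F=1.0, w_f=1.0, w_eps=1.0):
    """Hypothetical student loss combining three targets:
    CG-mapped forces, teacher forces, and teacher per-bead energies.
    The MAE form and the weights are assumptions for illustration.
    """
    def mae(a, b):
        return float(np.mean(np.abs(a - b)))

    return (w_F * mae(pred_forces, cg_forces)
            + w_f * mae(pred_forces, f_teacher)
            + w_eps * mae(pred_bead_energies, eps_teacher))
```

Setting `w_F`, `w_f`, or `w_eps` to zero switches a target off, which is one simple way to express the different force and energy target combinations mentioned in the abstract.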

Paper Structure

This paper contains 24 sections, 10 equations, 75 figures, 4 tables.

Figures (75)

  • Figure 1: Simulation workflow for training and validating teacher and student models. MD: molecular dynamics, AA: all-atom, CG: coarse-grained. Each molecule is represented by one bead at the coarse-grained level.
  • Figure 2: Distilling knowledge from an ensemble of teacher models into a single student improves both accuracy and efficiency of ML CG models. Each teacher is trained on the same AA force data but with different random seeds; averaging their predictions yields denoised forces and per-bead energies, which are then combined with the original training data to train the student. While the teachers exhibit bias in the RDF compared to the AA reference, the student both reproduces the reference RDF accurately and achieves roughly fivefold faster inference than the teacher-ensemble ($T8$) model.
  • Figure 3: (a) Training metrics MAE, RMSE and $R^2$ of individual teacher models on ground-truth AA force targets, (b) Parity plot of the predicted versus ground-truth AA forces, (c) Urea-Urea RDF of individual teacher models in comparison to AA reference.
  • Figure 4: Comparison of urea-urea (a) RDF TAE and (b) RDF for teacher and student models (using different force targets) relative to the AA reference. Error bars denote one standard deviation over 8 replicas. Teacher ($T$) results were calculated as the mean of eight independent MD simulations, each performed with a teacher model trained from a unique random seed. $S1$ student variants were each trained from a single teacher and then run as eight independent replicas; the reported result is the average across all eight student models, each trained from one of the individual teachers. $S8$ models were trained on averaged data from all eight teachers. For energy targets, all student models shown here were trained on per-bead energies only. $F$: ground-truth forces, $\mathcal{f}$: teacher forces, $E$: system energy, $\varepsilon$: per-bead energies.
  • Figure 5: Comparison of Cho-Cl-Urea (a) ADF TAE at different ADF cutoff values $r_{\max}$ and (b) example ADF at $r_{\max}=7.5$ Å for teacher and student models relative to the AA reference (using different force targets). Error bars denote one standard deviation over 8 replicas. Teacher ($T$) results were calculated as the mean of eight independent MD simulations, each performed with a teacher model trained from a unique random seed. $S1$ student variants were each trained from a single teacher and then run as eight independent replicas; the reported result is the average across all eight student models, each trained from one of the individual teachers. $S8$ models were trained on averaged data from all eight teachers. For energy targets, all student models shown here were trained on per-bead energies only. $F$: ground-truth forces, $\mathcal{f}$: teacher forces, $E$: system energy, $\varepsilon$: per-bead energies.
  • ...and 70 more figures
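
Several captions above score models by how closely their RDF matches the AA reference. The sketch below shows one way such a comparison could be computed for single-bead molecules in a cubic periodic box. It is an illustration under stated assumptions: the function names are hypothetical, the box is assumed cubic, and "TAE" is assumed here to mean an absolute error summed over histogram bins, which the listing itself does not define.

```python
import numpy as np

def rdf(positions, box_length, r_max, n_bins):
    """Radial distribution function for one frame of identical beads.

    positions:  (n_beads, 3) coordinates in a cubic periodic box.
    r_max must not exceed box_length / 2 for the minimum-image
    convention used below to be valid.
    """
    pos = np.asarray(positions, dtype=float)
    n = len(pos)
    # Pairwise displacements wrapped with the minimum-image convention.
    diff = pos[:, None, :] - pos[None, :, :]
    diff -= box_length * np.round(diff / box_length)
    dist = np.linalg.norm(diff, axis=-1)
    # Keep each unique pair once (upper triangle, no self-pairs).
    iu = np.triu_indices(n, k=1)
    counts, edges = np.histogram(dist[iu], bins=n_bins, range=(0.0, r_max))
    shell_vol = 4.0 / 3.0 * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
    density = n / box_length ** 3
    # Normalize by the pair count expected for an ideal gas.
    g = counts / (0.5 * n * density * shell_vol)
    r = 0.5 * (edges[1:] + edges[:-1])
    return r, g

def total_absolute_error(g_model, g_ref):
    """Assumed TAE definition: sum of |g_model - g_ref| over bins."""
    return float(np.sum(np.abs(np.asarray(g_model) - np.asarray(g_ref))))
```

In practice the histogram would be accumulated over many frames of a trajectory before normalizing, and both the CG and the mapped AA trajectories would need to use the same bin edges for the error to be meaningful.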