Energy Loss Functions for Physical Systems

Sékou-Oumar Kaba, Kusha Sareen, Daniel Levy, Siamak Ravanbakhsh

TL;DR

The paper introduces energy loss functions, derived from a Boltzmann distribution, that embed physical priors directly into ML losses for physical systems in thermal equilibrium. Using a reverse KL divergence turns the loss into an energy difference between the model prediction and each data point, which yields physically meaningful gradients and symmetry-respecting training while remaining architecture-agnostic. The framework is instantiated for atomistic systems (distance-based pair energies) and discrete spin systems, and is extended to diffusion-model training. Empirical results on molecule generation and spin ground-state prediction show improved performance and data efficiency over standard losses. The paper additionally connects the loss to rigidity theory, which bears on scalability, and establishes its invariance properties.
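The reverse-KL step summarized above can be sketched as a short derivation. The notation here is assumed for illustration and is not taken from the paper: $y$ is a data sample, $\hat{y}$ a model output, $q_\theta$ the model distribution, $E(\cdot\,; y)$ an approximate energy centred on $y$, and $T$ a temperature. The paper's own treatment of the entropy term and of the temperature may differ.

```latex
% Boltzmann distribution around the data point y (assumed notation)
p_y(\hat{y}) = \frac{1}{Z_y} \exp\!\left(-\frac{E(\hat{y}; y)}{T}\right)

% Reverse KL divergence from the model distribution to p_y
\mathrm{KL}\!\left(q_\theta \,\|\, p_y\right)
  = \frac{1}{T}\, \mathbb{E}_{\hat{y} \sim q_\theta}\!\left[E(\hat{y}; y)\right]
  - H(q_\theta) + \log Z_y

% If the entropy H(q_\theta) is held fixed, only the expected energy depends on
% \theta. Subtracting the constant E(y; y) gives an energy difference:
\mathcal{L}(\hat{y}, y) = E(\hat{y}; y) - E(y; y)

% A quadratic energy recovers MSE up to a constant factor
E_{\mathrm{MSE}}(\hat{y}; y) = \tfrac{1}{2} \lVert \hat{y} - y \rVert^2
```

Read this way, choosing a loss amounts to choosing an energy, and the quadratic choice behind MSE is one option among many.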

Abstract

Effectively leveraging prior knowledge of a system's physics is crucial for applications of machine learning to scientific domains. Previous approaches mostly focused on incorporating physical insights at the architectural level. In this paper, we propose a framework to incorporate physical information directly into the loss function for prediction and generative modeling tasks on systems like molecules and spins. We derive energy loss functions assuming that each data sample is in thermal equilibrium with respect to an approximate energy landscape. By using the reverse KL divergence with a Boltzmann distribution around the data, we obtain the loss as an energy difference between the data and the model predictions. This perspective also recasts traditional objectives like MSE as energy-based, but with a physically meaningless energy. In contrast, our formulation yields physically grounded loss functions with gradients that better align with valid configurations, while being architecture-agnostic and computationally efficient. The energy loss functions also inherently respect physical symmetries. We demonstrate our approach on molecular generation and spin ground-state prediction and report significant improvements over baselines.
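To make the contrast between MSE and a distance-based energy concrete, the following minimal sketch compares the two on a prediction that is a rigid motion of the target. The harmonic pair energy and every function name here are illustrative assumptions, not the paper's implementation. Because this assumed energy is zero at the data, the energy difference equals the energy itself.

```python
import math

def mse_loss(pred, target):
    """Squared coordinate error, averaged over particles."""
    total = sum((p - t) ** 2
                for P, T in zip(pred, target)
                for p, t in zip(P, T))
    return total / len(pred)

def pair_distances(points):
    """All pairwise Euclidean distances, in a fixed (i < j) order."""
    n = len(points)
    return [math.dist(points[i], points[j])
            for i in range(n) for j in range(i + 1, n)]

def pair_energy_loss(pred, target):
    """Hypothetical harmonic pair energy: penalizes mismatched pair distances."""
    return 0.5 * sum((dp - dt) ** 2
                     for dp, dt in zip(pair_distances(pred),
                                       pair_distances(target)))

def rigid_motion(points, angle, shift):
    """Rotate 2D points by `angle` (radians), then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1])
            for x, y in points]

target = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
# A prediction that is physically the same shape, only rotated and shifted.
pred = rigid_motion(target, math.pi / 2, (0.5, -0.3))

print("MSE:", mse_loss(pred, target))                 # clearly positive
print("pair energy:", pair_energy_loss(pred, target)) # zero up to rounding
```

The MSE penalizes the rotated copy even though all interatomic distances are correct, whereas the distance-based energy does not, since pairwise distances are unchanged by rotations and translations.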

Paper Structure

This paper contains 62 sections, 4 theorems, 62 equations, 5 figures, 12 tables.

Key Result

Proposition 4.2

The loss function eq:loss is invariant to the group

Figures (5)

  • Figure 1: Energy interpretation of loss functions. Ground truth positions are denoted in green and predictions in blue. (a) The MSE loss function for particle positions corresponds to a quadratic potential energy centered on the data. (b) This choice is, however, physically unsound and penalizes the model for configurations that are correct, i.e. related to the target by a rigid motion. (c) A more accurate choice is a loss function based on a physically sound energy, which does not suffer from this problem.
  • Figure 2: Loss landscapes. The model has to predict the positions of two particles in one dimension. The prediction for the first particle $\hat{\mathbf{y}}_0$ is closer to the ground truth for the second particle $\mathbf{y}_1$ and vice versa. (a) The MSE minimizes the forward KL divergence between a Gaussian model distribution (blue) and the data distribution (green). It does not capture the symmetry. (b) The energy loss is obtained via the reverse KL with the pair energy and admits a family of minimizers associated with symmetries. It results in a gradient that points towards the closest correct configuration.
  • Figure 3: Regular shape prediction results. (a) Typical samples from optimal models trained with MSE and energy loss when $\theta_{aug} = \pi$. (b) The impact of $\theta_{aug}$ on sample quality. As $\theta_{aug}$ increases, MSE performance deteriorates, but the invariant losses (Energy and Kabsch Align) remain performant. (c) As the number of shape vertices grows, a sparse version of the energy loss, using only $O(N)$ operations, performs as well as the complete-edge energy loss.
  • Figure 4: Molecule generation results. (Left) We observe a dramatic improvement on stability metrics for the GEOM-Drugs dataset, demonstrating the scalability of our approach. (Right) On QM9, energy loss improves metrics over all baselines. The gain is most pronounced in the low-data regime, where energy loss gives $+10\%$ molecule stability over MSE.
  • Figure 5: Global rigidity testing of random $k$-regular graphs. Here, $n$ denotes the number of vertices and $d$ the dimension.
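
The complete-edge versus sparse comparison in Figure 3(c) can be illustrated with a small sketch. The circulant edge set below is a hypothetical stand-in chosen for simplicity (Figure 5 refers to random $k$-regular graphs), and the harmonic pair energy is likewise an assumption rather than the paper's definition. Whether a given sparse edge set pins down the shape is a rigidity question that this sketch does not address.

```python
import math

def complete_edges(n):
    """All n*(n-1)/2 vertex pairs."""
    return [(i, j) for i in range(n) for j in range(i + 1, n)]

def circulant_edges(n, k=3):
    """Hypothetical sparse edge set: vertex i is linked to i+1, ..., i+k (mod n).

    Gives k*n distinct edges when n > 2*k, so the count grows linearly in n.
    """
    return [(i, (i + s) % n) for i in range(n) for s in range(1, k + 1)]

def energy_loss(pred, target, edges):
    """Assumed harmonic pair energy, summed over the given edges only."""
    return 0.5 * sum(
        (math.dist(pred[i], pred[j]) - math.dist(target[i], target[j])) ** 2
        for i, j in edges)

def regular_polygon(n, angle=0.0):
    """Vertices of a regular n-gon on the unit circle, rotated by `angle`."""
    return [(math.cos(angle + 2 * math.pi * i / n),
             math.sin(angle + 2 * math.pi * i / n)) for i in range(n)]

n = 12
target = regular_polygon(n)
rotated = regular_polygon(n, angle=0.7)     # same shape, different pose
perturbed = [(1.3, 0.0)] + target[1:]       # one vertex pushed outward

dense, sparse = complete_edges(n), circulant_edges(n)
print(len(dense), len(sparse))              # 66 36
for name, pred in [("rotated", rotated), ("perturbed", perturbed)]:
    print(name, energy_loss(pred, target, dense),
          energy_loss(pred, target, sparse))
```

Both edge sets assign (numerically) zero loss to the rotated polygon and a positive loss to the perturbed one, while the sparse set evaluates 36 terms instead of 66 for $N = 12$.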

Theorems & Definitions (9)

  • Definition 4.1: Invariant loss function
  • Proposition 4.2
  • Corollary 4.3
  • Proposition 4.4
  • Proposition 4.5
  • proof
  • proof
  • proof
  • proof