The Reward Function and the Least Cost Principle for Gravitation and other Laws of Physics

Rubén Moreno-Bote

Abstract

If the universe follows a specific design, then a central question is which cost function is optimized by the observed forces. This is the problem of inverse optimal control, or inverse reinforcement learning, in which a reward function is inferred from the dynamics of the observed system. We first establish the \emph{least cost principle}, whereby the laws of motion can be derived from minimization of a time-discounted integral of the acceleration cost minus a state-dependent reward function. After determining the functional form of the acceleration cost from basic principles, we infer the reward function from the laws of motion governing classical gravitation and Coulomb forces. The inferred reward function is high when pairs of particles have high relative velocities and when their relative motion is orthogonal to their distance vectors. All in all, our work suggests that relative motion and quasi-circular orbits are the dynamical and static features optimized by central forces in nature.

Paper Structure

This paper contains 3 sections, 20 equations, 1 figure.

Figures (1)

  • Figure 1: Newtonian gravitation maximizes the reward function in Eq. \ref{eq:state_cost_gravitation}. (a) Trajectories of five particles following Newtonian gravitation. (b) Reward contributions from Terms I and II in Eq. \ref{eq:state_cost_gravitation} as a function of time. Since reward Term II is non-positive, its absolute value is shown. (c) The minimum cost-to-go in Eq. \ref{eq:optimal_cumulative_cost} for each state along the trajectory, plotted as a function of time (black line). This coincides with the actual cost incurred by the trajectory, computed numerically via Eq. \ref{eq:cost} (red line, hidden beneath the black one). (d) With a larger number of particles, $N=10$, the cost-to-go fluctuates more widely but spends more time at relatively small values. (e) A small perturbation of the Newtonian force from $1/r^2$ to $1/r^{2+\epsilon}$ with $\epsilon=0.05$ produces a higher cost-to-go (green line) than the optimal one (black line). The cost-to-go is computed at each state along the trajectories generated with the perturbed force. (f) Cost-to-go (black line) incurred by a perturbed Newtonian force $1/r^{2+\epsilon}$ as a function of $\epsilon$, calculated using the same initial condition as in panel (b). The cost-to-go attains a minimum at $\epsilon=0$. The blue line shows the cost-to-go incurred by a perturbed Coulomb force $1/r^{2+2\epsilon}$ as a function of $\epsilon$. Parameters are specified in Appendix \ref{sec:parameters}.
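
The perturbed force law used in panels (e) and (f) can be illustrated with a minimal sketch. This is not the paper's code: the function name `pairwise_accelerations`, the choice of units with $G=1$, and the array layout are assumptions made here for illustration only. It computes the acceleration of each particle under an attractive central force that scales as $1/r^{2+\epsilon}$, which reduces to Newtonian gravitation at $\epsilon=0$.

```python
import numpy as np


def pairwise_accelerations(positions, masses, eps=0.0, G=1.0):
    """Accelerations under an attractive central force scaling as 1/r^(2+eps).

    Hypothetical helper for illustration; eps=0 gives Newtonian gravitation.
    positions: array-like of shape (N, d); masses: array-like of shape (N,).
    """
    positions = np.asarray(positions, dtype=float)
    masses = np.asarray(masses, dtype=float)

    # diff[i, j] is the vector pointing from particle i to particle j.
    diff = positions[None, :, :] - positions[:, None, :]
    dist = np.linalg.norm(diff, axis=-1)

    # Set self-distances to infinity so a particle exerts no force on itself.
    np.fill_diagonal(dist, np.inf)

    # a_i = sum_j G * m_j * (x_j - x_i) / r_ij^(3 + eps);
    # the extra power of r normalizes the direction vector.
    weights = G * masses[None, :] / dist ** (3.0 + eps)
    return np.einsum("ij,ijk->ik", weights, diff)


# Two particles a distance 2 apart, with masses 1 and 3 (assumed units, G=1).
acc_newton = pairwise_accelerations([[0.0, 0.0], [2.0, 0.0]], [1.0, 3.0])
acc_perturbed = pairwise_accelerations([[0.0, 0.0], [2.0, 0.0]], [1.0, 3.0], eps=0.05)
```

Feeding these accelerations to any standard integrator would generate trajectories for a chosen $\epsilon$. Evaluating the cost-to-go along them requires the reward function and acceleration cost defined in the paper's equations, which are not reproduced here.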