Alpha Zero for Physics: Application of Symbolic Regression with Alpha Zero to find the analytical methods in physics

Yoshihiro Michishita

TL;DR

The paper introduces Alpha Zero for Physics (AZfP), a framework that uses symbolic regression guided by Alpha Zero to discover analytical transformations in physics automatically. AZfP represents equations as trees and searches over them with neural-network-guided Monte Carlo tree search (MCTS). This lets it look for physically meaningful symbolic forms and optimize them against a physics-informed reward. In a demonstration on periodically driven (Floquet) systems, AZfP derives the first- to third-order Floquet-Magnus expansions for a two-spin model and outperforms standard reinforcement-learning baselines (ε-greedy search and actor-critic with PPO) in efficiency and accuracy. The approach points toward automated discovery of analytical methods and effective models, which could yield new theoretical insights and streamlined derivations across physics.
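The neural-network-guided MCTS rests on the two action-selection rules listed in Figure 2: simulations pick actions with a PUCT score, and self-play commits to the most-simulated action. The sketch below assumes the standard Alpha Zero PUCT form $Q(s,a) + c_\mathrm{puct}P(s,a)\sqrt{N(s)}/(1+N(s,a))$. The paper's own UCB formula may differ. The constant `c_puct = 1.5`, the `Node` class, the made-up statistics, and all function names are illustrative assumptions, and the policy network that would supply the priors is omitted.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One search-tree node; children are indexed by action symbol."""
    prior: float                  # P(s, a): policy-network prior for reaching this node
    visit_count: int = 0          # N(s, a): number of simulations through this node
    value_sum: float = 0.0        # W(s, a): sum of backed-up equation scores
    children: dict = field(default_factory=dict)

    def q_value(self) -> float:
        # Q(s, a) = W / N, taken as 0 for unvisited nodes
        return self.value_sum / self.visit_count if self.visit_count else 0.0


def puct_score(parent: Node, child: Node, c_puct: float = 1.5) -> float:
    # Q(s,a) + c_puct * P(s,a) * sqrt(N(s)) / (1 + N(s,a))
    bonus = c_puct * child.prior * math.sqrt(parent.visit_count) / (1 + child.visit_count)
    return child.q_value() + bonus


def select_action(parent: Node, c_puct: float = 1.5) -> str:
    # Simulation (Step 2): follow the child with the highest PUCT score
    return max(parent.children, key=lambda a: puct_score(parent, parent.children[a], c_puct))


def most_visited_action(parent: Node) -> str:
    # Self-play (Step 2): commit to the most-simulated action
    return max(parent.children, key=lambda a: parent.children[a].visit_count)


def backup(path: list, score: float) -> None:
    # Step 5: add the score of a completed equation to every node on the path
    for node in path:
        node.visit_count += 1
        node.value_sum += score


# Usage with made-up statistics for the action space {exp, A, B}
root = Node(prior=1.0, visit_count=10)
root.children = {
    "exp": Node(prior=0.5, visit_count=6, value_sum=3.0),
    "A": Node(prior=0.3, visit_count=3, value_sum=2.4),
    "B": Node(prior=0.2, visit_count=1, value_sum=0.1),
}
print(select_action(root))        # A: highest Q plus a moderate exploration bonus
print(most_visited_action(root))  # exp
```

The exploration bonus shrinks as a child accumulates visits. As a result, the search drifts from prior-driven exploration toward the actions whose completed equations have scored best.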

Abstract

Machine learning with neural networks is becoming an increasingly powerful tool for tasks such as natural language processing, image recognition, playing games, and even problems in physics. Although many studies apply machine learning to numerical calculations and to assisting experiments, methods that use machine learning to find analytical methods have received little study. In this paper, we propose a framework for developing analytical methods in physics by using symbolic regression with the Alpha Zero algorithm, which we call Alpha Zero for Physics (AZfP). As a demonstration, we show that AZfP can derive the high-frequency expansion in Floquet systems. AZfP may make it possible to develop new theoretical frameworks in physics.
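The high-frequency expansion mentioned in the abstract can be illustrated numerically. This is a minimal sketch that assumes a hypothetical driven two-spin Hamiltonian $H(t) = H_0 + \xi\cos(\Omega t)V$, chosen only for illustration. Its form is not taken from the paper, although the numbers echo the Figure 4 parameters. The sketch compares the exact quasi-energies from the one-period propagator $U(T)$ with the eigenvalues of the lowest-order Floquet-Magnus term, the time average $\frac{1}{T}\int_0^T H(t)\,dt$. It relies on NumPy and `scipy.linalg.expm`.

```python
import numpy as np
from scipy.linalg import expm

# Pauli matrices; two-spin operators are built with Kronecker products
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# Hypothetical model (illustration only): H(t) = H0 + xi * cos(omega * t) * V
J_z, J_x, h_z, xi, omega = 1.0, 0.7, 0.5, 0.4, 10.0
H0 = J_z * np.kron(Z, Z) + J_x * np.kron(X, X) + h_z * (np.kron(Z, I2) + np.kron(I2, Z))
V = np.kron(X, I2) + np.kron(I2, X)
PERIOD = 2 * np.pi / omega


def hamiltonian(t: float) -> np.ndarray:
    return H0 + xi * np.cos(omega * t) * V


def midpoints(n_steps: int) -> np.ndarray:
    # midpoints of n_steps equal sub-intervals of one period
    return (np.arange(n_steps) + 0.5) * PERIOD / n_steps


def one_period_propagator(n_steps: int = 2000) -> np.ndarray:
    # time-ordered U(T): later short-time factors multiply from the left
    dt = PERIOD / n_steps
    U = np.eye(4, dtype=complex)
    for t in midpoints(n_steps):
        U = expm(-1j * hamiltonian(t) * dt) @ U
    return U


def quasi_energies(U: np.ndarray) -> np.ndarray:
    # U(T) has eigenvalues exp(-i * eps * T); np.angle folds eps into one Floquet zone
    return np.sort(-np.angle(np.linalg.eigvals(U)) / PERIOD)


def first_order_magnus(n_steps: int = 2000) -> np.ndarray:
    # lowest-order Floquet-Magnus term: the time average of H(t) over one period
    return sum(hamiltonian(t) for t in midpoints(n_steps)) / n_steps


exact = quasi_energies(one_period_propagator())
approx = np.sort(np.linalg.eigvalsh(first_order_magnus()))
print(np.max(np.abs(exact - approx)))  # small: corrections are suppressed by powers of 1/omega
```

In this high-frequency regime, adding the higher-order Floquet-Magnus terms would reduce the remaining discrepancy further.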

Paper Structure

This paper contains 10 sections, 6 equations, 6 figures, 1 table, and 1 algorithm.

Introduction
Tree representation of equations and the game of its construction
Algorithms of Alpha Zero for Physics
Application of Alpha Zero for Physics to periodically-driven systems
Application of Alpha Zero for Physics to other physical problems
Conclusion and Remarks
Algorithm of Alpha Zero for Physics
Architecture of the neural networks and its learning
Comparison with other reinforcement learning techniques
Search for the longer equation

Figures (6)

Figure 1: (a) An example of the tree construction and its procedures. (b) An example of a forbidden tree.
Figure 2: Schematic of the five-step procedure of Alpha Zero for Physics. Step 1: Input the state vector to the neural network. Step 2: Choose the action by the upper-confidence-bound (UCB) formula during the simulations of the PUCT search, or choose the most-simulated action during self-play. Step 3: Update the state and the equation. Step 4: If the equation is complete, calculate its score. Step 5: Update the PUCT statistics during simulation, or train the neural network on the self-play statistics.
Figure 3: Vector representation of the equation state used as the network input, for the case where the action space $\mathcal{A}$ consists of $\mathrm{exp}$, $A$, $B$ and the maximum number of turns is 3.
Figure 4: Search dynamics of AZfP with $T_\mathrm{max}=14$. The y-axis shows the maximum score found so far in the search, and the x-axis shows the number of score evaluations. Because the search is stochastic, we ran 20 trials. The blue lines show the individual trials, and the red line shows their average. AZfP found, in order, the symbolic representations of the unitary transformations corresponding to the first-, second-, and third-order Floquet-Magnus expansions. It also found some unphysical rotating frames with good scores. The model parameters are $\Omega = 10, \xi = 0.4, J_z = 1.0, J_x = 0.7, h_z=0.5$.
Figure S1: Scores versus evaluation iteration for the $\epsilon$-greedy algorithm and the actor-critic method with PPO. We ran 20 trials for each method. The blue lines show the score dynamics of the individual trials, the red line shows their average, and the grey line shows the average performance of AZfP.
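Figures 1 and 3 describe two steps: constructing an equation tree turn by turn, and feeding the partial equation to the network as a vector. A minimal sketch of that construction game for the Figure 3 action space {exp, A, B} with at most 3 turns follows. It assumes prefix-order construction, `exp` as a unary node, `A` and `B` as leaves, and a flattened one-hot (turn × action) encoding. These choices, the legality rule, and every function name are hypothetical, and the sketch does not model the forbidden-tree rules of Figure 1.

```python
import numpy as np

# Figure 3 example: action space {exp, A, B}, at most 3 turns.
# ARITY gives how many children each symbol takes (assumed: exp unary, A and B leaves).
ARITY = {"exp": 1, "A": 0, "B": 0}
ACTIONS = list(ARITY)
MAX_TURN = 3


def open_slots(actions) -> int:
    # prefix-order construction: the root is one empty slot; each action fills
    # one slot and opens ARITY[action] new ones
    return 1 + sum(ARITY[a] - 1 for a in actions)


def is_complete(actions) -> bool:
    return open_slots(actions) == 0


def legal_actions(actions) -> list:
    # hypothetical legality rule: after the action, every open slot must still be
    # fillable by a leaf within the remaining turns
    if is_complete(actions):
        return []
    turns_left = MAX_TURN - len(actions) - 1
    return [a for a in ACTIONS if open_slots(actions) + ARITY[a] - 1 <= turns_left]


def encode_state(actions) -> np.ndarray:
    # one-hot (turn x action) matrix, flattened; rows for unplayed turns stay zero
    state = np.zeros((MAX_TURN, len(ACTIONS)))
    for turn, a in enumerate(actions):
        state[turn, ACTIONS.index(a)] = 1.0
    return state.ravel()


def to_string(actions) -> str:
    # rebuild a readable expression from a complete prefix-order action list
    def build(i):
        a = actions[i]
        args, j = [], i + 1
        for _ in range(ARITY[a]):
            arg, j = build(j)
            args.append(arg)
        return (f"{a}({', '.join(args)})" if args else a), j
    return build(0)[0]


def enumerate_equations(prefix=()):
    # depth-first enumeration of every complete equation reachable in MAX_TURN turns
    if is_complete(prefix):
        yield prefix
        return
    for a in legal_actions(prefix):
        yield from enumerate_equations(prefix + (a,))


for eq in enumerate_equations():
    print(to_string(eq), encode_state(eq))
```

Enumerating exhaustively is only feasible for tiny action spaces like this one. The number of trees grows combinatorially with the action space and the maximum number of turns, which is why a guided search such as MCTS is needed for realistic settings.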
