Exploring the flavor structure of quarks and leptons with reinforcement learning

Satsuki Nishimura; Coh Miyao; Hajime Otsuka

TL;DR

This work tackles the flavor puzzle by applying a value-based reinforcement learning approach to Froggatt--Nielsen models with a $U(1)$ flavor symmetry. Using a Deep Q-network, the agent searches over 19-dimensional $U(1)$ charge configurations to reproduce quark and lepton masses and mixings, treating the flavon-induced parameter $\epsilon = v_\phi/M$ as the driver of Yukawa hierarchies. The results show the agent identifies 21 realistic quark-charge patterns and consistently favors normal ordering for neutrino masses, with predicted $m_{\beta\beta}$ in the meV range and nonzero Majorana phases arising from flavon dynamics. This demonstrates that reinforcement learning can be a powerful, model-agnostic tool to explore flavor-model spaces and motivate extensions to SMEFT and flavon CP phenomena.
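The role of $\epsilon$ as the driver of Yukawa hierarchies can be illustrated with a minimal sketch. It assumes the textbook Froggatt--Nielsen texture $y_{ij} = c_{ij}\,\epsilon^{n_{ij}}$ with $n_{ij} = |q_{L_i} + q_{R_j} + q_H|$. The function name `fn_yukawa`, the charge values, the value of $\epsilon$, and the coefficient range are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def fn_yukawa(q_left, q_right, q_higgs, eps, coeffs):
    """Froggatt-Nielsen texture y_ij = c_ij * eps**|q_left_i + q_right_j + q_higgs|.

    Assumed convention: the flavon (or its conjugate) compensates the charge
    mismatch, so the exponent is the absolute value of the total U(1) charge.
    """
    q_left = np.asarray(q_left)
    q_right = np.asarray(q_right)
    exponents = np.abs(q_left[:, None] + q_right[None, :] + q_higgs)
    return np.asarray(coeffs) * eps ** exponents

# Illustrative inputs (hypothetical, not a model found by the agent).
rng = np.random.default_rng(0)
coeffs = rng.uniform(0.5, 1.5, size=(3, 3))  # stand-in O(1) coefficients
Y = fn_yukawa([3, 2, 0], [4, 2, 0], 0, eps=0.2, coeffs=coeffs)

# Singular values play the role of fermion masses up to an overall scale.
singular_values = np.linalg.svd(Y, compute_uv=False)
print(singular_values / singular_values[0])
```

When all exponents have the same sign and every $c_{ij}$ equals one, the matrix factorizes into an outer product and has rank one, so the ${\cal O}(1)$ coefficients are needed to obtain three nonzero masses.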

Abstract

We propose a method to explore the flavor structure of quarks and leptons with reinforcement learning. As a concrete example, we apply a basic value-based algorithm to models with $U(1)$ flavor symmetry. By training neural networks on the $U(1)$ charges of quarks and leptons, the agent finds 21 models that are consistent with the experimentally measured masses and mixing angles of quarks and leptons. In particular, the intrinsic value of the normal ordering tends to be larger than that of the inverted ordering, and the normal ordering fits the current experimental data well, in contrast to the inverted ordering. The autonomous behavior of the agent predicts a specific value of the effective mass for neutrinoless double beta decay and a sizable leptonic CP violation induced by an angular component of the flavon field. Our results indicate that reinforcement learning can be a new method for understanding the flavor structure.
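The ingredients of a value-based search over charge assignments can be sketched as follows. This is a generic Q-learning skeleton under stated assumptions, not the authors' implementation: the action encoding (shift one of the 19 charges by $\pm 1$), the clipping bound `q_max`, and the names `apply_action`, `epsilon_greedy`, and `td_target` are all hypothetical.

```python
import numpy as np

def apply_action(charges, action, q_max):
    """Shift one charge by +1 (even action) or -1 (odd action), clipped to [-q_max, q_max]."""
    new = np.array(charges, dtype=int)
    index = action // 2
    sign = 1 if action % 2 == 0 else -1
    new[index] = np.clip(new[index] + sign, -q_max, q_max)
    return new

def epsilon_greedy(q_values, explore_prob, rng):
    """Pick a random action with probability explore_prob, else the greedy one."""
    if rng.random() < explore_prob:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def td_target(reward, next_q_values, gamma, terminal):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a'), or r at a terminal state."""
    if terminal:
        return float(reward)
    return float(reward + gamma * np.max(next_q_values))

# Illustrative step: random numbers stand in for the Q-network output.
rng = np.random.default_rng(0)
charges = np.zeros(19, dtype=int)          # 19 charges, as in the TL;DR
q_values = rng.normal(size=2 * 19)         # one value per hypothetical action
action = epsilon_greedy(q_values, explore_prob=0.1, rng=rng)
next_charges = apply_action(charges, action, q_max=9)  # q_max=9 is an assumed bound
print(action, next_charges)
```

In a deep Q-network the table of values is replaced by a neural network trained to regress its output toward `td_target`.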

Paper Structure (16 sections, 27 equations, 12 figures, 7 tables)

Introduction
Reinforcement learning with deep Q-network
Froggatt-Nielsen model with reinforcement learning
The environment
Neural Network
Agent
Learning the quark sector
Learning the neutrino structure
Fixed ordering of neutrino masses
Unfixed ordering of neutrino masses
Conclusion
Formulation of reinforcement learning
FN charges
Quark sector
Lepton sector (RL with NO designated)
...and 1 more sections

Figures (12)

Figure 6: Distribution of ${\cal O}(1)$ coefficients in Yukawa terms (\ref{eq:Lagrangian}).
Figure 7: Learning results for the quark sector. The results are the output of the neural network leading to the best-fit model shown in Table \ref{tab:benchmark_quark}. From left to right, the three panels show (a) the loss function, (b) the fraction of terminal episodes, and (c) the number of terminal states, each vs episode number.
Figure 8: Learning results for the lepton sector with the neutrino mass ordering fixed to NO. The results are the output of the neural network leading to the best-fit model (the square in Figs. \ref{fig:data_lepton_fixed_NO1} and \ref{fig:data_lepton_fixed_NO2}). From left to right, the three panels show (a) the loss function, (b) the fraction of terminal episodes, and (c) the number of terminal states, each vs episode number.
Figure 9: Neutrino masses vs mixing angle $\theta_{23}$. The dotted line marks the global best-fit value in the NuFIT v5.2 results with Super-Kamiokande atmospheric data \cite{Esteban:2020cvm}, and the regions inside the dashed and dot-dashed lines correspond to $\leq 1\sigma$ and $\leq 3\sigma$ CL, respectively. The black solid line shows the upper bound of $0.15$ eV (95% CL) on the sum of neutrino masses in the $\Lambda$CDM model \cite{RoyChoudhury:2019hls}. A square denotes the best-fit point within $3\sigma$, and the intrinsic value (\ref{eq:intrinsic_value}) is given in the legend. Note that the neutrino mass ordering is fixed to NO in the training of the neural network.
Figure 10: Majorana phases $\alpha_{21}, \alpha_{31}$ and effective Majorana neutrino mass $m_{\beta\beta}$ vs mixing angle $\theta_{23}$. The dotted line marks the global best-fit value in the NuFIT v5.2 results with Super-Kamiokande atmospheric data \cite{Esteban:2020cvm}, and the regions inside the dashed and dot-dashed lines correspond to $\leq 1\sigma$ and $\leq 3\sigma$ CL, respectively. The black solid line shows the upper bound of $0.036$ eV (90% CL) on the effective Majorana neutrino mass \cite{KamLAND-Zen:2022tow}. A square denotes the best-fit point within $3\sigma$, and the intrinsic value (\ref{eq:intrinsic_value}) is given in the legend. Note that the neutrino mass ordering is fixed to NO in the training of the neural network.
...and 7 more figures
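
The effective Majorana mass plotted in Figure 10 depends on the Majorana phases through the standard expression $m_{\beta\beta} = |\sum_i U_{ei}^2 m_i|$. The sketch below evaluates it in the PDG parametrization. The function name `m_beta_beta` and all numerical inputs are rough illustrative assumptions, not values from the paper or from NuFIT.

```python
import numpy as np

def m_beta_beta(m1, m2, m3, theta12, theta13, alpha21, alpha31, delta_cp):
    """|sum_i U_ei^2 m_i| in the PDG parametrization (masses in eV, angles in radians)."""
    c12, s12 = np.cos(theta12), np.sin(theta12)
    c13, s13 = np.cos(theta13), np.sin(theta13)
    amplitude = (
        c12**2 * c13**2 * m1
        + s12**2 * c13**2 * m2 * np.exp(1j * alpha21)
        + s13**2 * m3 * np.exp(1j * (alpha31 - 2.0 * delta_cp))
    )
    return float(np.abs(amplitude))

# Illustrative normal-ordering spectrum built from rough mass splittings (assumed values).
m1 = 0.001
m2 = np.sqrt(m1**2 + 7.4e-5)
m3 = np.sqrt(m1**2 + 2.5e-3)
print(m_beta_beta(m1, m2, m3, theta12=0.59, theta13=0.15,
                  alpha21=0.0, alpha31=0.0, delta_cp=0.0))
```

Because the three terms add as complex numbers, nonzero Majorana phases can partially cancel the contributions and lower $m_{\beta\beta}$ relative to the phase-free case.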
