Reinforcement Learning techniques for the flavor problem in particle physics

A. Giarnetti, D. Meloni

TL;DR

This review surveys how Reinforcement Learning (RL) is applied to the flavor problem in particle physics, focusing on Froggatt–Nielsen (FN) type models that generate fermion mass hierarchies. By framing model building as a sequential decision process, RL agents learn to select charge assignments or model ingredients that reproduce quark and lepton observables, often rediscovering known textures and uncovering new viable configurations. The highlighted work spans policy-based methods (REINFORCE) for quark FN textures, value-based learning (DQN) in the lepton sector, and autonomous model design with PPO in broader theory spaces (AMBer), showing that RL can search otherwise intractable model spaces efficiently. Notably, the RL results hint at a statistical preference for Normal Ordering of the neutrino masses and illustrate the potential of fully autonomous, data-constrained theory design frameworks, a promising direction for future beyond-Standard-Model explorations.

Abstract

This short review discusses recent applications of Reinforcement Learning (RL) techniques to the flavor problem in particle physics. Traditional approaches to fermion masses and mixing often rely on extensions of the Standard Model based on horizontal symmetries, but the vast landscape of possible models makes systematic exploration infeasible. Recent works have shown that RL can efficiently navigate this landscape by constructing models that reproduce observed quark and lepton observables. These approaches demonstrate that RL not only rediscovers models already proposed in the literature but also uncovers new, phenomenologically acceptable solutions.
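
To make the sequential-decision framing concrete, the following is a minimal sketch of a toy FN environment. It is not taken from the reviewed papers. It assumes the common order-of-magnitude scaling $Y_{ij} \sim \epsilon^{q^L_i + q^R_j}$ with all $\mathcal{O}(1)$ coefficients set to one, estimates mass ratios from the diagonal entries only, and uses rough illustrative up-type targets. The names `EPS`, `fn_exponents`, `log_mass_ratios`, `reward`, and `step` are hypothetical.

```python
import math

EPS = 0.22  # assumed FN expansion parameter (roughly the Cabibbo angle)

def fn_exponents(q_left, q_right):
    """Exponent matrix n_ij = q_left[i] + q_right[j], so that Y_ij ~ EPS**n_ij."""
    return [[ql + qr for qr in q_right] for ql in q_left]

def log_mass_ratios(q_left, q_right, eps=EPS):
    """log10 of (m_1/m_3, m_2/m_3), estimated from the diagonal exponents only."""
    n = fn_exponents(q_left, q_right)
    return [(n[i][i] - n[2][2]) * math.log10(eps) for i in range(2)]

def reward(q_left, q_right, target_ratios):
    """Negative log10 distance between estimated and target (m_1/m_3, m_2/m_3)."""
    predicted = log_mass_ratios(q_left, q_right)
    return -sum(abs(p - math.log10(t)) for p, t in zip(predicted, target_ratios))

def step(state, action):
    """Apply one action (index, delta) to a flat tuple of six integer charges."""
    index, delta = action
    charges = list(state)
    charges[index] += delta
    return tuple(charges)

# Rough, illustrative up-type targets (m_u/m_t, m_c/m_t); not values from the source.
TARGET = (1e-5, 7e-3)

state = (0, 0, 0, 0, 0, 0)   # three left-handed charges, then three right-handed
state = step(state, (0, +1))  # an agent would choose such actions one at a time
print(state, reward(state[:3], state[3:], TARGET))
```

In this toy setting the state is the charge assignment, an action raises or lowers one charge, and the reward grows as the estimated hierarchy approaches the targets. A learned policy or Q-function would replace the hand-picked action in the usage lines.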

Paper Structure

This paper contains 8 sections, 23 equations, 2 figures, 4 tables.

Figures (2)

  • Figure 1: RL learning results for the lepton sector with the neutrino mass ordering specified. Upper panel: loss function versus episode number. Lower panels: effective neutrino mass $m_{\beta\beta}$ (left plot) and sum of the neutrino masses (right plot) versus the atmospheric mixing angle $\theta_{23}$, for the 6 viable models found by the authors. See text for further details. Adapted from Nishimura:2020nre.
  • Figure 2: Training variables of interest over time for searches in three spaces: $A_4 \times \mathbb{Z}_4$ (top), $A_4 \times \mathbb{Z}_N$ (middle), and $T_{19} \times \mathbb{Z}_4$ (bottom). Left column: evolution of $\chi^2$ in blue (the curve shows the median $\log_{10}{\chi^2}$ over all environments) and the mean number of parameters $\langle n_p \rangle$ in orange, as training progresses. Right column: number of valid models in orange and of good models ($\chi^2 \leq 10$ and $n_p \leq 7$) in blue. From Baretz:2025zsv.