Machine Learning Class Numbers of Real Quadratic Fields

Malik Amir, Yang-Hui He, Kyu-Hwan Lee, Thomas Oliver, Eldar Sultanow

TL;DR

This work investigates the problem of distinguishing real quadratic fields by class number using finitely many Dedekind zeta-coefficient features. It formalizes genus-theoretic constraints, introduces cost measures based on counting functions and a bubble algorithm to quantify the separability of class numbers, and employs PCA to gain insight into the dimensionality of the feature space. For the {1,2} case, gradient-boosting models and symbolic classification recover the parities predicted by genus theory and yield explicit approximate class-number formulas. For the {1,3} case, zeta-coefficients alone are insufficient, but combining discriminants, regulators and partial zeta-sums with symbolic methods yields high-accuracy predictors and explicit formulas. These results highlight how the data and feature requirements differ across class-number pairs and link the ML results to classical number-theoretic invariants.
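The features are coefficients $a_n$ of the Dedekind zeta function $\zeta_d(s)=\sum_{n\ge1} a_n n^{-s}$. Since $\zeta_d(s)=\zeta(s)L(s,\chi_d)$ with $\chi_d(m)=(d/m)$ the Kronecker symbol, $a_n=\sum_{m\mid n}\chi_d(m)$ counts ideals of norm $n$; for a prime $p$ this gives $a_p=1+\chi_d(p)\in\{0,1,2\}$ (inert, ramified, split). The following is a minimal sketch, assuming $d$ is a positive fundamental discriminant; the function names are hypothetical, and this is not the authors' code (coefficients could equally be computed in a computer algebra system).

```python
def jacobi(a: int, m: int) -> int:
    """Jacobi symbol (a/m) for odd m > 0, via quadratic reciprocity."""
    a %= m
    result = 1
    while a != 0:
        while a % 2 == 0:          # pull out factors of 2 using (2/m)
            a //= 2
            if m % 8 in (3, 5):
                result = -result
        a, m = m, a                # reciprocity swap
        if a % 4 == 3 and m % 4 == 3:
            result = -result
        a %= m
    return result if m == 1 else 0


def kronecker(d: int, n: int) -> int:
    """Kronecker symbol (d/n) for n >= 1."""
    e = 0
    while n % 2 == 0:
        n //= 2
        e += 1
    # (d/2) = 0 if d even, +1 if d = +-1 mod 8, -1 if d = +-3 mod 8
    chi2 = 0 if d % 2 == 0 else (1 if d % 8 in (1, 7) else -1)
    return (chi2 ** e) * jacobi(d, n)


def dedekind_coefficient(d: int, n: int) -> int:
    """a_n of zeta_d(s): sum of chi_d(m) over the divisors m of n."""
    return sum(kronecker(d, m) for m in range(1, n + 1) if n % m == 0)


def zeta_features(d: int, indices) -> list:
    """Feature vector (a_n for n in indices), e.g. indices = [2, 3, 5, 7]."""
    return [dedekind_coefficient(d, n) for n in indices]


print(zeta_features(5, [2, 3, 5, 7]))  # → [0, 0, 1, 0]
```

For $\mathbb{Q}(\sqrt5)$ the primes 2, 3 and 7 are inert and 5 ramifies, which is exactly what the vector $(a_2,a_3,a_5,a_7)=(0,0,1,0)$ records.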

Abstract

We implement and interpret various supervised learning experiments involving real quadratic fields with class numbers 1, 2 and 3. We quantify the relative difficulties in separating class numbers of matching/different parity from a data-scientific perspective, apply the methodology of feature analysis and principal component analysis, and use symbolic classification to develop machine-learned formulas for class numbers 1, 2 and 3 that apply to our dataset.
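Principal component analysis of the zeta-coefficient features amounts to diagonalizing their correlation matrix. The sketch below is illustrative only and rests on several assumptions: it uses $a_p$ for the first ten primes as features, takes every real quadratic fundamental discriminant up to a bound (unlike Figure 2, it does not filter by $h_d\in\{1,2\}$, since computing class numbers is outside its scope), and the function names are hypothetical.

```python
import numpy as np


def jacobi(a: int, m: int) -> int:
    """Jacobi symbol (a/m) for odd m > 0."""
    a %= m
    result = 1
    while a != 0:
        while a % 2 == 0:
            a //= 2
            if m % 8 in (3, 5):
                result = -result
        a, m = m, a
        if a % 4 == 3 and m % 4 == 3:
            result = -result
        a %= m
    return result if m == 1 else 0


def kronecker(d: int, n: int) -> int:
    """Kronecker symbol (d/n) for n >= 1."""
    e = 0
    while n % 2 == 0:
        n //= 2
        e += 1
    chi2 = 0 if d % 2 == 0 else (1 if d % 8 in (1, 7) else -1)
    return (chi2 ** e) * jacobi(d, n)


def is_squarefree(n: int) -> bool:
    p = 2
    while p * p <= n:
        if n % (p * p) == 0:
            return False
        p += 1
    return True


def is_fundamental_discriminant(D: int) -> bool:
    """True for positive fundamental discriminants D > 1."""
    if D <= 1:
        return False
    if D % 4 == 1:
        return is_squarefree(D)
    if D % 4 == 0:
        m = D // 4
        return m % 4 in (2, 3) and is_squarefree(m)
    return False


FIRST_TEN_PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]


def zeta_feature_matrix(max_disc: int, primes=FIRST_TEN_PRIMES) -> np.ndarray:
    """One row per fundamental D <= max_disc; columns a_p = 1 + chi_D(p)."""
    rows = [[1 + kronecker(D, p) for p in primes]
            for D in range(5, max_disc + 1) if is_fundamental_discriminant(D)]
    return np.array(rows, dtype=float)


def correlation_and_pca(X: np.ndarray, n_components: int = 2):
    """Correlation matrix, explained-variance ratios, and leading PC scores."""
    corr = np.corrcoef(X, rowvar=False)             # feature-by-feature correlations
    eigvals, eigvecs = np.linalg.eigh(corr)          # eigh returns ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    ratios = eigvals / eigvals.sum()                 # fraction of variance per PC
    Z = (X - X.mean(axis=0)) / X.std(axis=0)         # standardize each feature
    scores = Z @ eigvecs[:, :n_components]           # project onto leading PCs
    return corr, ratios, scores


X = zeta_feature_matrix(2000)
corr, ratios, scores = correlation_and_pca(X)
print(np.round(ratios[:3], 3))
```

Working with the correlation matrix rather than the raw covariance puts every $a_p$ on the same scale, so no single prime dominates the leading components.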

Paper Structure (19 sections, 7 theorems, 36 equations, 14 figures, 13 tables)

Key Result

Proposition 2.1

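We have $C^+_d/(C^+_d)^2 \cong (\mathbb{Z}/2\mathbb{Z})^{t-1}$ and $C_d/C_d^2 \cong (\mathbb{Z}/2\mathbb{Z})^{s}$, where $C^+_d$ denotes the narrow class group of $K_d$, $C_d$ the class group of $K_d$, $t$ the number of distinct primes dividing the discriminant of $K_d$, and $s=t-1$ if $d$ is a sum of two squares and $s=t-2$ otherwise.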
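Because $C_d/C_d^2$ has order $2^s$, Proposition 2.1 implies that $h_d$ is odd exactly when $s=0$; this is the parity information the classifiers are reported to recover. The sketch below evaluates both 2-ranks from the factorization of $d$. It assumes $d$ is a positive fundamental discriminant and that $t$ counts the distinct primes dividing $d$; the function names are hypothetical, and trial division is used only for simplicity.

```python
def prime_factorization(n: int) -> dict:
    """Map prime -> exponent, by trial division (fine for small n)."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def is_sum_of_two_squares(n: int) -> bool:
    """Fermat-Euler criterion: every prime p = 3 mod 4 divides n to an even power."""
    return all(e % 2 == 0 for p, e in prime_factorization(n).items() if p % 4 == 3)


def genus_two_ranks(d: int) -> tuple:
    """(2-rank of the narrow class group, 2-rank of the class group) per Prop. 2.1."""
    t = len(prime_factorization(d))
    narrow = t - 1
    # For a fundamental discriminant, t = 1 forces d to be a sum of two squares,
    # so the wide 2-rank below is never negative.
    wide = t - 1 if is_sum_of_two_squares(d) else t - 2
    return narrow, wide


def class_number_is_odd(d: int) -> bool:
    """h_d is odd iff the class group has trivial 2-rank (s = 0)."""
    return genus_two_ranks(d)[1] == 0
```

For example, `genus_two_ranks(40)` returns `(1, 1)`, so $\mathbb{Q}(\sqrt{10})$ has even class number (it is 2), while `genus_two_ranks(316)` returns `(1, 0)`, consistent with the odd class number 3 of $\mathbb{Q}(\sqrt{79})$.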

Figures (14)

  • Figure 1: Value distribution for the triples $(a_3,a_5,a_7)$ (left) and $(a_2,a_3,a_5)$ (right) where red (resp. green) bubbles correspond to class number 1 (resp. 2) real quadratic fields.
  • Figure 2: Correlation matrix of the first ten coefficients of the Dedekind zeta functions $\zeta_d(s)$ of real quadratic fields with $h_d\in\{1,2\}$.
  • Figure 3: Comparison of the LightGBM and CatBoost learning models against other AutoMLjar models on a 70/30 split for the binary classification task $h_d=1$ vs $h_d=2$.
  • Figure 4: Kolmogorov–Smirnov (KS) statistic plot over the training set for the LightGBM (right) and CatBoost (left) models for the binary classification task $h_d=1$ vs $h_d=2$ with a $70/30$ split.
  • Figure 5: Permutation feature importance for the LightGBM (left) and CatBoost (right) models for the binary classification task $h_d=1$ vs $h_d=2$ on a $70/30$ split.
  • ...and 9 more figures
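Figure 4 reports a KS statistic for the two gradient-boosting classifiers. Assuming it is the standard two-sample statistic on predicted probabilities, namely the largest vertical gap between the empirical CDFs of the scores of the two classes, it can be computed as in the sketch below. The names `ks_statistic`, `scores` and `labels` are hypothetical, and this is not the implementation used by the AutoML tooling.

```python
from bisect import bisect_right


def ks_statistic(scores, labels):
    """Two-sample KS statistic between the score distributions of class 0 and class 1.

    scores: predicted probabilities (e.g. of h_d = 2); labels: 0 or 1 per sample.
    """
    pos = sorted(s for s, y in zip(scores, labels) if y == 1)
    neg = sorted(s for s, y in zip(scores, labels) if y == 0)
    if not pos or not neg:
        raise ValueError("both classes must be present")
    best = 0.0
    # Empirical CDFs are step functions that jump only at observed scores,
    # so the supremum of |F_pos - F_neg| is attained at one of them.
    for t in sorted(set(scores)):
        gap = abs(bisect_right(pos, t) / len(pos) - bisect_right(neg, t) / len(neg))
        best = max(best, gap)
    return best


print(ks_statistic([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1]))  # → 1.0
```

A value of 1.0 means the scores of the two classes do not overlap at all, while a value near 0 means the classifier's scores barely distinguish $h_d=1$ from $h_d=2$.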

Theorems & Definitions (25)

  • Proposition 2.1
  • Corollary 2.2
  • proof
  • Lemma 2.3
  • proof
  • Lemma 2.4
  • proof
  • Lemma 2.5
  • proof
  • Remark 2.6
  • ...and 15 more