Interpretable Analytic Calabi-Yau Metrics via Symbolic Distillation

D Yang Eng

TL;DR

This work demonstrates that a compact analytic model can faithfully reproduce neural surrogates for Calabi–Yau metric determinants governed by the Monge–Ampère PDE. By distilling a five-term formula in the gauge-invariant quantities $p_2$ and $\sigma_3$, the authors achieve $R^2 \approx 0.9994$ with a $3{,}000\times$ reduction in parameters, and validate the formula across the studied Dwork family moduli range using volume and Yukawa benchmarks. The functional form remains stable as moduli change, with coefficients $c_i(\psi)$ varying smoothly; singular terms capture essential geometric corrections, revealing a hierarchical modulation that mirrors PDE-constrained structure. Because the formula evaluates in microseconds, it accelerates physics calculations and makes large-scale moduli scans feasible, with practical accuracy limited by teacher noise. This work also connects with concurrent efforts on symbolic representations of Kähler potentials, highlighting a general principle: fixed PDE structure yields a low-dimensional, interpretable manifold for complex geometric observables.

Abstract

Calabi–Yau manifolds are essential for string theory, but their metrics are intractable to compute. Here we show that symbolic regression can distill neural approximations into simple, interpretable formulas. Our five-term expression matches neural accuracy ($R^2 = 0.9994$) with 3,000-fold fewer parameters. Multi-seed validation confirms that geometric constraints select essential features, specifically power sums and symmetric polynomials, while permitting structural diversity. The functional form can be maintained across the studied moduli range ($\psi \in [0, 0.8]$) with coefficients varying smoothly; we interpret these trends as empirical hypotheses within the accuracy regime of the locally trained teachers ($\sigma \approx 8$–$9\%$ at $\psi \neq 0$). The formula reproduces physical observables, namely volume integrals and Yukawa couplings, validating that symbolic distillation recovers compact, interpretable models for quantities previously accessible only to black-box networks.
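
The abstract names power sums and symmetric polynomials as the selected features but does not define the variables they are built from. The following is a minimal sketch of how such features could be computed, assuming (this is not stated in the text above) that they are taken over the normalized squared moduli $x_i = |z_i|^2 / \sum_j |z_j|^2$ of homogeneous coordinates $z_i$. The helper names `power_sum`, `elementary_symmetric`, and `invariants` are hypothetical.

```python
import numpy as np

def power_sum(x, k):
    """k-th power sum p_k = sum_i x_i**k, taken along the last axis."""
    return np.sum(x ** k, axis=-1)

def elementary_symmetric(x, k):
    """k-th elementary symmetric polynomial e_k along the last axis."""
    batch_shape = x.shape[:-1]
    # e[j] holds e_j of the variables processed so far; e_0 = 1 by convention.
    e = [np.ones(batch_shape)] + [np.zeros(batch_shape) for _ in range(k)]
    for i in range(x.shape[-1]):
        xi = x[..., i]
        # Update from high to low degree so each variable is used only once.
        for j in range(k, 0, -1):
            e[j] = e[j] + xi * e[j - 1]
    return e[k]

def invariants(z):
    """Return (p_2, sigma_3) for complex homogeneous coordinates z.

    ASSUMPTION: the features are functions of x_i = |z_i|^2 / sum_j |z_j|^2.
    This choice is illustrative and is not taken from the paper.
    """
    x = np.abs(z) ** 2
    x = x / np.sum(x, axis=-1, keepdims=True)
    return power_sum(x, 2), elementary_symmetric(x, 3)
```

Under this assumption, both features are unchanged by a phase rotation of any single coordinate, by a permutation of the coordinates, and by an overall complex rescaling of $z$, which is the sense in which they would be insensitive to the choice of representative.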

Paper Structure

This paper contains 38 sections, 15 equations, 7 figures, 16 tables.

Figures (7)

  • Figure 1: Validation of the symbolic formula (Eq. \ref{eq:main}) on $10{,}000$ independent hold-out test points. (Top) Scatter plot shows near-perfect agreement ($R^2=0.9994$) between symbolic prediction and neural surrogate. (Bottom) Residuals are approximately symmetric with $\sigma \approx 0.011$ and zero mean; formal normality is rejected at $\alpha=0.05$ due to minor tail deviations, but the central 95% follows a near-Gaussian distribution (see Appendix for detailed analysis).
  • Figure 2: Coefficient trajectories $c_i(\psi)$ reveal hierarchical moduli response: singular terms ($c_1$, $c_3$) undergo sign reversal (vertical dashed line marks zero crossing), while symmetric term $c_4$ strengthens monotonically. Inset: coefficient classification by response regime.
  • Figure 3: Physics Benchmark: Volume integral $V(\psi) = \int_X \det(g) \cdot \omega^3$ computed along a path in moduli space $\psi \in [0, 0.8]$, extending well beyond the training point ($\psi=0$). The symbolic formula (blue dashed), utilizing hierarchically modulated coefficients, tracks the neural surrogate volume (black solid) to within $\approx 2\%$ relative error.
  • Figure 4: Cross-k validation: Coefficient variation across polynomial degrees $k = 6, 8, 10$ at $\psi = 0$. Despite different teacher accuracies, the five-term structure remains stable with coefficients varying within $\pm 30\%$ bounds.
  • Figure 5: Training curves: Ricci-flatness error $\sigma$ vs. Donaldson iteration for $\psi \in \{0.0, 0.2, 0.4, 0.6, 0.8\}$. All curves converge within 15 iterations.
  • ...and 2 more figures
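
The hold-out validation described in the Figure 1 caption compares symbolic predictions against the neural surrogate through $R^2$ and residual statistics. A minimal sketch of that kind of check follows; it uses synthetic stand-in data rather than the paper's values, and the names `r_squared`, `residual_summary`, `teacher`, and `student` are hypothetical.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

def residual_summary(y_true, y_pred):
    """Mean and sample standard deviation of residuals, plus R^2."""
    residuals = y_pred - y_true
    return {
        "mean": float(np.mean(residuals)),
        "std": float(np.std(residuals, ddof=1)),
        "r2": float(r_squared(y_true, y_pred)),
    }

# ASSUMPTION: synthetic data only. `teacher` stands in for neural-surrogate
# outputs and `student` for symbolic-formula predictions on the same points.
rng = np.random.default_rng(0)
teacher = rng.normal(size=10_000)
student = teacher + rng.normal(scale=0.011, size=10_000)

summary = residual_summary(teacher, student)
print(summary)
```

Note that $R^2$ depends on the spread of the target as well as on the residual scale, so the same residual $\sigma$ can correspond to different $R^2$ values on differently distributed data.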