Mathematical Data Science
Michael R. Douglas, Kyu-Hwan Lee
TL;DR
This paper introduces the mathematical data science (MDS) paradigm, in which machine learning (ML) is used to study mathematical objects collectively: one constructs precise datasets, runs experiments, and interprets the results in order to conjecture, and ultimately prove, new structures. It presents two case studies. The first studies murmurations of elliptic curves through their coefficients $a_p$, using principal component analysis (PCA) to separate curves by rank. The second analyzes Kronecker coefficients through loading-based embeddings of partitions derived from similitude and difference matrices. The elliptic-curve study achieves near-perfect rank classification and reveals scale-invariant oscillations. The Kronecker study achieves high accuracy in classifying coefficients as zero or nonzero, and it uncovers structural loadings ($a$-loadings and $b$-loadings) with informative distributions and conditional behavior. Together, these examples illustrate how MDS, combining human insight, simple models, and rigorous interpretation, can guide mathematical discovery and potentially lead to new conjectures and proofs.
Abstract
Can machine learning help discover new mathematical structures? In this article we discuss an approach to this question, which one can call "mathematical data science". In this paradigm, one studies mathematical objects collectively rather than individually, by creating datasets, running machine learning experiments, and interpreting the results. After an overview, we present two case studies: murmurations in number theory, and loadings of partitions related to Kronecker coefficients in representation theory and combinatorics.
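The murmuration case study represents each elliptic curve by its sequence of coefficients $a_p = p + 1 - \#E(\mathbb{F}_p)$, indexed by primes $p$. The following is a minimal sketch of how such a feature vector can be produced, assuming a curve given in short Weierstrass form $y^2 = x^3 + Ax + B$ with integer $A, B$. It computes $a_p$ by naive point counting with the Legendre symbol, keeping only odd primes $p$ that do not divide $4A^3 + 27B^2$, i.e. primes where this model is nonsingular mod $p$. This is a simplification: datasets of the kind used in the paper typically rely on minimal models and also record $a_p$ at primes of bad reduction, which this sketch does not handle. The function names `legendre`, `is_prime`, `a_p`, and `ap_vector` are hypothetical, chosen for illustration.

```python
def legendre(a: int, p: int) -> int:
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r  # r is always 0, 1, or p - 1


def is_prime(n: int) -> bool:
    """Trial-division primality test (adequate for small bounds)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def a_p(A: int, B: int, p: int) -> int:
    """a_p = p + 1 - #E(F_p) for E: y^2 = x^3 + A x + B over F_p.

    Assumes p is an odd prime at which this model is nonsingular.
    Each x in F_p contributes 1 + (f(x)/p) affine points, and there is one
    point at infinity, so #E(F_p) = p + 1 + sum_x (f(x)/p); hence
    a_p = -sum_x (f(x)/p).
    """
    return -sum(legendre(x * x * x + A * x + B, p) for x in range(p))


def ap_vector(A: int, B: int, bound: int) -> list[tuple[int, int]]:
    """(p, a_p) pairs for odd primes p <= bound not dividing 4A^3 + 27B^2."""
    disc_factor = 4 * A**3 + 27 * B**2
    if disc_factor == 0:
        raise ValueError("singular curve: 4A^3 + 27B^2 = 0")
    return [
        (p, a_p(A, B, p))
        for p in range(3, bound + 1)
        if is_prime(p) and disc_factor % p != 0
    ]
```

For example, for $y^2 = x^3 - x$, `ap_vector(-1, 0, 13)` returns `[(3, 0), (5, -2), (7, 0), (11, 0), (13, 6)]`; the zeros at $p \equiv 3 \pmod 4$ reflect the complex multiplication of this curve. Stacking such vectors for many curves yields a data matrix of the kind on which one can average $a_p$ by rank or run PCA; this sketch does not reproduce those experiments.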
