Linear Regression in p-adic metric spaces
Gregory D. Baker, Scott McCallum, Dirk Pattinson
TL;DR
Problem: Euclidean losses inadequately reflect hierarchical data. Approach: develop a $p$-adic regression framework and prove a Hyperplane Intersection Theorem showing that optimal affine regressors pass through at least $n+1$ data points; derive a polynomial corollary and practical algorithmic insights. Key contributions: (i) foundational theory for $p$-adic regression, (ii) a constructive proof enabling brute-force and large-prime optimisations, (iii) polynomial residual insights, and (iv) two NLP-style applications highlighting hierarchy-aware learning. Significance: suggests that $p$-adic metrics can align ML methods with hierarchical structure, enabling new algorithms and interpretations for tree-like data and grammar, with potential broader impact on representation learning for hierarchical domains.
Abstract
Many real-world machine learning problems involve inherently hierarchical data, yet traditional approaches rely on Euclidean metrics that fail to capture the discrete, branching nature of hierarchical relationships. We present a theoretical foundation for machine learning in $p$-adic metric spaces, which naturally respect hierarchical structure. Our main result proves that an $n$-dimensional plane minimizing the $p$-adic sum of distances to the points in a dataset must pass through at least $n + 1$ of those points. This is a striking contrast to Euclidean regression, and it highlights how $p$-adic metrics better align with the discrete nature of hierarchical data: in hierarchical data, interpolation between points often makes less sense than selecting actual observed points as representatives. As a corollary, a polynomial of degree $n$ constructed to minimize the $p$-adic sum of residuals will pass through at least $n + 1$ points. As a further corollary, when a polynomial of degree $n$ approximates a higher-degree polynomial at a finite number of points, the difference polynomial has distinct rational roots. We demonstrate the practical significance of this result through two applications in natural language processing: analyzing hierarchical taxonomies and modeling grammatical morphology. These results suggest that $p$-adic metrics may be fundamental to properly handling hierarchical data structures in machine learning.
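To make the main result concrete, the following is a minimal sketch of the simplest case, fitting a line to points in the plane, under the assumption that the loss is the sum of $p$-adic absolute values of the residuals over rational data. Taking the theorem as stated, an optimal line passes through at least two data points, so it suffices to enumerate the lines through pairs of points. The function names (`padic_valuation`, `padic_abs`, `padic_loss`, `best_line_through_pairs`) and the toy dataset are illustrative assumptions, not the authors' implementation.

```python
from fractions import Fraction
from itertools import combinations


def padic_valuation(x, p):
    """Return v_p(x) for a nonzero rational x: the exponent of p in x."""
    x = Fraction(x)
    if x == 0:
        raise ValueError("the valuation of 0 is infinite")
    num, den, v = x.numerator, x.denominator, 0
    while num % p == 0:  # powers of p in the numerator raise the valuation
        num //= p
        v += 1
    while den % p == 0:  # powers of p in the denominator lower it
        den //= p
        v -= 1
    return v


def padic_abs(x, p):
    """Return |x|_p = p^(-v_p(x)), with |0|_p = 0."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    return Fraction(p) ** (-padic_valuation(x, p))


def padic_loss(points, slope, intercept, p):
    """Sum of p-adic absolute values of the residuals y - (slope*x + intercept)."""
    return sum(padic_abs(y - (slope * x + intercept), p) for x, y in points)


def best_line_through_pairs(points, p):
    """Brute-force search over all lines through two data points.

    Assumption: by the theorem stated in the abstract, an optimal line
    passes through at least two points, so this search space suffices.
    Returns (loss, slope, intercept), or None if no pair has distinct x.
    """
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # a vertical line is not of the form y = slope*x + intercept
        slope = Fraction(y2 - y1) / Fraction(x2 - x1)
        intercept = y1 - slope * x1
        loss = padic_loss(points, slope, intercept, p)
        if best is None or loss < best[0]:
            best = (loss, slope, intercept)
    return best


# Hypothetical toy data: three points on y = 2x + 1 plus one point off the line.
points = [(0, 1), (1, 3), (2, 5), (3, 10)]
loss, slope, intercept = best_line_through_pairs(points, p=3)
```

On this toy dataset with $p = 3$, the search selects the line $y = 2x + 1$, which passes through three of the four points; the remaining residual is $3$, whose 3-adic absolute value is $1/3$. The search costs one loss evaluation per pair of points, so it is quadratic in the number of points for a line, and the analogous search over $(n+1)$-subsets grows accordingly in higher dimensions.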
