Linear Regression in p-adic metric spaces

Gregory D. Baker, Scott McCallum, Dirk Pattinson

TL;DR

Problem: Euclidean losses inadequately reflect hierarchical data. Approach: develop a $p$-adic regression framework and prove a Hyperplane Intersection Theorem showing optimal affine regressors pass through at least $n+1$ points; derive a polynomial corollary and practical algorithmic insights. Key contributions: (i) foundational theory for $p$-adic regression, (ii) a constructive proof enabling brute-force and large-prime optimisations, (iii) polynomial residual insights, and (iv) two NLP-style applications highlighting hierarchy-aware learning. Significance: demonstrates that $p$-adic metrics can align ML methods with hierarchical structure, enabling new algorithms and interpretations for tree-like data and grammar, with potential broader impact on representation learning for hierarchical domains.

Abstract

Many real-world machine learning problems involve inherently hierarchical data, yet traditional approaches rely on Euclidean metrics that fail to capture the discrete, branching nature of hierarchical relationships. We present a theoretical foundation for machine learning in $p$-adic metric spaces, which naturally respect hierarchical structure. Our main result proves that an $n$-dimensional plane minimizing the $p$-adic sum of distances to points in a dataset must pass through at least $n+1$ of those points, a striking contrast to Euclidean regression that highlights how $p$-adic metrics better align with the discrete nature of hierarchical data. As a corollary, a polynomial of degree $n$ constructed to minimise the $p$-adic sum of residuals will pass through at least $n+1$ points. As a further corollary, a polynomial of degree $n$ approximating a higher degree polynomial at a finite number of points will yield a difference polynomial that has distinct rational roots. We demonstrate the practical significance of this result through two applications in natural language processing: analyzing hierarchical taxonomies and modeling grammatical morphology. These results suggest that $p$-adic metrics may be fundamental to properly handling hierarchical data structures in machine learning. In hierarchical data, interpolation between points often makes less sense than selecting actual observed points as representatives.
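The main result suggests a brute-force search: if an optimal regressor passes through at least $n+1$ data points, it is enough to test the candidates determined by subsets of $n+1$ points. The sketch below illustrates this for $n=1$ (a line through two points). It assumes the loss is the sum of $p$-adic absolute values of the residuals and that the theorem's hypotheses hold for the data. The function names (`padic_abs`, `padic_loss`, `brute_force_line`) are hypothetical and not taken from the paper.

```python
from fractions import Fraction
from itertools import combinations


def padic_valuation(x: Fraction, p: int) -> int:
    """Exponent of the prime p in the factorisation of a non-zero rational x."""
    if x == 0:
        raise ValueError("the valuation of 0 is +infinity")
    num, den, v = x.numerator, x.denominator, 0
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v


def padic_abs(x, p: int) -> Fraction:
    """p-adic absolute value |x|_p = p^(-v_p(x)), with |0|_p = 0."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    return Fraction(p) ** (-padic_valuation(x, p))


def padic_loss(m, b, points, p: int) -> Fraction:
    """Assumed loss: sum of p-adic absolute residuals of the line y = m*x + b."""
    return sum((padic_abs(y - (m * x + b), p) for x, y in points), Fraction(0))


def brute_force_line(points, p: int):
    """Try every line through two data points; return (loss, slope, intercept)."""
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # a vertical line is not of the form y = m*x + b
        m = Fraction(y2 - y1) / Fraction(x2 - x1)
        b = Fraction(y1) - m * x1
        loss = padic_loss(m, b, points, p)
        if best is None or loss < best[0]:
            best = (loss, m, b)
    return best


points = [(0, 0), (1, 1), (2, 2), (3, 4)]
loss, slope, intercept = brute_force_line(points, p=3)
```

For this toy data set and $p=3$, the search selects the line $y=x$, which passes through three of the four points and leaves a single residual of $3$-adic size 1. The search costs $O(k^2)$ candidate lines for $k$ points, each evaluated in $O(k)$ residuals.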

Paper Structure

This paper contains 21 sections, 7 theorems, 28 equations, 1 figure, 1 table.

Key Result

Theorem 1

Let $n, k \in \mathbb{Z}^+$ where $k \ge n+1$. Let $X_1, X_2, \ldots X_k \in \mathbb{Q}^n$ and $y_1, y_2, \ldots y_k \in \mathbb{Q}$, where $y_i \ne y_j \implies X_i \ne X_j$. Suppose that the data set $X_1, \ldots, X_k$ is non-degenerate, that is, there is no non-zero affine function $\phi : \mathb

Figures (1)

  • Figure 1: A portion of the WordNet hierarchy, with a sample encoding for $p>402$; $p$ must exceed the largest child index in the pruned tree, so we take $p=409$
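One way such an encoding could work is sketched below, purely as an illustration: a node is identified by the child indices on its path from the root, and those indices become the base-$p$ digits of an integer, least significant digit first. Under this assumed scheme, two nodes sharing a longer path prefix are closer in the $p$-adic metric, and every index must lie in $1, \ldots, p-1$, which is why $p$ has to exceed the largest child index. The helper names `encode_path` and `padic_distance` are hypothetical, and the paper's actual encoding may differ.

```python
from fractions import Fraction


def encode_path(child_indices, p: int) -> int:
    """Encode a root-to-node path as an integer with child indices as base-p digits.

    Assumed scheme: the index chosen at depth d (0-based) is multiplied by p**d.
    """
    for c in child_indices:
        if not 1 <= c < p:
            raise ValueError("each child index must lie in 1..p-1")
    return sum(c * p**depth for depth, c in enumerate(child_indices))


def padic_distance(a: int, b: int, p: int) -> Fraction:
    """p-adic distance |a - b|_p between two integer encodings."""
    diff = a - b
    if diff == 0:
        return Fraction(0)
    v = 0
    while diff % p == 0:
        diff //= p
        v += 1
    return Fraction(1, p**v)


p = 409  # prime larger than the largest child index (402) in the caption
sibling_a = encode_path([3, 402, 7], p)
sibling_b = encode_path([3, 402, 9], p)
cousin = encode_path([3, 5], p)

near = padic_distance(sibling_a, sibling_b, p)  # paths agree on two levels
far = padic_distance(sibling_a, cousin, p)      # paths agree on one level
```

Here the two siblings are at distance $p^{-2}$ and the more distant relative is at distance $p^{-1}$, so the metric reflects the depth of the deepest common ancestor.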

Theorems & Definitions (16)

  • Theorem 1
  • proof
  • Remark 2
  • Corollary 3
  • proof
  • Theorem 4
  • proof
  • Corollary 5
  • proof
  • Corollary 6
  • ...and 6 more