Peeling metric spaces of strict negative type
Steve Huntsman
TL;DR
The paper addresses outlier detection and maximum-diversity subset selection in finite metric spaces of strict negative type. It introduces a parameter-free peeling mechanism that identifies boundary elements via a scalable linear-algebraic procedure. Magnitude-based weightings and diversity measures are connected, in the $t\downarrow 0$ limit, to a convex surrogate given by the quadratic form $p^T d p$, and a fast algorithm with complexity $O(n^{\omega+1})$ computes the optimal distribution $p_*(d)$. The framework is then extended to product metrics, with conditions under which $L^p$-type products preserve strict negative type, and applied to path diversity problems in which the space of fixed-length paths is enormous but small subsets remain tractable. A concrete example with eight NY-to-LA paths and city-embedding features demonstrates peel analysis in practice, and the paper sketches a further application to hallucination mitigation with text embeddings. Overall, the work offers a principled, scalable method for boundary detection and diversity optimization in embedding spaces, with possible uses in making language models more robust and in structured diversity for path- and graph-like domains.
Abstract
We describe a unified and computationally tractable framework for finding outliers in, and maximum-diversity subsets of, finite metric spaces of strict negative type. Examples of such spaces include finite subsets of Euclidean space and finite subsets of a sphere without antipodal points. The latter accounts for state-of-the-art text embeddings, and we apply our framework in this context to sketch a hallucination mitigation strategy and separately to a class of path diversity optimization problems with a real-world example.
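As a rough illustration of the quadratic form $p^T d p$ mentioned in the TL;DR, the following Python sketch maximizes it over the probability simplex for a small Euclidean point set, then peels off the support of the maximizer layer by layer. It uses a generic pairwise coordinate-ascent scheme, not the paper's $O(n^{\omega+1})$ algorithm. The function names (`distance_matrix`, `max_quadratic_form`, `peel`), the tolerances, and the reading of a peel as "remove the support of the maximizer and repeat" are assumptions made for illustration only.

```python
import math


def distance_matrix(points):
    """Euclidean distance matrix of a list of distinct points."""
    return [[math.dist(a, b) for b in points] for a in points]


def max_quadratic_form(d, tol=1e-12, max_iter=100_000):
    """Approximately maximize p^T d p over the probability simplex.

    Illustrative pairwise coordinate ascent (an assumption, not the
    paper's method): repeatedly move mass from the supported index
    with the smallest (d p)_j to the index with the largest (d p)_i,
    using an exact line search along e_i - e_j.
    """
    n = len(d)
    p = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        g = [sum(d[i][k] * p[k] for k in range(n)) for i in range(n)]  # g = d p
        i = max(range(n), key=lambda k: g[k])
        j = min((k for k in range(n) if p[k] > 0.0), key=lambda k: g[k])
        gap = g[i] - g[j]
        if gap <= tol:  # optimality conditions hold up to tol
            break
        # Along e_i - e_j the objective is f + 2*t*gap - 2*t^2*d_ij,
        # maximized at t = gap / (2 d_ij); clip so p_j stays nonnegative.
        t = min(gap / (2.0 * d[i][j]), p[j])
        p[i] += t
        p[j] -= t
    return p


def peel(points, eps=1e-9):
    """Hypothetical peel: strip the support of the maximizer, then repeat."""
    remaining = list(range(len(points)))
    layers = []
    while remaining:
        d = distance_matrix([points[k] for k in remaining])
        p = max_quadratic_form(d)
        layer = [remaining[k] for k, w in enumerate(p) if w > eps]
        layers.append(layer)
        remaining = [k for k in remaining if k not in layer]
    return layers


# Four corners of the unit square plus its center.
points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.5, 0.5)]
weights = max_quadratic_form(distance_matrix(points))
layers = peel(points)
```

On this toy input the maximizer places weight of about 1/4 on each corner and none on the center, so the first peeled layer consists of the corners and the second of the center alone. The scheme is meant only to make the objective concrete; it makes no attempt to match the efficiency of the algorithm described in the paper.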
