A Coding-Theoretic Analysis of Hyperspherical Prototypical Learning Geometry
Martin Lindström, Borja Rodríguez-Gálvez, Ragnar Thobaben, Mikael Skoglund
TL;DR
This work addresses the problem of designing class prototypes on the unit hypersphere to maximize separation in supervised representation learning. It introduces coding-theoretic constructions that map binary linear codes to hyperspherical prototypes, providing provable bounds via Gilbert-Varshamov and Rankin results and achieving near-orthogonality in many regimes, especially when the latent dimension satisfies $n \approx K/2$. It augments this with optimization-based prototype schemes using a convex log-sum-exp relaxation to approximate the nonconvex objective, enabling flexible trade-offs between dimension and separation. Empirical evaluation on CIFAR-100 and MNIST demonstrates that more dispersed prototypes tend to yield higher accuracy, though performance also hinges on the semantic alignment between classes and prototype assignments. Overall, the coding-theoretic approach offers scalable, near-optimal prototype designs across a broad range of dimensions, with clear directions for incorporating semantic information and extending to self-supervised learning contexts.
Abstract
Hyperspherical Prototypical Learning (HPL) is a supervised approach to representation learning that designs class prototypes on the unit hypersphere. The prototypes bias the representations towards class separation in a scale-invariant and known geometry. Previous approaches to HPL suffer from one of two shortcomings: (i) they follow an unprincipled optimisation procedure; or (ii) they are theoretically sound, but are constrained to only one possible latent dimension. In this paper, we address both shortcomings. To address (i), we present a principled optimisation procedure whose solution we show is optimal. To address (ii), we construct well-separated prototypes in a wide range of dimensions using linear block codes. Additionally, we give a full characterisation of the optimal prototype placement in terms of achievable and converse bounds, showing that our proposed methods are near-optimal.
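To illustrate how a linear block code can yield hyperspherical prototypes, the following is a minimal sketch and not necessarily the exact construction used in the paper. It assumes the standard antipodal mapping $c \mapsto (1 - 2c)/\sqrt{n}$ from binary codewords of length $n$ to unit vectors, under which the cosine similarity of two prototypes equals $1 - 2 d_H / n$, where $d_H$ is the Hamming distance between the codewords. The [7,4] Hamming code and the helper names `codewords` and `to_sphere` are illustrative choices, not taken from the paper.

```python
import itertools
import numpy as np

def codewords(G):
    """Enumerate all 2^k codewords of the binary linear code generated by G (k x n)."""
    k = G.shape[0]
    msgs = np.array(list(itertools.product([0, 1], repeat=k)))
    return (msgs @ G) % 2

def to_sphere(C):
    """Map bits {0, 1} to {+1, -1} and scale each row to unit norm (assumed mapping)."""
    n = C.shape[1]
    return (1 - 2 * C) / np.sqrt(n)

# Systematic generator matrix of a [7,4] Hamming code (minimum distance 3),
# chosen here purely as an example.
G = np.array([
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
])

P = to_sphere(codewords(G))        # 16 prototypes on the unit sphere in R^7
cos = P @ P.T                      # pairwise cosine similarities
np.fill_diagonal(cos, -np.inf)     # ignore self-similarity
max_cos = cos.max()                # worst-case similarity: 1 - 2 * d_min / n
```

In this sketch the worst-case pairwise similarity is governed by the minimum distance of the code, so a code with larger minimum distance gives better-separated prototypes for the same latent dimension $n$.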
