Hierarchical Mamba Meets Hyperbolic Geometry: A New Paradigm for Structured Language Embeddings
Sarang Patil, Ashish Parmanand Pandey, Ioannis Koutis, Mengjia Xu
TL;DR
Problem: Large language models rely on flat Euclidean embeddings, limiting hierarchical reasoning. Approach: HiM integrates Mamba2 state-space modeling with hyperbolic geometry, learning curvature and projecting representations into the Poincaré ball or Lorentz manifold with hyperbolic losses to preserve hierarchy. Contributions: direct hyperbolic integration, SentenceMamba-16M, stabilized Lorentz operations, learnable curvature, hyperbolic centripetal and clustering losses, and comprehensive evaluation on four ontologies showing superior hierarchical reasoning. Impact: enables scalable, hierarchy-aware long-range language understanding with robust, efficient hyperbolic embeddings that outperform Euclidean baselines and offer insights into manifold choices for different data structures.
Abstract
Selective state-space models excel at long-sequence modeling, but their capacity for language representation, particularly in complex hierarchical reasoning, remains underexplored. Most large language models rely on flat Euclidean embeddings, limiting their ability to capture latent hierarchies. To address this, we propose Hierarchical Mamba (HiM), which integrates the efficient Mamba2 architecture with hyperbolic geometry to learn hierarchy-aware language embeddings for deeper linguistic understanding. Mamba2-processed sequences are projected onto the Poincaré ball or the Lorentzian manifold with learnable curvature and optimized with a hyperbolic loss. HiM captures relational distances across varying hierarchical levels, enabling effective long-range reasoning for tasks such as mixed-hop prediction and multi-hop inference in hierarchical classification. Experimental results show that both HiM variants effectively capture hierarchical relationships across four linguistic and medical datasets, surpassing Euclidean baselines. HiM-Poincaré provides fine-grained distinctions with higher h-norms, while HiM-Lorentz offers more stable, compact, and hierarchy-preserving embeddings, favoring robustness. The source code is publicly available at https://github.com/BerryByte/HiM.
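
To make the projection step concrete, the following is a minimal sketch of how a Euclidean vector can be mapped onto the Poincaré ball or the Lorentz hyperboloid using the standard exponential map at the origin. It is not taken from the HiM code base: the function names are hypothetical, the curvature parameter `c` is a plain float here (in a trainable model it would presumably be a learnable parameter), and the h-norm is assumed to mean the hyperbolic distance from the origin.

```python
import math


def expmap0_poincare(v, c=1.0):
    """Map a Euclidean vector v (tangent at the origin) into the
    Poincare ball of curvature -c. Illustrative helper, not HiM's API."""
    sqrt_c = math.sqrt(c)
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)
    scale = math.tanh(sqrt_c * norm) / (sqrt_c * norm)
    return [scale * x for x in v]


def poincare_hnorm(x, c=1.0):
    """Hyperbolic distance of x from the origin of the Poincare ball
    (assumed here to be what the abstract calls the h-norm)."""
    sqrt_c = math.sqrt(c)
    norm = math.sqrt(sum(t * t for t in x))
    # Clamp just inside the ball boundary so atanh stays finite.
    r = min(sqrt_c * norm, 1.0 - 1e-12)
    return 2.0 / sqrt_c * math.atanh(r)


def expmap0_lorentz(v, c=1.0):
    """Map a Euclidean vector v into the Lorentz (hyperboloid) model of
    curvature -c. The first coordinate is the time-like component."""
    sqrt_c = math.sqrt(c)
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [1.0 / sqrt_c] + [0.0] * len(v)
    time = math.cosh(sqrt_c * norm) / sqrt_c
    scale = math.sinh(sqrt_c * norm) / (sqrt_c * norm)
    return [time] + [scale * x for x in v]


def lorentz_inner(x, y):
    """Lorentzian inner product: -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


# Toy stand-in for a pooled sequence embedding (hypothetical values).
euclidean = [0.3, -0.4]
c = 0.5

ball_point = expmap0_poincare(euclidean, c)
hyperboloid_point = expmap0_lorentz(euclidean, c)
```

Under these conventions, points on the hyperboloid satisfy `lorentz_inner(x, x) == -1/c` up to rounding, and the h-norm of a projected point grows linearly with the Euclidean norm of the input, so embeddings pushed farther from the origin can represent deeper levels of a hierarchy.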
