Hierarchical Mamba Meets Hyperbolic Geometry: A New Paradigm for Structured Language Embeddings

Sarang Patil, Ashish Parmanand Pandey, Ioannis Koutis, Mengjia Xu

TL;DR

Problem: Large language models rely on flat Euclidean embeddings, limiting hierarchical reasoning. Approach: HiM integrates Mamba2 state-space modeling with hyperbolic geometry, learning curvature and projecting representations into the Poincaré ball or Lorentz manifold with hyperbolic losses to preserve hierarchy. Contributions: direct hyperbolic integration, SentenceMamba-16M, stabilized Lorentz operations, learnable curvature, hyperbolic centripetal and clustering losses, and comprehensive evaluation on four ontologies showing superior hierarchical reasoning. Impact: enables scalable, hierarchy-aware long-range language understanding with robust, efficient hyperbolic embeddings that outperform Euclidean baselines and offer insights into manifold choices for different data structures.

Abstract

Selective state-space models excel at long-sequence modeling, but their capacity for language representation, particularly in complex hierarchical reasoning, remains underexplored. Most large language models rely on flat Euclidean embeddings, limiting their ability to capture latent hierarchies. To address this, we propose Hierarchical Mamba (HiM), which integrates the efficient Mamba2 with hyperbolic geometry to learn hierarchy-aware language embeddings for deeper linguistic understanding. Mamba2-processed sequences are projected onto the Poincaré ball or the Lorentzian manifold with learnable curvature and optimized with a hyperbolic loss. HiM captures relational distances across varying hierarchical levels, enabling effective long-range reasoning for tasks such as mixed-hop prediction and multi-hop inference in hierarchical classification. Experimental results show that both HiM variants effectively capture hierarchical relationships across four linguistic and medical datasets, surpassing Euclidean baselines: HiM-Poincaré provides fine-grained distinctions with higher h-norms, while HiM-Lorentz offers more stable, compact, and hierarchy-preserving embeddings, favoring robustness. The source code is publicly available at https://github.com/BerryByte/HiM.

Paper Structure

This paper contains 29 sections, 24 equations, 10 figures, 7 tables.

Figures (10)

  • Figure 1: Overview of the Hierarchical Mamba (HiM) model, integrating Mamba2 blocks with hyperbolic projections to the Poincaré ball (via tangent-based mapping) and Lorentzian manifold (via cosine/sine-based mapping), enabling efficient and hierarchy-aware language embeddings for long-range reasoning tasks.
  • Figure 2: Visualization of HiM's embeddings trained on the WordNet dataset in the Poincaré ball manifold. Left: The full hyperbolic space, illustrating the distribution of entities with parent nodes positioned closer to the origin and child nodes extending toward the boundary, reflecting the exponential expansion of hyperbolic geometry. Right: A zoomed-in view emphasizing the hierarchical structure, such as the path sport $\to$ skating $\to$ skateboarding. Dots represent entities, with colors indicating hierarchical relationships. For the selected node "skating" (green dot), the blue node denotes its parent (e.g., sport), and red nodes indicate its hard negatives, such as siblings or cousins (e.g., rowing). Yellow nodes (e.g., skateboarding, speed skating) indicate children of the selected node (skating), i.e., grandchildren of the blue node (sport).
  • Figure 3: Illustration of word embeddings in Euclidean (Left) vs. Hyperbolic Spaces for hierarchical representation in Poincaré (Top right) and Lorentzian Manifolds (Bottom right).
  • Figure 4: Calculation of hyperbolic loss from clustering loss and centripetal loss.
  • Figure 5: Alignment between the computed h-norms (derived from hyperbolic embeddings by HiM-Poincaré and HiM-Lorentz) and the actual tree depth for sports-related entities in the WordNet dataset. As the depth increases from general terms like "sport" to specific ones like "skateboarding" and "speed skating", both HiM models show increasing h-norms, reflecting the underlying hierarchical structure. HiM-Poincaré produces higher h-norms that better differentiate fine-grained semantic levels, while HiM-Lorentz yields more compact yet hierarchy-preserving embeddings with improved numerical stability. Our results illustrate that both HiM models effectively encode semantic hierarchy, with Poincaré favoring detail and Lorentz emphasizing robustness.
  • ...and 5 more figures