HyperCore: The Core Framework for Building Hyperbolic Foundation Models with Comprehensive Modules

Neil He, Menglin Yang, Rex Ying

TL;DR

This work addresses the lack of a unified, open-source framework for building hyperbolic foundation models. It introduces HyperCore, a PyTorch-based collection of modules that supports both the Lorentz and Poincaré models of hyperbolic space and enables end-to-end construction of hyperbolic models. The authors demonstrate the framework by building fully hyperbolic architectures (LViT and L-CLIP) and a hybrid model with a hyperbolic graph encoder (HypGraphRAG), and by running experiments across vision, language, and graph modalities. Results indicate that hyperbolic variants can outperform their Euclidean counterparts on several tasks, while HyperCore substantially lowers development effort and encourages broader exploration of curvature and geometry in foundation models.
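Because HyperCore supports both the Lorentz (hyperboloid) model and the Poincaré ball, it may help to recall that the two models are isometric: a point can be moved between them without changing hyperbolic distances. The plain-Python sketch below assumes constant curvature -c with c > 0 and uses hypothetical function names; it illustrates the standard conversion and distance formulas and is not HyperCore's API.

```python
import math

# Sketch (assumption: curvature -c, c > 0; function names are hypothetical, not HyperCore's API).
# Lorentz point x = [x0, x1, ..., xn] satisfies -x0^2 + x1^2 + ... + xn^2 = -1/c with x0 > 0.
# Poincare point p = [p1, ..., pn] lies in the open ball of radius 1/sqrt(c).

def lorentz_inner(x, y):
    """Minkowski inner product <x, y>_L = -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

def lorentz_to_poincare(x, c=1.0):
    """Stereographic projection: p = x_space / (1 + sqrt(c) * x0)."""
    denom = 1.0 + math.sqrt(c) * x[0]
    return [xi / denom for xi in x[1:]]

def poincare_to_lorentz(p, c=1.0):
    """Inverse map: x0 = (1 + c|p|^2) / (sqrt(c)(1 - c|p|^2)), x_space = 2p / (1 - c|p|^2)."""
    sq = sum(pi * pi for pi in p)
    denom = 1.0 - c * sq
    x0 = (1.0 + c * sq) / (math.sqrt(c) * denom)
    return [x0] + [2.0 * pi / denom for pi in p]

def lorentz_distance(x, y, c=1.0):
    """d(x, y) = arccosh(-c <x, y>_L) / sqrt(c); the clamp guards against rounding below 1."""
    return math.acosh(max(1.0, -c * lorentz_inner(x, y))) / math.sqrt(c)

def poincare_distance(p, q, c=1.0):
    """d(p, q) = arccosh(1 + 2c|p - q|^2 / ((1 - c|p|^2)(1 - c|q|^2))) / sqrt(c)."""
    diff = sum((a - b) ** 2 for a, b in zip(p, q))
    dp = 1.0 - c * sum(a * a for a in p)
    dq = 1.0 - c * sum(b * b for b in q)
    return math.acosh(1.0 + 2.0 * c * diff / (dp * dq)) / math.sqrt(c)

if __name__ == "__main__":
    c = 0.5
    p, q = [0.3, -0.4], [-0.6, 0.2]
    x, y = poincare_to_lorentz(p, c), poincare_to_lorentz(q, c)
    print(lorentz_inner(x, x), -1.0 / c)                          # both approximately -2.0
    print(lorentz_distance(x, y, c), poincare_distance(p, q, c))  # equal up to rounding
```

The Lorentz model is often preferred for computation because its distance needs only an inner product, while the Poincaré ball is convenient for visualization; a framework that supports both can switch between them with maps like these.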

Abstract

Hyperbolic neural networks have emerged as a powerful tool for modeling hierarchical data across diverse modalities. Recent studies show that token distributions in foundation models exhibit scale-free properties, suggesting that hyperbolic space is a more suitable ambient space than Euclidean space for many pre-training and downstream tasks. However, existing tools lack essential components for building hyperbolic foundation models, making it difficult to leverage recent advancements. We introduce HyperCore, a comprehensive open-source framework that provides core modules for constructing hyperbolic foundation models across multiple modalities. HyperCore's modules can be readily combined to develop new hyperbolic foundation models, eliminating the need to extensively re-engineer Euclidean modules and avoiding redundant research effort. To demonstrate its versatility, we build and test the first fully hyperbolic vision Transformer (LViT) with a fine-tuning pipeline, the first fully hyperbolic multimodal CLIP model (L-CLIP), and a hybrid Graph RAG model with a hyperbolic graph encoder (HypGraphRAG). Our experiments show that LViT outperforms its Euclidean counterpart. Additionally, we benchmark and reproduce experiments across hyperbolic GNNs, CNNs, Transformers, and vision Transformers to highlight HyperCore's advantages.

Paper Structure

This paper contains 23 sections, 11 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Snapshot of the high-level categories of hyperbolic foundation models supported by HyperCore and the essential HyperCore modules used by these models, along with example downstream tasks and the data modalities these models interact with. Boxes drawn with dashed lines enclose a downstream task and its associated model. For the graph tasks in Section 5.4, LP, NC, and MD stand for link prediction, node classification, and minimizing distortion in graph reconstruction, respectively. The relevant evaluation subsections for each model-task combination are also indicated in white.
  • Figure 2: Example of building a fully hyperbolic Transformer encoder/decoder block.
  • Figure 3: Framework of the hyperbolic vision Transformer in Lorentz space (LViT). Images are projected into Lorentz space and then processed through a hyperbolic patch embedding implemented as a Lorentz convolutional layer. The patch embeddings are combined with learned hyperbolic positional embeddings, and the result is passed through a hyperbolic ViT encoder consisting of hyperbolic self-attention mechanisms and hyperbolic MLPs. Finally, the encoder outputs are averaged across patches and passed to a classifier. Dropout and activations are omitted for brevity.
  • Figure 4: Framework of the Lorentzian CLIP model (L-CLIP) for multimodal learning (right), consisting of a hyperbolic image encoder and a hyperbolic text encoder. We build the first hyperbolic CLIP model using HyperCore, with LViT as the image encoder and a hyperbolic language Transformer as the text encoder. In comparison, MERU (left) is a hybrid model that uses Euclidean encoders and computes the loss in hyperbolic space.
  • Figure 5: Framework of the hyperbolic GraphRAG model (HypGraphRAG). Compared to standard Euclidean GraphRAG, HypGraphRAG uses a hyperbolic graph encoder to encode the retrieved subgraph and, optionally, hyperbolic LoRA for fine-tuning.
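The LViT pipeline in Figure 3 begins by projecting inputs into Lorentz space and ends by averaging encoder outputs across patches. The plain-Python sketch below illustrates those two geometric steps under curvature -c (c > 0): lifting Euclidean vectors onto the hyperboloid with the exponential map at the origin, and averaging points with a Lorentzian centroid (a weighted sum rescaled back onto the hyperboloid). It is not HyperCore's implementation: `expmap0` and `lorentz_centroid` are hypothetical names, and the caption does not say how LViT averages across patches, so the centroid is an assumed, commonly used choice.

```python
import math

# Sketch of two LViT-style geometric steps (assumptions: curvature -c with c > 0;
# hypothetical function names; the centroid is an assumed pooling rule, not confirmed by the source).

def lorentz_inner(x, y):
    """Minkowski inner product <x, y>_L = -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

def expmap0(v, c=1.0):
    """Lift a Euclidean vector v (tangent at origin o = (1/sqrt(c), 0, ..., 0)) onto the hyperboloid."""
    sc = math.sqrt(c)
    norm = math.sqrt(sum(vi * vi for vi in v))
    if norm == 0.0:
        return [1.0 / sc] + [0.0] * len(v)  # the zero vector maps to the origin
    theta = sc * norm
    scale = math.sinh(theta) / (sc * norm)
    return [math.cosh(theta) / sc] + [scale * vi for vi in v]

def lorentz_centroid(points, weights=None, c=1.0):
    """Nonnegative weighted sum in ambient coordinates, rescaled onto {x : <x, x>_L = -1/c}."""
    if weights is None:
        weights = [1.0 / len(points)] * len(points)
    dim = len(points[0])
    mu = [sum(w * p[i] for w, p in zip(weights, points)) for i in range(dim)]
    # <mu, mu>_L < 0 for a nonnegative combination of points on the upper sheet
    norm = math.sqrt(abs(lorentz_inner(mu, mu)))
    return [m / (math.sqrt(c) * norm) for m in mu]

if __name__ == "__main__":
    c = 1.0
    patches = [[0.2, -0.1, 0.5], [-0.3, 0.4, 0.0], [0.1, 0.1, -0.2]]  # toy Euclidean patch features
    lifted = [expmap0(v, c) for v in patches]
    pooled = lorentz_centroid(lifted, c=c)
    print(lorentz_inner(pooled, pooled))  # approximately -1/c
```

Rescaling the weighted sum keeps the pooled embedding on the hyperboloid, so a downstream hyperbolic classifier can consume it directly; a plain coordinate-wise mean would generally leave the manifold.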