Hyperbolic Fine-Tuning for Large Language Models
Menglin Yang, Ram Samarth B B, Aosong Feng, Bo Xiong, Jihong Liu, Irwin King, Rex Ying
TL;DR
This work investigates whether Euclidean token spaces are optimal for large language models and finds strong hyperbolic, tree-like structure in token embeddings: high-frequency tokens cluster near the origin, while low-frequency tokens lie farther out. Building on this insight, the authors introduce HypLoRA, a hyperbolic parameter-efficient fine-tuning (PEFT) method that performs low-rank adaptation directly on the hyperbolic manifold via a Direct Lorentz Low-Rank Transformation, preserving geometric properties while remaining computationally efficient. The paper provides both a global analysis (the power-law exponent $\gamma$ of token frequencies) and a local analysis ($\delta$-hyperbolicity), and establishes a theoretical link between token frequency distributions and hyperbolic curvature. Extensive experiments on arithmetic and commonsense reasoning across multiple base models show that HypLoRA yields consistent gains over Euclidean LoRA and other adapters, supporting the practical value of hyperbolic inductive biases in PEFT. Overall, the work offers a principled way to align fine-tuning with the intrinsic geometry of language, improving reasoning at a modest additional computational cost of $O(r(d+k))$ and with a similar memory footprint.
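To make the local analysis concrete, the following is a minimal sketch of how $\delta$-hyperbolicity can be estimated from pairwise distances using the standard Gromov four-point condition. It is an illustration under stated assumptions, not the authors' procedure: the function names, the brute-force enumeration of quadruples, and the diameter-normalized variant $2\delta/\mathrm{diam}$ are choices made here, and the toy distance matrices stand in for distances between token embeddings. A tree metric gives $\delta = 0$, so values near zero suggest tree-like structure.

```python
from itertools import combinations


def four_point_delta(dist):
    """Gromov delta of a finite metric via the four-point condition.

    `dist` is a symmetric matrix (list of lists) of pairwise distances.
    For each quadruple, take the three pairings of opposite distances;
    delta is half the gap between the two largest sums, maximized over
    all quadruples.
    """
    n = len(dist)
    delta = 0.0
    for x, y, z, w in combinations(range(n), 4):
        sums = sorted((
            dist[x][y] + dist[z][w],
            dist[x][z] + dist[y][w],
            dist[x][w] + dist[y][z],
        ))
        delta = max(delta, (sums[2] - sums[1]) / 2.0)
    return delta


def relative_delta(dist):
    """Scale-invariant variant 2 * delta / diameter (an assumption here)."""
    diameter = max(max(row) for row in dist)
    return 2.0 * four_point_delta(dist) / diameter


# Toy metrics (hypothetical stand-ins for embedding distances).
# Star tree: node 0 is the center, nodes 1-3 are leaves.
star_tree = [
    [0, 1, 1, 1],
    [1, 0, 2, 2],
    [1, 2, 0, 2],
    [1, 2, 2, 0],
]
# Cycle on four nodes with unit-length edges.
four_cycle = [
    [0, 1, 2, 1],
    [1, 0, 1, 2],
    [2, 1, 0, 1],
    [1, 2, 1, 0],
]

print(four_point_delta(star_tree))   # 0.0 (a tree is 0-hyperbolic)
print(four_point_delta(four_cycle))  # 1.0
print(relative_delta(four_cycle))    # 1.0
```

The exhaustive loop costs $O(n^4)$, so for a real vocabulary one would typically evaluate the same quantity on randomly sampled subsets of tokens.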
Abstract
Large language models (LLMs) have demonstrated remarkable performance across various tasks. However, it remains an open question whether the default Euclidean space is the most suitable choice for LLMs. In this study, we investigate the geometric characteristics of LLMs, focusing specifically on tokens and their embeddings. Our findings reveal that token frequency follows a power-law distribution, where high-frequency tokens (e.g., "the", "that") constitute the minority, while low-frequency tokens (e.g., "apple", "dog") constitute the majority. Furthermore, high-frequency tokens cluster near the origin, whereas low-frequency tokens are positioned farther away in the embedding space. Additionally, token embeddings exhibit hyperbolic characteristics, indicating a latent tree-like structure within the embedding space. Motivated by these observations, we propose HypLoRA, an efficient fine-tuning approach that operates in hyperbolic space to better exploit these underlying hierarchical structures. HypLoRA performs low-rank adaptation directly in hyperbolic space, thereby preserving hyperbolic modeling capabilities throughout the fine-tuning process. Extensive experiments across various base models and reasoning benchmarks, specifically arithmetic and commonsense reasoning tasks, demonstrate that HypLoRA substantially improves LLM performance.
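As a rough illustration of what "low-rank adaptation directly in hyperbolic space" can look like, the sketch below applies a low-rank map to the space-like coordinates of a point on the Lorentz hyperboloid and then recomputes the time-like coordinate so the result stays on the manifold. The abstract does not spell out the formula, so everything here is an assumption made for illustration: unit curvature $-1$, exponential and logarithmic maps taken at the origin, the helper names, and the toy matrices `a_mat` ($r \times k$) and `b_mat` ($d \times r$). It should not be read as the authors' implementation.

```python
import math


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def norm(v):
    return math.sqrt(sum(a * a for a in v))


def exp_map_origin(v):
    """Map a Euclidean vector onto the unit hyperboloid (curvature -1)."""
    n = norm(v)
    if n == 0.0:
        return [1.0] + [0.0] * len(v)
    scale = math.sinh(n) / n
    return [math.cosh(n)] + [scale * a for a in v]


def log_map_origin(p):
    """Inverse of exp_map_origin: return the tangent vector at the origin."""
    space = p[1:]
    n = norm(space)
    if n == 0.0:
        return [0.0] * len(space)
    # On the hyperboloid, sinh(distance to origin) = ||space||.
    scale = math.asinh(n) / n
    return [scale * a for a in space]


def lorentz_low_rank(p, a_mat, b_mat):
    """Apply B(A x_s) to the space-like part, then restore the constraint
    -t^2 + ||x_s||^2 = -1 by recomputing the time-like coordinate."""
    space = matvec(b_mat, matvec(a_mat, p[1:]))
    time = math.sqrt(1.0 + sum(s * s for s in space))
    return [time] + space


def minkowski_inner(p, q):
    return -p[0] * q[0] + sum(a * b for a, b in zip(p[1:], q[1:]))


def hyperbolic_low_rank_update(x, a_mat, b_mat):
    """Euclidean in, Euclidean out; the low-rank step happens on the manifold."""
    return log_map_origin(lorentz_low_rank(exp_map_origin(x), a_mat, b_mat))


# Toy example with k = 3 inputs, rank r = 2, d = 3 outputs (hypothetical values).
x = [0.3, -0.1, 0.2]
a_mat = [[0.1, 0.0, -0.2], [0.05, 0.1, 0.0]]
b_mat = [[0.2, 0.0], [0.0, 0.1], [-0.1, 0.3]]

q = lorentz_low_rank(exp_map_origin(x), a_mat, b_mat)
print(minkowski_inner(q, q))  # approximately -1: q stays on the hyperboloid
print(hyperbolic_low_rank_update(x, a_mat, b_mat))
```

In this sketch the two matrix-vector products cost $O(r(d+k))$, and setting `b_mat` to zeros makes the update vanish, mirroring the usual LoRA initialization.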
