Hyperbolic Dataset Distillation
Wenyuan Li, Guang Li, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama
TL;DR
Hyperbolic Dataset Distillation (HDD) is a distribution matching framework that uses hyperbolic geometry to capture hierarchical structure in large datasets during distillation. It maps features into the Lorentz model of hyperbolic space and matches the centroids of real and synthetic data under the hyperbolic (geodesic) distance, which emphasizes prototype-level (lower-level) samples near the root of the hierarchy while preserving the global geometry of the data. The authors report that pruning in hyperbolic space retains model performance with only 20% of the distilled core set while significantly improving training stability, and that HDD improves results across multiple benchmarks and in cross-architecture generalization. The work also suggests incorporating other discrepancy measures, such as KL divergence or Wasserstein distance, into hyperbolic dataset distillation.
Abstract
To address the computational and storage challenges posed by large-scale datasets in deep learning, dataset distillation has been proposed to synthesize a compact dataset that replaces the original while maintaining comparable model performance. Unlike optimization-based approaches that require costly bi-level optimization, distribution matching (DM) methods improve efficiency by aligning the distributions of synthetic and original data, thereby eliminating nested optimization. DM achieves high computational efficiency and has emerged as a promising solution. However, existing DM methods, constrained to Euclidean space, treat data as independent and identically distributed points, overlooking complex geometric and hierarchical relationships. To overcome this limitation, we propose a novel hyperbolic dataset distillation method, termed HDD. Hyperbolic space, characterized by negative curvature and exponential volume growth with distance, naturally models hierarchical and tree-like structures. HDD embeds features extracted by a shallow network into the Lorentz hyperbolic space, where the discrepancy between synthetic and original data is measured by the hyperbolic (geodesic) distance between their centroids. By optimizing this distance, the hierarchical structure is explicitly integrated into the distillation process, guiding synthetic samples to gravitate towards the root-centric regions of the original data distribution while preserving their underlying geometric characteristics. Furthermore, we find that pruning in hyperbolic space requires only 20% of the distilled core set to retain model performance, while significantly improving training stability. To the best of our knowledge, this is the first work to incorporate the hyperbolic space into the dataset distillation process. The code is available at https://github.com/Guang000/HDD.
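The centroid-matching idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (see the linked repository for that). It assumes the exponential map at the origin as the embedding step, the normalized-mean form of the Lorentzian centroid, and a curvature parameter `c`. All function names (`expmap0`, `lorentz_centroid`, `hdd_style_loss`) are hypothetical and chosen for illustration only.

```python
import numpy as np


def lorentz_inner(x, y):
    """Lorentzian inner product <x, y>_L = -x_0 * y_0 + sum_i x_i * y_i."""
    return -x[..., 0] * y[..., 0] + np.sum(x[..., 1:] * y[..., 1:], axis=-1)


def expmap0(v, c=1.0):
    """Lift Euclidean features v of shape (..., d) onto the hyperboloid
    <x, x>_L = -1/c via the exponential map at the origin (assumed embedding)."""
    sqrt_c = np.sqrt(c)
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), 1e-12)
    time = np.cosh(sqrt_c * norm) / sqrt_c
    space = np.sinh(sqrt_c * norm) * v / (sqrt_c * norm)
    return np.concatenate([time, space], axis=-1)


def lorentz_centroid(x, c=1.0):
    """Centroid of points x of shape (n, d + 1): the Euclidean mean rescaled
    so that it lies back on the hyperboloid (assumed centroid form)."""
    mean = x.mean(axis=0)
    scale = np.sqrt(c) * np.sqrt(np.maximum(-lorentz_inner(mean, mean), 1e-12))
    return mean / scale


def geodesic_distance(x, y, c=1.0):
    """Geodesic distance on the hyperboloid: arccosh(-c * <x, y>_L) / sqrt(c)."""
    arg = np.maximum(-c * lorentz_inner(x, y), 1.0)  # clamp for numerical safety
    return np.arccosh(arg) / np.sqrt(c)


def hdd_style_loss(real_feats, syn_feats, c=1.0):
    """Distance between the hyperbolic centroids of real and synthetic features."""
    mu_real = lorentz_centroid(expmap0(real_feats, c), c)
    mu_syn = lorentz_centroid(expmap0(syn_feats, c), c)
    return geodesic_distance(mu_real, mu_syn, c)
```

In an actual distillation loop, the features would come from the shallow network and the loss would be differentiated with respect to the synthetic images in an autodiff framework; this sketch only shows the geometry of the objective.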
