Hyperbolic Dataset Distillation

Wenyuan Li, Guang Li, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama

TL;DR

Hyperbolic Dataset Distillation (HDD) is a dataset distillation framework that uses hyperbolic geometry to capture the hierarchical structure of a dataset. It maps features into the Lorentz model of hyperbolic space and matches the centroids of the real and synthetic data by minimizing the hyperbolic (geodesic) distance between them. This pulls synthetic samples towards the prototype-level samples near the root of the hierarchy while preserving the global geometry of the data. The authors report more stable training, find that pruning in hyperbolic space retains model performance with only 20% of the distilled core set, and report improvements across multiple benchmarks and in cross-architecture generalization. The work also suggests combining hyperbolic dataset distillation with other discrepancy measures, such as the KL divergence or the Wasserstein distance.

Abstract

To address the computational and storage challenges posed by large-scale datasets in deep learning, dataset distillation has been proposed to synthesize a compact dataset that replaces the original while maintaining comparable model performance. Unlike optimization-based approaches that require costly bi-level optimization, distribution matching (DM) methods improve efficiency by aligning the distributions of synthetic and original data, thereby eliminating nested optimization. DM achieves high computational efficiency and has emerged as a promising solution. However, existing DM methods, constrained to Euclidean space, treat data as independent and identically distributed points, overlooking complex geometric and hierarchical relationships. To overcome this limitation, we propose a novel hyperbolic dataset distillation method, termed HDD. Hyperbolic space, characterized by negative curvature and exponential volume growth with distance, naturally models hierarchical and tree-like structures. HDD embeds features extracted by a shallow network into the Lorentz hyperbolic space, where the discrepancy between synthetic and original data is measured by the hyperbolic (geodesic) distance between their centroids. By optimizing this distance, the hierarchical structure is explicitly integrated into the distillation process, guiding synthetic samples to gravitate towards the root-centric regions of the original data distribution while preserving their underlying geometric characteristics. Furthermore, we find that pruning in hyperbolic space requires only 20% of the distilled core set to retain model performance, while significantly improving training stability. To the best of our knowledge, this is the first work to incorporate the hyperbolic space into the dataset distillation process. The code is available at https://github.com/Guang000/HDD.
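The centroid-matching step described in the abstract can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names (`lorentz_inner`, `expmap0`, `lorentz_centroid`, `geodesic_distance`), the curvature parameter `c`, the unweighted centroid, and the random stand-in features are all hypothetical choices made for this example. It lifts Euclidean features onto the Lorentz hyperboloid with the exponential map at the origin, computes a centroid for each set, and uses the geodesic distance between the two centroids as the discrepancy.

```python
import numpy as np

def lorentz_inner(x, y):
    """Lorentzian inner product <x, y>_L = -x0*y0 + sum_i xi*yi over the last axis."""
    return -x[..., 0] * y[..., 0] + np.sum(x[..., 1:] * y[..., 1:], axis=-1)

def expmap0(v, c=1.0):
    """Exponential map at the hyperboloid origin for curvature -c (c > 0).

    v: (n, d) Euclidean features, treated as tangent vectors at the origin.
    Returns (n, d + 1) points x satisfying <x, x>_L = -1/c.
    """
    sqrt_c = np.sqrt(c)
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), 1e-12)
    time = np.cosh(sqrt_c * norm) / sqrt_c
    space = np.sinh(sqrt_c * norm) * v / (sqrt_c * norm)
    return np.concatenate([time, space], axis=-1)

def lorentz_centroid(x, c=1.0):
    """Unweighted Lorentz centroid: the ambient mean rescaled back onto the hyperboloid."""
    m = x.mean(axis=0)
    scale = np.sqrt(np.maximum(-lorentz_inner(m, m), 1e-12))
    return m / (np.sqrt(c) * scale)

def geodesic_distance(x, y, c=1.0):
    """Geodesic distance on the hyperboloid: arccosh(-c * <x, y>_L) / sqrt(c)."""
    arg = np.maximum(-c * lorentz_inner(x, y), 1.0)  # clamp against rounding below 1
    return np.arccosh(arg) / np.sqrt(c)

# Stand-in features; in practice these would come from a shallow feature extractor.
rng = np.random.default_rng(0)
real_feats = 0.1 * rng.normal(size=(256, 8)) + 0.2  # "original" class features
syn_feats = 0.1 * rng.normal(size=(10, 8))          # "synthetic" class features

mu_real = lorentz_centroid(expmap0(real_feats))
mu_syn = lorentz_centroid(expmap0(syn_feats))
loss = geodesic_distance(mu_real, mu_syn)
print(float(loss))
```

In a distillation loop, this scalar would be differentiated with respect to the synthetic images in an autodiff framework. NumPy is used here only to keep the geometry explicit.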

Paper Structure

This paper contains 29 sections, 47 equations, 10 figures, 10 tables.

Figures (10)

  • Figure 1: An example of hierarchical representation in hyperbolic space using the 'Cat' class from the CIFAR-10 dataset. Hyperbolic space naturally encodes hierarchical structures. In this context, samples located near the root node often represent the category prototype more effectively, while those situated at higher hierarchical levels (closer to the leaf nodes) tend to contain noisier or more specific information, such as object parts or less visible angles.
  • Figure 2: The framework of hyperbolic dataset distillation. The proposed method leverages exponential mapping to embed the dataset into hyperbolic space, enabling a hierarchical representation where samples at different levels are assigned varying weights to reflect their significance within the global geometry. Centroids of both the original and synthetic datasets are then computed in the hyperbolic space, and the geodesic distance between them is used to quantify the distributional discrepancy. This hyperbolic distance serves as a loss term to iteratively update the synthetic dataset, encouraging it to better align with the class-specific prototypes of the original data.
  • Figure 3: Distillation accuracy variations of CIFAR-10 (IPC = 10) during the distillation process with different pruning rates.
  • Figure 4: Visualization of the distributions of the original and synthetic datasets in the Poincaré hyperbolic space after distillation using DM with HDD and IDM with HDD.
  • Figure 5: The distilled images of FashionMNIST with IPC = 50 using DM with HDD.
  • ...and 5 more figures