Dataset Distillation with Neural Characteristic Function: A Minmax Perspective

Shaobo Wang, Yicun Yang, Zhiyuan Liu, Chenghao Sun, Xuming Hu, Conghui He, Linfeng Zhang

TL;DR

This work reframes dataset distillation as a minmax optimization problem and introduces Neural Characteristic Function Matching (NCFM), powered by Neural Characteristic Function Discrepancy (NCFD) to capture full distributional differences via the characteristic function. A learnable sampling network optimizes the frequency arguments of the CF, while synthetic data are trained to minimize the resulting discrepancy, balancing phase and amplitude in the complex plane for realism and diversity. The approach achieves state-of-the-art performance across CIFAR-10/100, Tiny ImageNet, and ImageNet subsets, with substantial gains in accuracy and dramatic reductions in GPU memory usage and distillation time, including lossless CIFAR distillation on a single GPU. The method offers scalable, principled distribution matching that improves robustness and efficiency for large-scale dataset distillation applications.

Abstract

Dataset distillation has emerged as a powerful approach for reducing data requirements in deep learning. Among various methods, distribution matching-based approaches stand out for their balance of computational efficiency and strong performance. However, existing distance metrics used in distribution matching often fail to accurately capture distributional differences, leading to unreliable measures of discrepancy. In this paper, we reformulate dataset distillation as a minmax optimization problem and introduce Neural Characteristic Function Discrepancy (NCFD), a comprehensive and theoretically grounded metric for measuring distributional differences. NCFD leverages the Characteristic Function (CF) to encapsulate full distributional information, employing a neural network to optimize the sampling strategy for the CF's frequency arguments, thereby maximizing the discrepancy to enhance distance estimation. Simultaneously, we minimize the difference between real and synthetic data under this optimized NCFD measure. Our approach, termed Neural Characteristic Function Matching (NCFM), inherently aligns the phase and amplitude of neural features in the complex plane for both real and synthetic data, achieving a balance between realism and diversity in synthetic samples. Experiments demonstrate that our method achieves significant performance gains over state-of-the-art methods on both low- and high-resolution datasets. Notably, we achieve a 20.5% accuracy boost on ImageSquawk. Our method also reduces GPU memory usage by over 300$\times$ and achieves 20$\times$ faster processing speeds compared to state-of-the-art methods. To the best of our knowledge, this is the first work to achieve lossless compression of CIFAR-100 on a single NVIDIA 2080 Ti GPU using only 2.3 GB of memory.
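To make the idea of a characteristic-function discrepancy concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It computes an empirical CF on raw samples (the paper works on neural features), and the names `empirical_cf` and `cf_discrepancy`, the Gaussian toy data, and the small grid search over frequency scales are all assumptions made for illustration. The grid search only stands in for the learned sampling network that chooses frequency arguments to maximize the discrepancy.

```python
import numpy as np

def empirical_cf(x, t):
    """Empirical CF of samples x with shape (n, d) at frequencies t with shape (m, d)."""
    proj = x @ t.T                           # (n, m) inner products <t, x_i>
    return np.exp(1j * proj).mean(axis=0)    # (m,) complex averages

def cf_discrepancy(x, y, t):
    """Mean modulus of the CF difference over the given frequencies."""
    return np.abs(empirical_cf(x, t) - empirical_cf(y, t)).mean()

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(2000, 4))
same = rng.normal(0.0, 1.0, size=(2000, 4))     # same distribution as `real`
shifted = rng.normal(1.0, 1.0, size=(2000, 4))  # mean shifted by 1

base_freqs = rng.normal(0.0, 1.0, size=(256, 4))

# Crude "max" step: pick the frequency scale with the largest discrepancy.
# This grid search is a hypothetical stand-in for the learned sampling network.
scales = [0.25, 0.5, 1.0, 2.0]
per_scale = [cf_discrepancy(real, shifted, s * base_freqs) for s in scales]
best_scale = scales[int(np.argmax(per_scale))]

d_same = cf_discrepancy(real, same, best_scale * base_freqs)
d_shifted = max(per_scale)
print(f"best scale={best_scale}, same={d_same:.3f}, shifted={d_shifted:.3f}")
```

In this toy setting the discrepancy between two draws from the same distribution stays near the sampling-noise floor, while the shifted distribution is clearly separated. A "min" step would then update the synthetic samples to reduce the discrepancy at the chosen frequencies.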

Paper Structure

This paper contains 20 sections, 3 theorems, 8 equations, 7 figures, 5 tables.

Key Result

Theorem 1

Let $\{X_n\}_{n=1}^{\infty}$ be a sequence of random variables with characteristic functions $\Phi_n(\bm{t}) = \mathbb{E}_{X_n}\left[e^{j \langle \bm{t}, X_n \rangle }\right]$. Suppose $\Phi_n(\bm{t}) \to \Phi(\bm{t})$ pointwise for each $\bm{t} \in \mathbb{R}^d$ as $n \to \infty$. If $\Phi(\bm{t})$

Figures (7)

  • Figure 1: Comparison of different paradigms for dataset distillation. (a) The MSE approach compares point-wise features within Euclidean space, denoted as $\mathcal{Z}_{\mathbb{R}}$, while MMD evaluates moment differences in Hilbert space, $\mathcal{Z}_{\mathcal{H}}$. (b) Our method redefines distribution matching as a minmax optimization problem, where the distributional discrepancy is parameterized by a neural network $\psi$. We begin by optimizing $\psi$ to maximize the discrepancy, thereby establishing the latent space $\mathcal{Z}_{\psi}$, and subsequently optimize the synthesized data $\tilde{\mathcal{D}}$ to minimize this discrepancy within $\mathcal{Z}_{\psi}$.
  • Figure 2: Comparison of different distribution matching methods. (a) Illustration of embedded features from the real domain to complex-plane features using Euler's formula [euler8transcending]. The latent neural feature $\Phi_{\bm{x}}(\bm{t})$ captures the amplitude and phase information. (b) MMD-based methods align feature moments in the embedded domain but may not effectively align the overall distributions. (c) CF-based methods directly compare distributions by balancing the amplitude and phase in the complex plane, enhancing distributional similarity.
  • Figure 3: Comparison of performance, peak GPU memory usage, and distillation speed between the state-of-the-art (SOTA) distillation method and our NCFM on CIFAR-100 across various IPC values, evaluated on 8 NVIDIA H100 GPUs. Notably, NCFM reduces GPU memory usage by over 300$\times$, achieves 20$\times$ faster distillation, and delivers better performance. It also achieves lossless distillation using only 2.3 GB of GPU memory.
  • Figure 4: Dataset Distillation with Neural Characteristic Function Matching (NCFM). Real and synthetic data points are sampled and embedded through a feature extractor network. The synthetic data is optimized by minimizing the distributional discrepancy between real and synthetic data, measured via the Neural Characteristic Function Discrepancy (NCFD) in the complex plane. Additionally, an auxiliary network learns an optimal sampling distribution for the frequency arguments of the characteristic function. Best viewed in color.
  • Figure 5: Impact of amplitude and phase components in the NCFD measure across various datasets and IPC settings. The figure shows how performance varies with the amplitude-to-phase ratio $\alpha$ defined in the paper's blending equation (label eq:alpha_blend). Results indicate that balancing amplitude (for diversity) and phase (for realism) information leads to improved performance. Baseline results were obtained using DM [DM].
  • ...and 2 more figures
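The amplitude and phase roles mentioned in the Figure 2 and Figure 5 captions can be illustrated with a short NumPy sketch. The decomposition below is the law-of-cosines identity $|a-b|^2 = (|a|-|b|)^2 + 2|a||b|(1-\cos(\theta_a-\theta_b))$, which holds for any complex numbers. The convex blend with weight `alpha` is an assumption about how the paper's $\alpha$ enters, since the blending equation itself is not reproduced on this page. The function names and the random toy CF values are hypothetical.

```python
import numpy as np

def amplitude_phase_terms(phi_real, phi_syn):
    """Split |phi_real - phi_syn|^2 into an amplitude term and a phase term."""
    amp_r, amp_s = np.abs(phi_real), np.abs(phi_syn)
    phase_gap = np.angle(phi_real) - np.angle(phi_syn)
    amp_term = (amp_r - amp_s) ** 2
    phase_term = 2.0 * amp_r * amp_s * (1.0 - np.cos(phase_gap))
    return amp_term, phase_term

def blended_discrepancy(phi_real, phi_syn, alpha=0.5):
    """Assumed convex blend: alpha weights amplitude, (1 - alpha) weights phase."""
    amp_term, phase_term = amplitude_phase_terms(phi_real, phi_syn)
    return np.sqrt(alpha * amp_term + (1.0 - alpha) * phase_term).mean()

# Toy CF values at 64 frequencies: fixed amplitudes, random phases.
rng = np.random.default_rng(0)
phi_real = 0.8 * np.exp(1j * rng.uniform(-np.pi, np.pi, size=64))
phi_syn = 0.5 * np.exp(1j * rng.uniform(-np.pi, np.pi, size=64))

amp_term, phase_term = amplitude_phase_terms(phi_real, phi_syn)
full_sq = np.abs(phi_real - phi_syn) ** 2   # plain squared CF difference
```

Under this assumed form, `alpha=0.5` recovers the plain CF difference up to a constant factor, `alpha=1.0` compares amplitudes only, and `alpha=0.0` compares phases only.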

Theorems & Definitions (3)

  • Theorem 1: Lévy's Convergence Theorem [levy1937]
  • Theorem 2: Uniqueness for Characteristic Functions [feuerverger1977empirical]
  • Theorem 3: CFD as a Distance Metric
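
For orientation, the first two results in this list are classical. Their usual textbook statements are sketched below. The paper's own wording and assumptions may differ, and Theorem 3 is specific to the paper and is not reproduced here.

```latex
% Standard textbook forms; the paper's exact wording may differ.
\textbf{L\'evy's continuity theorem.}
Let $\Phi_n(\bm{t}) = \mathbb{E}\left[e^{j\langle \bm{t}, X_n\rangle}\right]$ and suppose
$\Phi_n(\bm{t}) \to \Phi(\bm{t})$ for every $\bm{t} \in \mathbb{R}^d$.
If $\Phi$ is continuous at $\bm{t} = \bm{0}$, then $\Phi$ is the characteristic
function of some random variable $X$, and $X_n \xrightarrow{d} X$.

\textbf{Uniqueness.}
If $\Phi_X(\bm{t}) = \Phi_Y(\bm{t})$ for all $\bm{t} \in \mathbb{R}^d$, then $X$ and $Y$
have the same distribution.
```

Together these two facts are what make the characteristic function a reasonable target for distribution matching: matching CFs everywhere implies matching distributions, and CFs that converge to a limit continuous at the origin imply convergence in distribution.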