Generalized Kernel Inducing Points by Duality Gap for Dataset Distillation

Tatsuya Aoyama, Hanting Yang, Hiroyuki Hanada, Satoshi Akahane, Tomonari Tanaka, Yoshito Okura, Yu Inatsu, Noriaki Hashimoto, Taro Murayama, Hanju Lee, Shinya Kojima, Ichiro Takeuchi

TL;DR

DGKIP recasts dataset distillation as a single-level optimization by leveraging a duality-gap bound, enabling kernel-based distillation beyond the squared loss. By bounding the parameter distance between the full-data and distilled-data solutions, it derives prediction and test-error guarantees while preserving efficiency. The method extends Kernel Inducing Points to convex losses such as cross-entropy and hinge loss, and is demonstrated on MNIST, Fashion-MNIST, and CIFAR-10 with support vector machine (SVM) and logistic regression (LR) implementations and NNGP-based kernelization. Theoretical bounds and empirical results show robust performance and transferability across models, offering practical gains in data efficiency and computation for classification tasks.

Abstract

We propose Duality Gap KIP (DGKIP), an extension of the Kernel Inducing Points (KIP) method for dataset distillation. While existing dataset distillation methods often rely on bi-level optimization, DGKIP eliminates the need for such optimization by leveraging duality theory in convex programming. The KIP method was introduced as a way to avoid bi-level optimization; however, it is limited to the squared loss and does not support other loss functions (e.g., cross-entropy or hinge loss) that are more suitable for classification tasks. DGKIP addresses this limitation by using the duality gap to upper-bound the change in model parameters caused by dataset distillation, which enables its application to a wider range of loss functions. We also characterize theoretical properties of DGKIP by providing upper bounds on the test error and on prediction consistency after dataset distillation. Experimental results on standard benchmarks such as MNIST and CIFAR-10 demonstrate that DGKIP retains the efficiency of KIP while offering broader applicability and robust performance.
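
A minimal numerical sketch of the duality-gap idea for a non-squared loss follows. It uses L2-regularized logistic regression (labels in $\{-1,+1\}$, no bias term), whose Fenchel dual is known in closed form; this is an illustrative stand-in chosen here, not the authors' implementation, and every helper name is hypothetical. For any weights $w$ and any dual-feasible $\beta$, strong convexity and weak duality give $\|w - w^*\|_2 \le \sqrt{2\,(P(w) - D(\beta))/\lambda}$, so a model trained on a small set (here a random 20-point subset standing in for a distilled set) can be compared with the full-data optimum using only $P$ and $D$ evaluated on the full data.

```python
import numpy as np

# Sketch under assumptions: L2-regularized logistic regression, labels in {-1, +1},
# no bias term. Helper names are hypothetical, not taken from the DGKIP code.

def sigmoid(t):
    # Numerically stable logistic function 1 / (1 + exp(-t)).
    return 0.5 * (1.0 + np.tanh(0.5 * t))

def primal(w, X, y, lam):
    # P(w) = sum_i log(1 + exp(-y_i w.x_i)) + (lam / 2) ||w||^2
    z = y * (X @ w)
    return np.sum(np.logaddexp(0.0, -z)) + 0.5 * lam * (w @ w)

def dual(beta, X, y, lam):
    # D(beta) = -sum_i [b_i log b_i + (1 - b_i) log(1 - b_i)] - ||sum_i b_i y_i x_i||^2 / (2 lam)
    # for beta in [0, 1]^n; clipping stays inside the feasible set and keeps logs finite.
    beta = np.clip(beta, 1e-12, 1.0 - 1e-12)
    v = X.T @ (beta * y)
    entropy = -np.sum(beta * np.log(beta) + (1.0 - beta) * np.log1p(-beta))
    return entropy - (v @ v) / (2.0 * lam)

def dual_from_primal(w, X, y):
    # Feasible dual point built from w; it equals the dual optimum when w = w*.
    return sigmoid(-y * (X @ w))

def fit(X, y, lam, iters=5000):
    # Gradient descent with step 1/L, where L upper-bounds the gradient's Lipschitz constant.
    L = 0.25 * np.linalg.norm(X, 2) ** 2 + lam
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = sigmoid(y * (X @ w))
        grad = -X.T @ ((1.0 - p) * y) + lam * w
        w -= grad / L
    return w

def deviation_bound(w, X, y, lam):
    # Duality gap G = P(w) - D(beta) and the resulting bound ||w - w*|| <= sqrt(2 G / lam).
    gap = primal(w, X, y, lam) - dual(dual_from_primal(w, X, y), X, y, lam)
    return np.sqrt(2.0 * max(gap, 0.0) / lam), gap

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.where(X @ rng.normal(size=5) + 0.5 * rng.normal(size=200) > 0, 1.0, -1.0)
lam = 1.0

w_full = fit(X, y, lam)                           # full-data optimum, used only as a check
idx = rng.choice(200, size=20, replace=False)
w_small = fit(X[idx], y[idx], lam)                # stand-in for a model trained on a distilled set

bound, gap = deviation_bound(w_small, X, y, lam)  # needs only full-data P and D at w_small
actual = np.linalg.norm(w_small - w_full)
print(f"gap={gap:.4f}  bound={bound:.4f}  actual={actual:.4f}")
```

The full-data optimum `w_full` is computed only to check the bound; the bound itself needs just `primal` and `dual` evaluated at `w_small`. Choosing $\beta_i = \sigma(-y_i w^\top x_i)$ is convenient because it recovers the dual optimum when $w = w^*$, so the gap shrinks to zero as the small-set model approaches the full-data solution.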

Paper Structure

This paper contains 33 sections, 3 theorems, 49 equations, 2 figures, 3 tables, 1 algorithm.

Key Result

Theorem 3.2

Suppose the objective functions $P_{\mathcal{O}}(\boldsymbol{\theta})$ and $P_{\mathcal{S}}(\boldsymbol{\theta})$ are $\lambda$-strongly convex with respect to $\boldsymbol{\theta}$, and let $\boldsymbol{\theta}_{\mathcal{O}}^*$ be the unique minimizer of $P_{\mathcal{O}}$. Let $\boldsymbol{\theta}_{\mathcal{S}}^*$ be the corresponding … where … Here, $\tilde{\boldsymbol{\alpha}}_{\mathcal{S}} \in \mathrm{dom}(D_{\mathcal{S}})$ is any f…
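
Bounds of this type usually follow from a short argument, sketched here for orientation with a generic $\lambda$-strongly convex primal $P$, its minimizer $\boldsymbol{\theta}^*$, and its dual $D$. This is the standard reasoning, not a reconstruction of which primal/dual pair Theorem 3.2 evaluates: strong convexity at the minimizer (where $\boldsymbol{0} \in \partial P(\boldsymbol{\theta}^*)$) is combined with weak duality, $P(\boldsymbol{\theta}^*) \ge D(\tilde{\boldsymbol{\alpha}})$ for any $\tilde{\boldsymbol{\alpha}} \in \mathrm{dom}(D)$.

```latex
% Definition (lambda-strong convexity): for all theta, theta' and any g in \partial P(theta)
P(\boldsymbol{\theta}') \ge P(\boldsymbol{\theta})
  + \boldsymbol{g}^\top(\boldsymbol{\theta}'-\boldsymbol{\theta})
  + \tfrac{\lambda}{2}\,\|\boldsymbol{\theta}'-\boldsymbol{\theta}\|_2^2 .

% Apply it at theta^* with g = 0, then use weak duality P(theta^*) >= D(alpha~):
\tfrac{\lambda}{2}\,\|\boldsymbol{\theta}-\boldsymbol{\theta}^*\|_2^2
  \;\le\; P(\boldsymbol{\theta}) - P(\boldsymbol{\theta}^*)
  \;\le\; P(\boldsymbol{\theta}) - D(\tilde{\boldsymbol{\alpha}})
  \;=:\; G(\boldsymbol{\theta},\tilde{\boldsymbol{\alpha}}),
\qquad\Longrightarrow\qquad
\|\boldsymbol{\theta}-\boldsymbol{\theta}^*\|_2
  \;\le\; \sqrt{\tfrac{2}{\lambda}\, G(\boldsymbol{\theta},\tilde{\boldsymbol{\alpha}})} .
```

Because the right-hand side involves only $P$ and $D$ evaluated at the given points, it can be computed without knowing $\boldsymbol{\theta}^*$.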

Figures (2)

  • Figure 1: Paradigm of dataset distillation with (a) bi-level optimization, (b) Kernel Inducing Points (KIP), and (c) the proposed DGKIP. In each subfigure, $S$ denotes the synthetic data, $\theta_S$ denotes the model trained on $S$, and arrows represent the optimization process. KIP avoids bi-level optimization through a simplified inner loop but is restricted to the squared-error loss, whereas DGKIP extends this paradigm to a large class of loss functions by evaluating the duality gap (DG) instead of $\bm\theta_\mathcal{S}$ itself.
  • Figure 2: Parameter deviation, duality gap, and test accuracy over the course of training with 1, 10, and 50 images per class (IPC) on CIFAR-10. The parameter deviation (green line; left-hand side of Eq. \ref{eq:dualitygap}) and the duality gap (blue line; right-hand side of Eq. \ref{eq:dualitygap}) follow the same trend: minimizing the duality gap reduces the parameter deviation, which leads to higher test accuracy.
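
For contrast with panel (b) of Figure 1, a KIP-style objective can be sketched as follows, under assumptions: an RBF kernel stands in for the NNGP-based kernels used in the paper, targets are real-valued labels, and the helper names and hyperparameters are hypothetical. The inner problem is kernel ridge regression on the synthetic (support) set, which has a closed-form solution only because the inner loss is squared; the sketch evaluates the outer loss and omits the gradient step on the support set.

```python
import numpy as np

# Sketch under assumptions: RBF kernel instead of an NNGP kernel, real-valued targets,
# hypothetical helper names and hyperparameters.

def rbf_kernel(A, B, gamma=0.5):
    # k(a, b) = exp(-gamma * ||a - b||^2), computed for all pairs of rows of A and B.
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kip_loss(X_s, y_s, X_t, y_t, lam=1e-3, gamma=0.5):
    # Inner problem: kernel ridge regression on the support set, solved in closed form,
    # so there is no inner optimization loop (possible because the inner loss is squared).
    K_ss = rbf_kernel(X_s, X_s, gamma)
    K_ts = rbf_kernel(X_t, X_s, gamma)
    alpha = np.linalg.solve(K_ss + lam * np.eye(len(X_s)), y_s)
    pred = K_ts @ alpha
    # Outer loss: squared error of the support-set predictor on the target data.
    return 0.5 * np.sum((y_t - pred) ** 2)

rng = np.random.default_rng(0)
X_t = rng.normal(size=(100, 5))
y_t = np.sign(X_t[:, 0])
X_s, y_s = X_t[:10].copy(), y_t[:10].copy()  # hypothetical initial synthetic set
print(kip_loss(X_s, y_s, X_t, y_t))
```

In KIP this loss is minimized with respect to `X_s` (and optionally `y_s`), typically through automatic differentiation. DGKIP, per Figure 1, replaces the closed-form squared-loss inner solution with a duality-gap evaluation so that losses such as hinge or cross-entropy can be used.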

Theorems & Definitions (6)

  • Definition 3.1: $\lambda$-strong convexity
  • Theorem 3.2: Bound on Parameter Deviation
  • Lemma 3.3: Minimizing DG Also Minimizes the Prediction Upper Bound
  • Lemma 3.4
  • proof
  • proof
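
One way a parameter bound turns into a prediction bound is shown below for orientation. It assumes a model that is linear in its parameters, $f(\boldsymbol{x};\boldsymbol{\theta}) = \boldsymbol{\theta}^\top\phi(\boldsymbol{x})$ for some feature map $\phi$ (an assumption made here, not necessarily the exact setting of Lemma 3.3), and a duality-gap bound $\|\boldsymbol{\theta}_{\mathcal{S}}^* - \boldsymbol{\theta}_{\mathcal{O}}^*\|_2 \le \sqrt{2G/\lambda}$, where $G$ denotes the relevant duality gap; the step itself is the Cauchy–Schwarz inequality.

```latex
% Prediction deviation for a model linear in theta (assumption):
\bigl|f(\boldsymbol{x};\boldsymbol{\theta}_{\mathcal{S}}^*) - f(\boldsymbol{x};\boldsymbol{\theta}_{\mathcal{O}}^*)\bigr|
  = \bigl|(\boldsymbol{\theta}_{\mathcal{S}}^* - \boldsymbol{\theta}_{\mathcal{O}}^*)^\top \phi(\boldsymbol{x})\bigr|
  \le \|\phi(\boldsymbol{x})\|_2 \,\|\boldsymbol{\theta}_{\mathcal{S}}^* - \boldsymbol{\theta}_{\mathcal{O}}^*\|_2
  \le \|\phi(\boldsymbol{x})\|_2 \sqrt{\tfrac{2G}{\lambda}} .

% Consequence for classification: the two models agree in sign whenever
\bigl|f(\boldsymbol{x};\boldsymbol{\theta}_{\mathcal{S}}^*)\bigr| > \|\phi(\boldsymbol{x})\|_2 \sqrt{\tfrac{2G}{\lambda}}
\;\Longrightarrow\;
\operatorname{sign} f(\boldsymbol{x};\boldsymbol{\theta}_{\mathcal{S}}^*) = \operatorname{sign} f(\boldsymbol{x};\boldsymbol{\theta}_{\mathcal{O}}^*) .
```

Because the right-hand side is increasing in $G$, driving the duality gap down tightens the bound on every prediction, which is consistent with the title of Lemma 3.3.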