Generalized Kernel Inducing Points by Duality Gap for Dataset Distillation

Tatsuya Aoyama; Hanting Yang; Hiroyuki Hanada; Satoshi Akahane; Tomonari Tanaka; Yoshito Okura; Yu Inatsu; Noriaki Hashimoto; Taro Murayama; Hanju Lee; Shinya Kojima; Ichiro Takeuchi

TL;DR

DGKIP recasts dataset distillation as a single-level optimization problem by leveraging a duality-gap bound, which enables kernel-based distillation beyond the squared loss. By bounding the distance between the model parameters learned from the full data and those learned from the distilled data, it derives guarantees on predictions and test error while preserving the efficiency of KIP. The method extends Kernel Inducing Points to convex losses such as cross-entropy and hinge loss, and is demonstrated on MNIST, Fashion-MNIST, and CIFAR-10 with support vector machine (SVM) and logistic regression (LR) implementations and NNGP-based kernels. The theoretical bounds and empirical results indicate robust performance and transferability across models, with practical gains in data efficiency and computation for classification tasks.
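
The parameter-distance bound can be illustrated with a small numerical sketch. It rests on a standard argument: if the full-data training objective P is λ-strongly convex with minimizer w*, and D is its Fenchel dual, then weak duality (P(w*) ≥ D(α)) gives (λ/2)·||w − w*||² ≤ P(w) − D(α) for any w and any dual-feasible α. Evaluating this duality gap of the full-data problem at the distilled-data solution therefore bounds how far that solution lies from w* without ever solving the full problem. The sketch assumes an L2-regularized logistic regression (cross-entropy loss) on explicit features instead of the paper's NNGP kernels, uses a random subset as a stand-in for a distilled set, and picks the dual candidate induced by w; the helper names `primal`, `dual`, `fit_logreg`, and `duality_gap_radius` are hypothetical.

```python
import numpy as np

# Sketch (assumptions): linear L2-regularized logistic regression on explicit
# features stands in for the paper's kernel models; helper names are hypothetical.


def sigmoid(z):
    # Numerically stable logistic function: sigma(z) = exp(-log(1 + exp(-z))).
    return np.exp(-np.logaddexp(0.0, -z))


def _xlogx(t):
    # Elementwise t * log(t) with the convention 0 * log(0) = 0.
    safe = np.where(t > 0, t, 1.0)
    return np.where(t > 0, t * np.log(safe), 0.0)


def primal(w, X, y, lam):
    # P(w) = mean_i log(1 + exp(-y_i x_i.w)) + (lam / 2) ||w||^2
    margins = y * (X @ w)
    return np.mean(np.logaddexp(0.0, -margins)) + 0.5 * lam * (w @ w)


def dual(alpha, X, y, lam):
    # D(alpha) = mean_i H(alpha_i y_i) - (lam / 2) ||w(alpha)||^2, where
    # w(alpha) = X^T alpha / (lam n) and H is the binary entropy.
    n = X.shape[0]
    b = alpha * y                                   # feasible iff b lies in [0, 1]
    entropy = -(_xlogx(b) + _xlogx(1.0 - b))
    w_alpha = X.T @ alpha / (lam * n)
    return np.mean(entropy) - 0.5 * lam * (w_alpha @ w_alpha)


def fit_logreg(X, y, lam, iters=30):
    # Damped Newton's method on the strongly convex primal objective.
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        b = sigmoid(-y * (X @ w))
        grad = -(X.T @ (y * b)) / n + lam * w
        hess = (X.T * (b * (1.0 - b))) @ X / n + lam * np.eye(d)
        step = np.linalg.solve(hess, grad)
        f0, t = primal(w, X, y, lam), 1.0
        # Backtracking line search: halve t until a sufficient decrease holds.
        while t > 1e-10 and primal(w - t * step, X, y, lam) > f0 - 0.25 * t * (grad @ step):
            t *= 0.5
        w = w - t * step
    return w


def duality_gap_radius(w, X, y, lam):
    # Dual candidate induced by w (it equals the dual optimum when w = w*).
    alpha = y * sigmoid(-y * (X @ w))
    gap = primal(w, X, y, lam) - dual(alpha, X, y, lam)
    # Strong convexity + weak duality: ||w - w*|| <= sqrt(2 * gap / lam).
    return gap, np.sqrt(2.0 * max(gap, 0.0) / lam)


rng = np.random.default_rng(0)
n, d, lam = 500, 5, 0.1
X = rng.normal(size=(n, d))
y = np.where(X @ rng.normal(size=d) + 0.5 * rng.normal(size=n) > 0, 1.0, -1.0)

idx = rng.choice(n, size=20, replace=False)     # stand-in for a distilled set
w_small = fit_logreg(X[idx], y[idx], lam)        # distilled-data solution
w_full = fit_logreg(X, y, lam)                   # full-data solution w*

gap, radius = duality_gap_radius(w_small, X, y, lam)
dist = np.linalg.norm(w_small - w_full)
print(f"gap={gap:.4f}  bound={radius:.4f}  actual distance={dist:.4f}")
```

Here the full-data solution is computed only to check that the bound holds; the bound itself needs just `w_small` and one pass over the full data. The sketch evaluates the bound for a fixed subset and does not optimize the distilled points, which is the part a distillation method adds on top.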

Abstract

We propose Duality Gap KIP (DGKIP), an extension of the Kernel Inducing Points (KIP) method for dataset distillation. While existing dataset distillation methods often rely on bi-level optimization, DGKIP eliminates the need for such optimization by leveraging duality theory in convex programming. KIP was introduced as a way to avoid bi-level optimization; however, it is limited to the squared loss and does not support loss functions that are better suited to classification tasks, such as cross-entropy or hinge loss. DGKIP addresses this limitation by using the duality gap to upper-bound the change in model parameters caused by dataset distillation, which enables its application to a wider range of loss functions. We also characterize the theoretical properties of DGKIP by deriving upper bounds on the test error and on prediction consistency after dataset distillation. Experimental results on standard benchmarks such as MNIST and CIFAR-10 demonstrate that DGKIP retains the efficiency of KIP while offering broader applicability and robust performance.
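
Given an upper bound r on the parameter change, prediction consistency follows from the Cauchy-Schwarz inequality: for any test point x, |x·w* − x·w_S| ≤ r·||x||, so whenever |x·w_S| > r·||x|| the full-data model w* must predict the same label as the distilled-data model w_S. Counting the points that fail this test also yields an interval that must contain the test error of w*. The sketch below is a generic illustration of this argument, not the paper's exact theorem. It assumes a linear decision function sign(x·w) and treats r as given; in a kernel model, ||x|| would be replaced by the feature-space norm sqrt(k(x, x)). The names `certified_mask` and `error_interval` are hypothetical, and a random point on the boundary of the ball stands in for the unknown w*.

```python
import numpy as np

# Sketch (assumptions): linear decision function sign(x.w); the radius is taken
# as given (e.g. from a duality-gap bound); helper names are hypothetical.


def certified_mask(w_s, X, radius):
    # For every w with ||w - w_s|| <= radius, |x.w - x.w_s| <= radius * ||x||
    # (Cauchy-Schwarz), so a larger margin |x.w_s| fixes the sign of x.w.
    return np.abs(X @ w_s) > radius * np.linalg.norm(X, axis=1)


def error_interval(w_s, X, y, radius):
    # Interval guaranteed to contain the test error of any w in the ball.
    certain = certified_mask(w_s, X, radius)
    wrong = np.sign(X @ w_s) != y
    n = len(y)
    n_fixed_wrong = np.sum(certain & wrong)   # errors every w in the ball makes
    n_uncertain = np.sum(~certain)            # points whose prediction may flip
    return n_fixed_wrong / n, (n_fixed_wrong + n_uncertain) / n


rng = np.random.default_rng(1)
d, radius = 5, 0.3
w_s = rng.normal(size=d)                      # distilled-data solution (assumed given)
X_test = rng.normal(size=(1000, d))
y_test = np.where(X_test @ rng.normal(size=d) > 0, 1.0, -1.0)

# Any vector in the ball can stand in for the unknown full-data solution w*.
u = rng.normal(size=d)
w_star = w_s + radius * u / np.linalg.norm(u)

mask = certified_mask(w_s, X_test, radius)
same = np.sign(X_test @ w_s) == np.sign(X_test @ w_star)
lower, upper = error_interval(w_s, X_test, y_test, radius)
err_star = np.mean(np.sign(X_test @ w_star) != y_test)
print(f"certified fraction={mask.mean():.3f}, "
      f"error of w* in [{lower:.3f}, {upper:.3f}]: {lower <= err_star <= upper}")
```

The tighter the radius, the larger the certified fraction and the narrower the error interval, which is why a small duality gap translates directly into stronger guarantees.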
